CN113435409A - Training method and device of image recognition model, storage medium and electronic equipment

Info

Publication number: CN113435409A
Application number: CN202110835693.6A
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: training, determining, recognition model, image recognition, parameters
Inventor: 汪越宇
Assignee: Beijing Horizon Information Technology Co Ltd
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/08: Learning methods
    • G06N 3/084: Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present disclosure disclose a training method and apparatus for an image recognition model, a computer-readable storage medium, and an electronic device. The method includes: determining a sample image set and determining annotation category information corresponding to each image in the sample image set; determining a training sample image from the sample image set; determining prediction category information of a target object in the training sample image through the image recognition model being trained; determining dynamic parameters of the current round of training the image recognition model as a first set of parameters; and adjusting a second set of parameters in the image recognition model based on the loss function having the first set of parameters, the annotation category information, and the prediction category information. Embodiments of the present disclosure can dynamically adjust the loss function, which helps the model distinguish difficult-to-learn samples from easy-to-learn samples during training, adjusts the convergence degree of the loss function, reduces the risk of overfitting in the training process, and improves the recognition accuracy of the trained model.

Description

Training method and device of image recognition model, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a training method and apparatus for an image recognition model, a computer-readable storage medium, and an electronic device.
Background
When an image recognition model is trained, the problem of class imbalance among samples usually exists, and easy-to-learn samples and difficult-to-learn samples are present at the same time. Difficult-to-learn samples are samples that lie near the boundary between classes in the feature space, such as sample images representing two similar objects. Two objects with high image similarity yield two similar prediction scores, and it is difficult to determine the output category by measuring the difference between the predicted distribution and the true distribution. Typically, the result with the higher confidence is selected as the final output. However, in this case, the confidence of the network model is ambiguous; therefore, model training methods oriented to difficult-to-learn samples have been proposed.
For example, the following schemes currently exist to improve the recognition capability of image recognition models:
according to the first scheme, samples which are easy to learn are removed and samples which are difficult to learn are reserved through difficult case mining and multiple iterations.
Scheme II: by the existing calculation method of the local loss (Focalloss) function, the proportion of the samples which are difficult to learn and easy to learn is dynamically adjusted when the loss value is calculated, so that the network can learn more samples which are difficult to learn. The local loss (Focalloss) function solves the class imbalance problem by suppressing the gradient created by the accumulation of a large number of negative samples, allowing the network to learn more positive samples.
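For reference, the following is a minimal sketch of the standard binary focal loss used in scheme two; the function form and its defaults follow the published Focal Loss formulation, not this patent:

```python
import math

def focal_loss(p: float, y: int, alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Standard binary focal loss for one sample; p is the predicted
    probability of the positive class, y is the label (0 or 1)."""
    p_t = p if y == 1 else 1.0 - p          # probability assigned to the true class
    a_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    # (1 - p_t)^gamma shrinks the loss of easy samples (p_t near 1), so the
    # gradient accumulated by many easy negatives no longer dominates.
    return -a_t * (1.0 - p_t) ** gamma * math.log(max(p_t, 1e-12))
```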
Disclosure of Invention
The embodiment of the disclosure provides a training method and device for an image recognition model, a computer-readable storage medium and electronic equipment.
The embodiments of the present disclosure provide a training method for an image recognition model, including: determining a sample image set and determining annotation category information corresponding to each image in the sample image set, where the annotation category information is used to represent the category of a target object in the sample image; determining a training sample image from the sample image set; determining prediction category information of a target object in the training sample image through the image recognition model being trained; determining dynamic parameters of the current round of training the image recognition model, where the dynamic parameters are a first set of parameters of a preset loss function; and adjusting a second set of parameters in the image recognition model based on the loss function having the first set of parameters, the annotation category information, and the prediction category information.
According to another aspect of the embodiments of the present disclosure, there is provided a training apparatus for an image recognition model, the apparatus including: a first determining module, configured to determine a sample image set and determine annotation category information corresponding to each image in the sample image set, where the annotation category information is used to represent the category of a target object in the sample image; a second determining module, configured to determine a training sample image from the sample image set; a third determining module, configured to determine prediction category information of the target object in the training sample image through the image recognition model being trained; a fourth determining module, configured to determine dynamic parameters of the current round of training the image recognition model, the dynamic parameters being a first set of parameters of a preset loss function; and an adjusting module, configured to adjust a second set of parameters in the image recognition model based on the loss function having the first set of parameters, the annotation category information, and the prediction category information.
According to another aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium storing a computer program for executing the training method of the image recognition model described above.
According to another aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including: a processor; a memory for storing processor-executable instructions; and the processor is used for reading the executable instructions from the memory and executing the instructions to realize the training method of the image recognition model.
Based on the training method and apparatus, computer-readable storage medium, and electronic device provided by the embodiments of the present disclosure, dynamic parameters are introduced when the image recognition model is trained. As the training round changes, the convergence degree of the loss function also changes under the action of the dynamic parameters, realizing dynamic adjustment of the loss function, that is, dynamic adjustment of the learning state of the model. This helps the model distinguish difficult-to-learn samples from easy-to-learn samples during training, dynamically adjusts the convergence degree of the loss function according to those samples, reduces the risk of overfitting during training, and improves the recognition accuracy of the trained model.
The technical solution of the present disclosure is further described in detail by the accompanying drawings and examples.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in more detail embodiments of the present disclosure with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the principles of the disclosure and not to limit the disclosure. In the drawings, like reference numbers generally represent like parts or steps.
Fig. 1 is a system diagram to which the present disclosure is applicable.
Fig. 2 is a flowchart illustrating a training method of an image recognition model according to an exemplary embodiment of the present disclosure.
Fig. 3 is a flowchart illustrating a training method of an image recognition model according to another exemplary embodiment of the present disclosure.
Fig. 4 is a flowchart illustrating a training method of an image recognition model according to another exemplary embodiment of the present disclosure.
Fig. 5 is a flowchart illustrating a training method of an image recognition model according to still another exemplary embodiment of the present disclosure.
Fig. 6 is an exemplary diagram of a curve of a DL loss function of a training method of an image recognition model of an embodiment of the present disclosure.
Fig. 7 is a schematic structural diagram of a training apparatus for an image recognition model according to an exemplary embodiment of the present disclosure.
Fig. 8 is a schematic structural diagram of a training apparatus for an image recognition model according to another exemplary embodiment of the present disclosure.
Fig. 9 is a block diagram of an electronic device provided in an exemplary embodiment of the present disclosure.
Detailed Description
Hereinafter, example embodiments according to the present disclosure will be described in detail with reference to the accompanying drawings. It is to be understood that the described embodiments are merely a subset of the embodiments of the present disclosure and not all embodiments of the present disclosure, with the understanding that the present disclosure is not limited to the example embodiments described herein.
It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
It will be understood by those skilled in the art that terms such as "first" and "second" in the embodiments of the present disclosure are used merely to distinguish one element from another, and are not intended to imply any particular technical meaning or any necessary logical order between them.
It is also understood that in embodiments of the present disclosure, "a plurality" may refer to two or more and "at least one" may refer to one, two or more.
It is also to be understood that any reference to any component, data, or structure in the embodiments of the disclosure, may be generally understood as one or more, unless explicitly defined otherwise or stated otherwise.
In addition, the term "and/or" in the present disclosure is only one kind of association relationship describing an associated object, and means that three kinds of relationships may exist, for example, a and/or B may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" in the present disclosure generally indicates that the former and latter associated objects are in an "or" relationship.
It should also be understood that the description of the various embodiments of the present disclosure emphasizes the differences between the various embodiments, and the same or similar parts may be referred to each other, so that the descriptions thereof are omitted for brevity.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
The disclosed embodiments may be applied to electronic devices such as terminal devices, computer systems, and servers, which are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, and servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above systems.
Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
Summary of the application
The first solution described in the above background art takes a long time, has many iterations, and is prone to overfitting.
In the second scheme, the contribution of negative samples to the loss calculation is suppressed, but the contributions of easy-to-learn and difficult-to-learn samples to the loss calculation are suppressed as well, so the recognition accuracy of the trained model still needs to be improved.
Exemplary System
Fig. 1 illustrates an exemplary system architecture 100 to which a training method of an image recognition model or a training apparatus of an image recognition model of an embodiment of the present disclosure may be applied.
As shown in fig. 1, system architecture 100 may include terminal device 101, network 102, and server 103. Network 102 is the medium used to provide communication links between terminal devices 101 and server 103. Network 102 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
A user may use terminal device 101 to interact with server 103 over network 102 to receive or send messages and the like. Various communication client applications, such as an image processing application, a video playing application, a search application, a web browser application, an instant messaging tool, etc., may be installed on the terminal device 101.
The terminal device 101 may be various electronic devices including, but not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle-mounted terminal (e.g., a car navigation terminal), etc., and a fixed terminal such as a digital TV, a desktop computer, etc.
The server 103 may be a server that provides various services, such as a background model training server that performs model training using sample images uploaded by the terminal device 101. The background model training server can train an image recognition model using the received sample images. It may feed the image recognition model back to the terminal device 101, and the terminal device 101 then recognizes images using the trained image recognition model. The server 103 may also receive an image transmitted by the terminal device 101 and recognize the received image using the trained image recognition model.
It should be noted that the training method of the image recognition model provided in the embodiment of the present disclosure may be executed by the server 103 or the terminal device 101, and accordingly, the training device of the image recognition model may be disposed in the server 103 or the terminal device 101.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. In the case where the sample image set does not need to be obtained from a remote location, the system architecture described above may not include a network, including only a server or terminal device.
Exemplary method
Fig. 2 is a flowchart illustrating a training method of an image recognition model according to an exemplary embodiment of the present disclosure. The embodiment can be applied to an electronic device (such as the terminal device 101 or the server 103 shown in fig. 1), and as shown in fig. 2, the method includes the following steps:
step 201, determining a sample image set, and determining labeling category information respectively corresponding to images in the sample image set.
In this embodiment, the electronic device may determine the sample image set and the annotation category information corresponding to each image in the sample image set. The annotation category information is used to represent the category of the target object in the sample image.
The sample images in the sample image set may be images obtained based on various means. For example, the image may be an image captured of a target object or an image synthesized by an electronic device. The sample image set may also be preset, and the electronic device may obtain the sample image set locally or remotely. The target object may be various objects, for example, a human face, a vehicle, an animal, various articles, etc., and accordingly, the category of the target object may be an identification representing a person, a category of a vehicle, a category of an animal, a category of an article, etc.
The image recognition model in this embodiment performs different recognition tasks according to the type of the target object. For example, when the target object is a human face, the image recognition model may be a face recognition model, performing a face recognition task. When the target object is a vehicle, the image recognition model may perform a vehicle recognition task.
The annotation category information may be represented by a numerical value. For example, the value 1 indicates that the probability that the target object belongs to the category corresponding to the value 1 is 1.
In step 202, a training sample image is determined from a sample image set.
In this embodiment, the electronic device may determine the training sample image from the sample image set in various ways. E.g., randomly selected, selected in numbered order, etc.
Step 203, determining the prediction category information of the target object in the training sample image through the trained image recognition model.
In this embodiment, the electronic device may determine the prediction category information of the target object in the training sample image through the image recognition model being trained. Specifically, the training sample image may be input into the model; the model determines feature data of the input training sample image, and the feature data is then used to determine the position of the target object in the training sample image and to classify the target object, thereby obtaining the prediction category information of the target object.
The prediction category information may be represented numerically; for example, it may be the predicted probability that the target object belongs to the category corresponding to the annotation category information.
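As an illustrative sketch of this step in PyTorch-style Python (the softmax readout and the tensor shapes are assumptions; the patent does not fix a concrete network):

```python
import torch

def predict_categories(model: torch.nn.Module, images: torch.Tensor) -> torch.Tensor:
    """images: (batch, C, H, W); returns per-class probabilities of shape
    (batch, num_classes). Gradients are kept, because these predictions
    later drive the loss in step 205."""
    logits = model(images)                # feature extraction + classification head
    return torch.softmax(logits, dim=-1)  # prediction category information
```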
Step 204, determining the dynamic parameters of the current round of training the image recognition model.
In this embodiment, the electronic device may determine the dynamic parameters of the current round of training the image recognition model. The dynamic parameters are a first set of parameters of a predetermined loss function. A dynamic parameter may be a parameter related to the iteration round in training, for example, the round number of the current round of training the image recognition model, the change step size of the iteration round number, and the like. During training, the dynamic parameters are substituted into the loss function; when the model is iteratively trained based on the loss function, the convergence behavior of the loss value is related to the dynamic parameters.
Step 205, adjusting a second set of parameters in the image recognition model based on the loss function having the first set of parameters, and the annotation class information and the prediction class information.
In this embodiment, the electronic device may adjust the second set of parameters in the image recognition model based on the loss function having the first set of parameters, the annotation category information, and the prediction category information. The first set of parameters are the dynamic parameters included in the loss function; the second set of parameters are the parameters adjusted during model training, which no longer change after training ends. In general, the second set of parameters may be the weights in the model that are multiplied with the feature data.
The training process of the image recognition model is a process of solving for an optimal solution, where the optimal solution is given by the data annotation, that is, the annotation category information corresponding to the training sample image. Fitting the model to the optimal solution proceeds iteratively, mainly by error minimization. For an input training sample image, a loss function including the first set of parameters is set; it calculates the difference between the actual output and the expected output of the model (in this embodiment, the difference between the prediction category information and the annotation category information). This difference is conducted through a back propagation algorithm to the connections between the neurons in the neural network, and the difference signal conducted to each connection represents that connection's contribution to the overall error. The original parameters of the model (that is, the second set of parameters in this embodiment) are then updated by a gradient descent algorithm, for example as sketched below.
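In PyTorch-style Python, one such iteration could look as follows; loss_fn(p, epoch) stands for the preset loss function carrying the first set of parameters (for example, the DL function given later in this text), and the signature is an assumption:

```python
import torch

def train_step(model: torch.nn.Module, optimizer: torch.optim.Optimizer,
               loss_fn, images: torch.Tensor, labels: torch.Tensor,
               epoch: int) -> float:
    """One iteration of step 205. loss_fn(p, epoch) is the preset loss with
    the first set of parameters; p holds each sample's predicted probability
    for its labeled category."""
    probs = torch.softmax(model(images), dim=-1)         # actual output of the model
    p = probs.gather(1, labels.unsqueeze(1)).squeeze(1)  # probability of the labeled class
    loss = loss_fn(p, epoch)                             # difference from the expected output
    optimizer.zero_grad()
    loss.backward()                                      # back propagation of the error signal
    optimizer.step()                                     # gradient descent on the second set of parameters
    return loss.item()
```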
And if the training ending condition is met in the training process of the model (for example, the number of iteration rounds reaches the preset number), stopping training to obtain the trained image recognition model.
The trained image recognition model can be used for recognizing a target object in an actually shot image, and the shot image is input into the image recognition model, so that prediction type information representing the target object in the image can be obtained.
According to the method provided by the embodiments of the present disclosure, dynamic parameters are introduced when the image recognition model is trained. As the training round changes, the convergence degree of the loss function changes under the action of the dynamic parameters, realizing dynamic adjustment of the loss function, that is, of the learning state of the model. This helps the model distinguish difficult-to-learn samples from easy-to-learn samples during training, dynamically adjusts the convergence degree of the loss function according to those samples, reduces the risk of overfitting in the training process, and improves the recognition accuracy of the trained model.
In some alternative implementations, step 205 may be performed as follows:
first, the type of the current training sample image is determined.
The type of the sample image indicates whether the training sample image is a difficult-to-learn sample or an easy-to-learn sample, and may be determined according to the prediction category information. For example, let the prediction category information be a probability p indicating the category to which the target object in the training sample image belongs, and let the prediction category probability threshold be th. If p - th < 0, the training sample image is a difficult-to-learn sample; if p - th > 0, it is an easy-to-learn sample.
Then, a second set of parameters in the image recognition model is adjusted based on the type, the first set of parameters, the annotation class information, and the prediction class information, and using a loss function.
Specifically, information indicating the type of the sample image (e.g., p - th above) may be used as a parameter of the loss function, so that the difficulty of learning the training sample image is reflected in the loss function. In general, the relationship between the loss value and the information representing the sample type may be set as follows: the more difficult a sample is to learn, the larger the loss value, i.e., the more the loss function penalizes it; the easier a sample is to learn, the smaller the loss value, i.e., the less the loss function penalizes it, so that the loss accumulated by a large number of easy-to-learn samples does not interfere with the learning effect. For example, to implement this correspondence, a factor -alpha*(1 + p - th) may be set in the loss function, where alpha is a preset constant. The larger p - th is, the easier the sample is to learn and the smaller the loss value; the smaller p - th is, the harder the sample is to learn and the larger the loss value. A minimal sketch of the type decision follows.
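Here p and th are as defined above, and sample_type is a hypothetical helper name:

```python
def sample_type(p: float, th: float = 0.5) -> str:
    """p: predicted probability of the labeled category; th: prediction
    category probability threshold. The same difference p - th enters the
    loss through the factor -alpha*(1 + p - th)."""
    return "hard" if p - th < 0 else "easy"
```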
This implementation determines the type of the current training sample image and introduces information representing that type into the loss function, so that the convergence degree of the loss function is dynamically adjusted according to the type of the training sample image during training, which helps improve the recognition accuracy of the trained image recognition model.
In some optional implementations, during the training process, whether to continue the training may be determined according to at least one of the following three conditions:
Condition one: determining the magnitude relationship between the current round of training the image recognition model and a preset threshold, and determining whether to continue training the image recognition model based on that magnitude relationship.
As an example, if the current round is less than or equal to the preset threshold, the training sample image is continuously determined from the sample image set, and the training is continuously performed by using the training sample image. And if the current round is larger than the preset threshold value, finishing the training.
Condition two: determining the loss value calculated by the loss function, and determining whether to continue training the image recognition model based on the loss value.
As an example, if it is determined that the loss values converge, the training is ended, otherwise the training is continued.
Condition three: determining the duration of iterative training of the image recognition model during the training process, and determining whether to continue training the image recognition model based on the magnitude relationship between that duration and a preset duration.
As an example, if the training duration is less than or equal to the preset duration, the training sample image is continuously determined from the sample image set, and training is continuously performed by using the training sample image. And if the training time is longer than the preset time, ending the training.
It should be noted that when the conditions adopted for continuing training are satisfied, a training sample image is again determined from the sample image set and training continues with that image. As an example, under condition one alone, training continues while the current round is less than or equal to the preset threshold. When conditions one and two are combined, training continues if the current round is less than or equal to the preset threshold and the loss value has not converged; training ends if the current round is greater than the preset threshold or the loss value has converged.
The three conditions for determining whether to continue training provided by the implementation mode can monitor the training process more comprehensively and effectively, control the training process reasonably and improve the training efficiency.
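One way to combine the three conditions in code, following the worked example above (continue only while the round bound holds, the loss has not converged, and the training time is within bound); the function name and signature are illustrative assumptions:

```python
import time

def should_continue(epoch: int, max_epochs: int,
                    loss_converged: bool,
                    start_time: float, max_seconds: float) -> bool:
    within_rounds = epoch <= max_epochs                     # condition one
    within_time = time.time() - start_time <= max_seconds   # condition three
    # condition two: stop once the loss values have converged
    return within_rounds and not loss_converged and within_time
```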
In some alternative implementations, as shown in fig. 3, step 204 may include the following sub-steps:
step 2041, determine a preset prediction category probability threshold.
As an example, the prediction category probability threshold may be 0.5.
Step 2042, determining dynamic parameters of the current round of training the image recognition model based on the prediction class probability threshold.
Specifically, the prediction category probability threshold may be used as a dynamic parameter of the loss function. For example, let the prediction category information be a probability p indicating the category to which the target object in the training sample image belongs, and let the prediction category probability threshold be th; then p - th is a parameter of the loss function. In each round of training, the input training sample image can thus be identified as a difficult-to-learn or easy-to-learn sample according to p - th, that is, loss values are calculated with different forms of the loss function for different training samples. This adjusts the convergence degree of gradient descent, determines convergence conditions of different degrees for different samples, and adjusts the parameters of the model to different degrees, which helps further improve the recognition accuracy of the trained image recognition model.
In some optional implementations, as shown in fig. 4, step 204 may further include the following sub-steps:
step 2043, determining the number of current rounds of training the image recognition model.
Usually, each time the training sample image is replaced, the number of rounds is increased by one.
Step 2044, determining the dynamic parameters of the current round of training the image recognition model based on the round number.
Specifically, the round number may be used as a dynamic parameter of the loss function. In general, the round number may be included as a parameter in the exponential term of the loss function. For example, an exponential term (0.1*gamma*epoch) is set in the loss function, where gamma is a preset constant and epoch is the round number.
By adding the number of turns into the parameters of the loss function, the convergence degree can be reflected in the loss function in the training process, so that the loss function is easier to converge, and the model training efficiency is improved.
In some optional implementations, as shown in fig. 5, on the basis of the above embodiment corresponding to fig. 4, step 204 may further include the following sub-steps:
step 2045, determine the preset iteration round number change step size.
The round number change step size is used to adjust the degree to which the round number in the loss function influences the calculation of the loss value. As an example, the quotient of the round number epoch and the change step size, obtained by integer division, i.e., epoch/step, is set in the loss function. step may typically be set to 10, i.e., the loss calculation is affected once for every 10 increases of epoch.
Step 2046, determining the dynamic parameters of the current round of training the image recognition model based on the iteration round number change step size.
Specifically, the round number change step size may be used as a dynamic parameter of the loss function. In general, the quotient of the round number and the change step size may be taken as a parameter included in the exponential term of the loss function. For example, an exponential term (0.1*gamma*epoch/step) is set in the loss function.
By including the round number change step size in the parameters of the loss function, the influence of changes in the round number on the loss value can be adjusted. At the same time, when gradient descent is computed on the loss function, the corresponding exponential term is reduced and the derivative value is smaller, which prevents gradient explosion and makes the convergence of model training better.
The various embodiments included in the method are illustrated below by an example of a loss function. The loss function employed in the embodiments of the present disclosure may be referred to as a dynamic loss function, i.e., DL (Dynamic Loss), as shown in the following formula:
DL=-alpha*(1+predictions-threshold)^(0.1*gamma*epoch/step)*log(1-predictions)
where alpha and gamma are predetermined constants, such as alpha=0.25 and gamma=2; predictions is the prediction category information, i.e., the probability of the category to which the target object belongs; threshold is the preset prediction category probability threshold, typically 0.5; epoch is the round number of the current round of training the image recognition model; and step is the iteration round number change step size, typically 10. When training with this formula, the convergence degree of the training process is reflected in the loss function through the epoch parameter, so that the loss function converges more easily and model training efficiency improves. Through the threshold parameter, the currently input training sample image can be distinguished as a difficult-to-learn or easy-to-learn sample, i.e., loss values are calculated with different forms of the loss function for different training samples, further adjusting the convergence degree of gradient descent, determining convergence conditions of different degrees for different samples, and adjusting the parameters of the model to different degrees. Through the step parameter, gradient explosion during training can be prevented, so that the network converges better.
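The formula can be transcribed directly into PyTorch-style Python; the batched tensor form, the clamp guard against log(0), and the batch averaging are implementation assumptions, while the constants follow the text:

```python
import torch

def dynamic_loss(predictions: torch.Tensor, epoch: int,
                 alpha: float = 0.25, gamma: float = 2.0,
                 threshold: float = 0.5, step: int = 10) -> torch.Tensor:
    """predictions: probabilities of the labeled category, shape (batch,)."""
    exponent = 0.1 * gamma * (epoch // step)      # integer division, as described above
    modulation = (1.0 + predictions - threshold) ** exponent
    # log(1 - predictions), exactly as in the published formula; the clamp
    # is an added numerical guard against log(0)
    log_term = torch.log((1.0 - predictions).clamp_min(1e-12))
    return (-alpha * modulation * log_term).mean()  # batch-averaged DL
```

With step=10, the exponent stays constant within each block of 10 epochs and then jumps, matching the behavior described above of affecting the loss once for every 10 increases of epoch.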
Fig. 6 shows the loss value (loss) curves obtained by calculating the DL loss function at different round numbers (epochs), where the abscissa represents the learning difficulty of the sample (difficulty of samples), computed as 1 - predictions. As can be seen from Fig. 6, as epoch increases, the low-loss property of easy-to-learn samples is maintained in the easy-to-learn portion (difficulty < 0.3). In the difficult-to-learn portion (difficulty > 0.7), DL imposes more penalty, so the network is more inclined to learn those samples.
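Curves in the style of Fig. 6 can be produced by evaluating the transcribed formula over a grid of difficulty values; the chosen epochs and the matplotlib usage are illustrative assumptions, and how closely the result reproduces Fig. 6 depends on the exact meaning of predictions, which the text does not fully pin down:

```python
import numpy as np
import matplotlib.pyplot as plt

alpha, gamma, threshold, step = 0.25, 2.0, 0.5, 10
difficulty = np.linspace(0.01, 0.99, 99)   # abscissa of Fig. 6
predictions = 1.0 - difficulty             # difficulty = 1 - predictions
for epoch in (10, 50, 100):                # illustrative round numbers
    exponent = 0.1 * gamma * (epoch // step)
    loss = -alpha * (1.0 + predictions - threshold) ** exponent * np.log(1.0 - predictions)
    plt.plot(difficulty, loss, label=f"epoch={epoch}")
plt.xlabel("difficulty of samples")
plt.ylabel("loss")
plt.legend()
plt.show()
```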
Exemplary devices
Fig. 7 is a schematic structural diagram of a training apparatus for an image recognition model according to an exemplary embodiment of the present disclosure. The embodiment can be applied to an electronic device, and as shown in fig. 7, the training apparatus for image recognition model includes: a first determining module 701, configured to determine a sample image set, and determine annotation category information corresponding to images in the sample image set, where the annotation category information is used to represent a category of a target object in the sample image; a second determining module 702, configured to determine a training sample image from the sample image set; a third determining module 703, configured to determine, through the trained image recognition model, prediction category information of the target object in the training sample image; a fourth determining module 704, configured to determine a dynamic parameter of the current round of the training image recognition model, where the dynamic parameter is a first group of parameters of a preset loss function; an adjusting module 705 for adjusting a second set of parameters in the image recognition model based on the loss function having the first set of parameters, and the annotation class information and the prediction class information.
In this embodiment, the first determining module 701 may determine the sample image set and the annotation category information corresponding to each image in the sample image set. The annotation category information is used to represent the category of the target object in the sample image.
The sample images in the sample image set may be images obtained based on various means. For example, the image may be an image captured of a target object, or an image synthesized by the first determining module 701. The sample image set may also be preset, and the first determining module 701 may obtain the sample image set locally or remotely. The target object may be various objects, for example, a human face, a vehicle, an animal, various articles, etc., and accordingly, the category of the target object may be an identification representing a person, a category of a vehicle, a category of an animal, a category of an article, etc.
The image recognition model in this embodiment performs different recognition tasks according to the type of the target object. For example, when the target object is a human face, the image recognition model may be a face recognition model, performing a face recognition task. When the target object is a vehicle, the image recognition model may perform a vehicle recognition task.
The annotation category information may be represented by a numerical value. For example, the value 1 indicates that the probability that the target object belongs to the category corresponding to the value 1 is 1.
In this embodiment, the second determination module 702 may determine the training sample image from the sample image set in various ways. E.g., randomly selected, selected in numbered order, etc.
In this embodiment, the third determining module 703 may determine the prediction category information of the target object in the training sample image through the image recognition model being trained. Specifically, the training sample image may be input into the model; the model determines feature data of the input training sample image, and the feature data is then used to determine the position of the target object in the training sample image and to classify the target object, thereby obtaining the prediction category information of the target object.
The prediction category information may be represented numerically; for example, it may be the predicted probability that the target object belongs to the category corresponding to the annotation category information.
In this embodiment, the fourth determining module 704 may determine the dynamic parameters of the current round of training the image recognition model. The dynamic parameters are a first set of parameters of a predetermined loss function. A dynamic parameter may be a parameter related to the iteration round in training, for example, the round number of the current round of training the image recognition model, the change step size of the iteration round number, and the like. During training, the dynamic parameters are substituted into the loss function; when the model is iteratively trained based on the loss function, the convergence behavior of the loss value is related to the dynamic parameters.
In this embodiment, the adjusting module 705 may adjust the second set of parameters in the image recognition model based on the loss function having the first set of parameters, the annotation category information, and the prediction category information. The first set of parameters are the dynamic parameters included in the loss function; the second set of parameters are the parameters adjusted during model training, which no longer change after training ends. In general, the second set of parameters may be the weights in the model that are multiplied with the feature data.
The training process of the image recognition model is a process of solving for an optimal solution, where the optimal solution is given by the data annotation, that is, the annotation category information corresponding to the training sample image. Fitting the model to the optimal solution proceeds iteratively, mainly by error minimization. For an input training sample image, a loss function including the first set of parameters is set; it calculates the difference between the actual output and the expected output of the model (in this embodiment, the difference between the prediction category information and the annotation category information). This difference is conducted through a back propagation algorithm to the connections between the neurons in the neural network, and the difference signal conducted to each connection represents that connection's contribution to the overall error. The original parameters of the model (that is, the second set of parameters in this embodiment) are then updated by a gradient descent algorithm.
And if the training ending condition is met in the training process of the model (for example, the number of iteration rounds reaches the preset number), stopping training to obtain the trained image recognition model.
Referring to fig. 8, fig. 8 is a schematic structural diagram of a training apparatus for an image recognition model according to another exemplary embodiment of the present disclosure.
In some optional implementations, the adjusting module 705 may include: a first determining unit 7051, configured to determine a type of a current training sample image; an adjusting unit 7052 is configured to adjust a second set of parameters in the image recognition model by using the loss function based on the type, the first set of parameters, the annotation class information, and the prediction class information.
In some optional implementations, the apparatus may further include: a fifth determining module 706, configured to determine the magnitude relationship between the current round of training the image recognition model and a preset threshold, and determine whether to continue training the image recognition model based on the magnitude relationship; and/or a sixth determining module 707, configured to determine the loss value calculated by the loss function, and determine whether to continue training the image recognition model based on the loss value; and/or a seventh determining module 708, configured to determine the duration of iterative training of the image recognition model during the training process, and determine whether to continue training the image recognition model based on the magnitude relationship between the duration and a preset duration.
In some optional implementations, the fourth determining module 704 may include: a second determining unit 7041, configured to determine a preset prediction category probability threshold; a third determining unit 7042, configured to determine a dynamic parameter of the current round of training the image recognition model based on the prediction class probability threshold.
In some optional implementations, the fourth determining module 704 may include: a fourth determining unit 7043, configured to determine a number of turns of a current turn of training the image recognition model; a fifth determining unit 7044 is configured to determine, based on the number of rounds, a dynamic parameter of a current round of training the image recognition model.
In some optional implementations, the fourth determining module 704 may include: a sixth determining unit 7045, configured to determine a preset iteration round number change step length; a seventh determining unit 7046 is configured to determine a dynamic parameter of the current round of training the image recognition model based on the iteration round number change step size.
According to the training apparatus for an image recognition model provided by the embodiments of the present disclosure, dynamic parameters are introduced when the image recognition model is trained. As the training round changes, the convergence degree of the loss function changes under the action of the dynamic parameters, realizing dynamic adjustment of the loss function, that is, of the learning state of the model. This helps the model distinguish difficult-to-learn samples from easy-to-learn samples during training, dynamically adjusts the convergence degree of the loss function according to those samples, reduces the risk of overfitting during training, and improves the recognition accuracy of the trained model.
Exemplary electronic device
Next, an electronic apparatus according to an embodiment of the present disclosure is described with reference to fig. 9. The electronic device may be either or both of the terminal device 101 and the server 103 as shown in fig. 1, or a stand-alone device separate from them, which may communicate with the terminal device 101 and the server 103 to receive the collected input signals therefrom.
FIG. 9 illustrates a block diagram of an electronic device in accordance with an embodiment of the disclosure.
As shown in fig. 9, the electronic device 900 includes one or more processors 901 and memory 902.
The processor 901 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 900 to perform desired functions.
Memory 902 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory may include, for example, random access memory (RAM), cache memory, and the like. Non-volatile memory may include, for example, read-only memory (ROM), a hard disk, flash memory, and the like. One or more computer program instructions may be stored on a computer-readable storage medium and executed by the processor 901 to implement the training methods of image recognition models of the various embodiments of the present disclosure described above and/or other desired functions. Various contents such as sample images and prediction category information may also be stored in the computer-readable storage medium.
In one example, the electronic device 900 may further include: an input device 903 and an output device 904, which are interconnected by a bus system and/or other form of connection mechanism (not shown).
For example, when the electronic device is the terminal device 101 or the server 103, the input device 903 may be a camera, a mouse, a keyboard, or the like, and is used for inputting information such as images and commands. When the electronic device is a stand-alone device, the input device 903 may be a communication network connector for receiving input images, commands, and the like from the terminal device 101 and the server 103.
The output device 904 may output various information including the trained image recognition model to the outside. The output devices 904 may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto, among others.
Of course, for simplicity, only some of the components of the electronic device 900 relevant to the present disclosure are shown in fig. 9, omitting components such as buses, input/output interfaces, and the like. In addition, electronic device 900 may include any other suitable components depending on the particular application.
Exemplary computer program product and computer-readable storage Medium
In addition to the methods and apparatus described above, embodiments of the present disclosure may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps in a method of training an image recognition model according to various embodiments of the present disclosure described in the "exemplary methods" section above of this specification.
The computer program product may write program code for carrying out operations of embodiments of the present disclosure in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform the steps in the method of training an image recognition model according to various embodiments of the present disclosure described in the "exemplary methods" section above in this specification.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing describes the general principles of the present disclosure in conjunction with specific embodiments, however, it is noted that the advantages, effects, etc. mentioned in the present disclosure are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present disclosure. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the disclosure is not intended to be limited to the specific details so described.
In the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts in the embodiments are referred to each other. For the system embodiment, since it basically corresponds to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The block diagrams of devices, apparatuses, and systems referred to in this disclosure are given only as illustrative examples and are not intended to require or imply that connections, arrangements, and configurations must be made in the manner shown. These devices, apparatuses, and systems may be connected, arranged, and configured in any manner, as will be appreciated by those skilled in the art. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. The word "or" as used herein means, and is used interchangeably with, the word "and/or," unless the context clearly dictates otherwise. The phrase "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to".
The methods and apparatus of the present disclosure may be implemented in a number of ways. For example, the methods and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
It is also noted that in the devices, apparatuses, and methods of the present disclosure, each component or step can be decomposed and/or recombined. These decompositions and/or recombinations are to be considered equivalents of the present disclosure.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit embodiments of the disclosure to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (10)

1. A training method of an image recognition model, comprising:
determining a sample image set, and determining annotation category information corresponding to each image in the sample image set, wherein the annotation category information represents the category of a target object in the sample image;
determining a training sample image from the set of sample images;
determining prediction category information of a target object in the training sample image through a trained image recognition model;
determining dynamic parameters of a current round of training the image recognition model, wherein the dynamic parameters are a first set of parameters of a preset loss function;
adjusting a second set of parameters in the image recognition model based on the loss function having the first set of parameters, the annotation category information, and the prediction category information.
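(Editorial illustration, not part of the claims.) The flow of claim 1 can be made concrete with a short PyTorch-style sketch. The focal-loss form, the names focal_loss, train_one_round, gamma, and alpha, and the optimizer usage are all assumptions chosen to show how a loss with a dynamic first set of parameters can drive the adjustment of the second set of parameters (the model weights):

```python
# Hypothetical sketch of the training step in claim 1; the focal-loss
# form and all names are illustrative assumptions, not the claimed method.
import torch.nn.functional as F

def focal_loss(logits, targets, gamma, alpha=1.0):
    # gamma and alpha play the role of the dynamic "first set of
    # parameters"; a larger gamma up-weights hard samples (those with a
    # low predicted probability for the annotated category).
    log_p = F.log_softmax(logits, dim=-1)
    log_p_t = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)
    p_t = log_p_t.exp()
    return (-alpha * (1.0 - p_t) ** gamma * log_p_t).mean()

def train_one_round(model, loader, optimizer, gamma):
    model.train()
    for images, labels in loader:      # labels: annotation category info
        logits = model(images)         # logits: prediction category info
        loss = focal_loss(logits, labels, gamma)
        optimizer.zero_grad()
        loss.backward()                # gradients w.r.t. the "second set
        optimizer.step()               # of parameters" (model weights)
```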
2. The method of claim 1, wherein the adjusting a second set of parameters in the image recognition model based on the loss function having the first set of parameters, the annotation category information, and the prediction category information comprises:
determining the type of a current training sample image;
adjusting the second set of parameters in the image recognition model, using the loss function, based on the type, the first set of parameters, the annotation category information, and the prediction category information.
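(Editorial illustration, not part of the claims.) One plausible reading of claim 2 is that each training sample carries a type, for example "easy" or "hard", and that type selects the dynamic loss parameters before the loss is applied. Reusing the hypothetical focal_loss sketch above:

```python
# Illustrative only: the sample's type selects the dynamic parameter.
GAMMA_BY_TYPE = {"easy": 1.0, "hard": 3.0}   # hypothetical mapping

def loss_for_sample(logits, labels, sample_type, default_gamma=2.0):
    gamma = GAMMA_BY_TYPE.get(sample_type, default_gamma)
    return focal_loss(logits, labels, gamma)
```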
3. The method of claim 1, wherein the method further comprises:
determining a magnitude relationship between the current round of training the image recognition model and a preset round threshold, and determining whether to continue training the image recognition model based on the magnitude relationship; and/or
determining a loss value calculated by the loss function, and determining whether to continue training the image recognition model based on the loss value; and/or
determining an iterative training duration of the image recognition model during the training process, and determining whether to continue training the image recognition model based on a magnitude relationship between the duration and a preset duration.
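(Editorial illustration, not part of the claims.) The three alternative stopping criteria of claim 3 (round count against a preset threshold, loss value, and elapsed training duration) compose naturally with a logical "or". A minimal sketch, with every threshold an assumed example value:

```python
import time

def should_stop(round_idx, loss_value, start_time,
                max_rounds=100, loss_eps=1e-3, max_seconds=3600):
    # Each test mirrors one branch of claim 3; thresholds are assumptions.
    return (round_idx >= max_rounds                     # rounds vs. preset threshold
            or loss_value < loss_eps                    # loss value small enough
            or time.time() - start_time > max_seconds)  # training duration exceeded
```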
4. The method of claim 1, wherein the determining dynamic parameters for a current round of training the image recognition model comprises:
determining a preset prediction category probability threshold;
determining a dynamic parameter for a current round of training the image recognition model based on the prediction class probability threshold.
5. The method of any one of claims 1-4, wherein the determining dynamic parameters for a current round of training the image recognition model comprises:
determining the round number of the current round of training the image recognition model;
determining a dynamic parameter for a current round of training the image recognition model based on the round number.
6. The method of claim 5, wherein the determining dynamic parameters for a current round of training the image recognition model comprises:
determining a preset iteration round number change step size;
and determining the dynamic parameters of the current round of training the image recognition model based on the iteration round number change step size.
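(Editorial illustration, not part of the claims.) Claims 4 through 6 describe two ways of deriving the dynamic parameter: from a preset prediction category probability threshold, and from the current round number together with a preset step size. Both functional forms below are assumptions, shown only to make the derivations concrete; either result could feed the gamma argument of the focal_loss sketch above:

```python
def gamma_from_round(round_idx, step=0.1, gamma0=1.0):
    # Claims 5-6 reading: the parameter changes by a preset step size per
    # training round (the linear schedule is an assumed form).
    return gamma0 + step * round_idx

def gamma_from_threshold(p_threshold, scale=4.0):
    # Claim 4 reading: the parameter is derived from a preset prediction
    # category probability threshold (this mapping is an assumed form).
    return scale * (1.0 - p_threshold)
```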
7. An apparatus for training an image recognition model, comprising:
the device comprises a first determining module, a second determining module and a third determining module, wherein the first determining module is used for determining a sample image set and determining annotation category information corresponding to images in the sample image set respectively, and the annotation category information is used for representing the category of a target object in the sample image;
a second determining module for determining a training sample image from the sample image set;
a third determining module, configured to determine, through a trained image recognition model, prediction category information of a target object in the training sample image;
the fourth determining module is used for determining dynamic parameters of the current round of training the image recognition model, wherein the dynamic parameters are a first group of parameters of a preset loss function;
an adjustment module to adjust a second set of parameters in the image recognition model based on the loss function with the first set of parameters, and the annotation class information and the prediction class information.
8. The apparatus of claim 7, wherein the adjustment module comprises:
a first determining unit, configured to determine the type of the current training sample image;
an adjusting unit, configured to adjust the second set of parameters in the image recognition model, using the loss function, based on the type, the first set of parameters, the annotation category information, and the prediction category information.
9. A computer-readable storage medium storing a computer program for performing the method of any one of claims 1-6.
10. An electronic device, the electronic device comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to read the executable instructions from the memory and execute the instructions to implement the method of any one of claims 1-6.
CN202110835693.6A 2021-07-23 2021-07-23 Training method and device of image recognition model, storage medium and electronic equipment Pending CN113435409A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110835693.6A CN113435409A (en) 2021-07-23 2021-07-23 Training method and device of image recognition model, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110835693.6A CN113435409A (en) 2021-07-23 2021-07-23 Training method and device of image recognition model, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN113435409A true CN113435409A (en) 2021-09-24

Family

ID=77761579

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110835693.6A Pending CN113435409A (en) 2021-07-23 2021-07-23 Training method and device of image recognition model, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN113435409A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021051918A1 (en) * 2019-09-17 2021-03-25 华为技术有限公司 Method for providing ai model, ai platform, computing device, and storage medium
CN110853015A (en) * 2019-11-12 2020-02-28 中国计量大学 Aluminum profile defect detection method based on improved Faster-RCNN
CN111382676A (en) * 2020-02-25 2020-07-07 南京大学 Sand image classification method based on attention mechanism
CN111354017A (en) * 2020-03-04 2020-06-30 江南大学 Target tracking method based on twin neural network and parallel attention module
CN112329533A (en) * 2020-10-07 2021-02-05 江苏大学 Local pavement adhesion coefficient estimation method based on image segmentation
CN112651468A (en) * 2021-01-18 2021-04-13 佛山职业技术学院 Multi-scale lightweight image classification method and storage medium thereof
CN112686218A (en) * 2021-03-09 2021-04-20 北京世纪好未来教育科技有限公司 Training method and device of text detection model, readable storage medium and equipment
CN112990312A (en) * 2021-03-15 2021-06-18 平安科技(深圳)有限公司 Model training method, image recognition method, device, equipment and storage medium

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113673633A (en) * 2021-10-22 2021-11-19 武汉楚精灵医疗科技有限公司 Training method and device of image recognition model, server and storage medium
CN113673633B (en) * 2021-10-22 2022-01-07 武汉楚精灵医疗科技有限公司 Training method and device of image recognition model, server and storage medium
CN114083770A (en) * 2021-10-29 2022-02-25 北京百度网讯科技有限公司 Method, device, equipment and storage medium for adjusting process parameters and training models
CN114083770B (en) * 2021-10-29 2024-03-22 北京百度网讯科技有限公司 Process parameter adjustment and model training method, device, equipment and storage medium
CN116229195B (en) * 2022-04-29 2024-05-17 同方威视技术股份有限公司 Method for training radiation image recognition model on line, radiation image recognition method and device
CN114972425A (en) * 2022-05-18 2022-08-30 北京地平线机器人技术研发有限公司 Training method of motion state estimation model, motion state estimation method and device
CN116127779A (en) * 2023-03-14 2023-05-16 中铁电气化局集团有限公司 Soft crossing one-time in-place installation method and system
CN116127779B (en) * 2023-03-14 2023-08-08 中铁电气化局集团有限公司 Soft crossing one-time in-place installation method and system
CN116416500A (en) * 2023-03-24 2023-07-11 北京百度网讯科技有限公司 Image recognition model training method, image recognition device and electronic equipment
CN116416500B (en) * 2023-03-24 2024-04-05 北京百度网讯科技有限公司 Image recognition model training method, image recognition device and electronic equipment

Similar Documents

Publication Publication Date Title
CN113435409A (en) Training method and device of image recognition model, storage medium and electronic equipment
KR102170199B1 (en) Classify input examples using comparison sets
CN108133222B (en) Apparatus and method for determining a Convolutional Neural Network (CNN) model for a database
US10540977B2 (en) Proximity-based engagement with digital assistants
CN110929799B (en) Method, electronic device, and computer-readable medium for detecting abnormal user
CN111428010A (en) Man-machine intelligent question and answer method and device
CN114821066A (en) Model training method and device, electronic equipment and computer readable storage medium
US20220383071A1 (en) Method, apparatus, and non-transitory computer readable medium for optimizing generative adversarial network
CN112036954A (en) Item recommendation method and device, computer-readable storage medium and electronic device
CN111639591A (en) Trajectory prediction model generation method and device, readable storage medium and electronic equipment
CN111950647A (en) Classification model training method and device
CN113889091A (en) Voice recognition method and device, computer readable storage medium and electronic equipment
CN114139630A (en) Gesture recognition method and device, storage medium and electronic equipment
CN112000872A (en) Recommendation method based on user vector, training method of model and training device
CN114782680A (en) Training method and device of target detection model, and target detection method and device
CN112801226A (en) Data screening method and device, computer readable storage medium and electronic equipment
CN114170439A (en) Gesture recognition method and device, storage medium and electronic equipment
CN113506328A (en) Method and device for generating sight line estimation model and method and device for estimating sight line
CN110377824B (en) Information pushing method and device, computer readable storage medium and electronic equipment
CN110704614B (en) Information processing method and device for predicting user group type in application
CN113704566A (en) Identification number body identification method, storage medium and electronic equipment
CN112991418A (en) Image depth prediction and neural network training method and device, medium and equipment
CN113111692A (en) Target detection method and device, computer readable storage medium and electronic equipment
CN112185367A (en) Keyword detection method and device, computer readable storage medium and electronic equipment
CN111159558A (en) Recommendation list generation method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination