CN112149741A - Training method and device of image recognition model, electronic equipment and storage medium - Google Patents

Training method and device of image recognition model, electronic equipment and storage medium

Info

Publication number
CN112149741A
CN112149741A (application CN202011023804.5A; granted as CN112149741B)
Authority
CN
China
Prior art keywords
training
image
image recognition
recognition model
resolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011023804.5A
Other languages
Chinese (zh)
Other versions
CN112149741B (en)
Inventor
崔程
魏凯
杨敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202011023804.5A
Publication of CN112149741A
Application granted
Publication of CN112149741B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a training method and device of an image recognition model, an electronic device, and a storage medium, and relates to the technical field of artificial intelligence, in particular to computer vision and deep learning. The specific implementation scheme is as follows: adjust the resolution of the original training image in each of a plurality of pieces of training data to a first resolution to obtain a plurality of first training images; train a first image recognition model based on the plurality of first training images and the plurality of pieces of training data; adjust the resolution of the original training images to a second resolution, smaller than the first resolution, to obtain a plurality of second training images; and train a second image recognition model based on the plurality of second training images, the plurality of first training images, and the first image recognition model, so that the second image recognition model learns the image recognition capability of the first image recognition model. The method and device can effectively improve both the recognition accuracy and the recognition speed of the second image recognition model.

Description

Training method and device of image recognition model, electronic equipment and storage medium
Technical Field
The application relates to the technical field of computers, in particular to artificial intelligence, computer vision, and deep learning, and specifically to a training method and device of an image recognition model, an electronic device, and a storage medium.
Background
Image recognition technology is widely applied to various visual tasks, such as plant classification, dish recognition, and landmark recognition.
Existing image recognition technology mainly extracts image features by means of machine learning and distinguishes different images by the extracted features. For example, the whole process can be realized by an image recognition model trained in a machine learning manner.
However, in the image recognition field, how to improve the accuracy of existing image recognition models remains one of the key research focuses in academia and industry. Therefore, it is desirable to provide an image recognition model with high accuracy.
Disclosure of Invention
The application provides a training method and device of an image recognition model, electronic equipment and a storage medium.
According to an aspect of the present application, there is provided a training method of an image recognition model, wherein the method includes:
adjusting the resolution of an original training image of a plurality of pieces of training data to a first resolution to obtain a plurality of first training images;
training a first image recognition model based on the plurality of first training images and the plurality of pieces of training data;
adjusting the resolution of the original training images of the plurality of pieces of training data to a second resolution to obtain a plurality of second training images; the second resolution is less than the first resolution;
training a second image recognition model based on the second training images, the first training images and the trained first image recognition model, so that the second image recognition model learns the image recognition capability of the first image recognition model.
According to another aspect of the present application, there is provided an apparatus for training an image recognition model, wherein the apparatus includes:
the adjusting module is used for adjusting the resolution of the original training images of the plurality of pieces of training data to a first resolution to obtain a plurality of first training images;
a first training module for training a first image recognition model based on the plurality of first training images and the plurality of pieces of training data;
the adjusting module is further configured to adjust the resolution of the original training images of the plurality of pieces of training data to a second resolution to obtain a plurality of second training images; the second resolution is less than the first resolution;
and the second training module is used for training a second image recognition model based on the plurality of second training images, the plurality of first training images and the trained first image recognition model, so that the second image recognition model learns the image recognition capability of the first image recognition model.
According to still another aspect of the present application, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described above.
According to yet another aspect of the present application, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method as described above.
According to the technology of the application, the second image recognition model is trained by knowledge distillation. The first image recognition model is trained on images of higher resolution and can therefore obtain more image features, which can be effectively transferred to the second image recognition model through knowledge distillation. As a result, the second image recognition model can learn the recognition capability of the first image recognition model while using only lower-resolution images, which effectively improves both its recognition accuracy and its recognition speed.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present application;
FIG. 2 is a schematic diagram according to a second embodiment of the present application;
FIG. 3 is an architecture diagram of a training method for an image recognition model provided herein;
FIG. 4 is a schematic illustration according to a third embodiment of the present application;
FIG. 5 is a schematic illustration according to a fourth embodiment of the present application;
fig. 6 is a block diagram of an electronic device for implementing a training method of an image recognition model according to an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding; these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Descriptions of well-known functions and constructions are omitted in the following for clarity and conciseness.
FIG. 1 is a schematic diagram according to a first embodiment of the present application; as shown in fig. 1, this embodiment provides a training method for an image recognition model, which specifically includes the following steps:
s101, adjusting the resolution of an original training image of a plurality of pieces of training data to a first resolution to obtain a plurality of first training images;
s102, training a first image recognition model based on a plurality of first training images and a plurality of pieces of training data;
s103, adjusting the resolution of the original training images of the plurality of pieces of training data to a second resolution to obtain a plurality of second training images; wherein the second resolution is less than the first resolution;
s104, training a second image recognition model based on the second training images, the first training images and the trained first image recognition model, so that the second image recognition model learns the image recognition capability of the first image recognition model.
The executing subject of the training method of this embodiment may be a training device of the image recognition model. The device may be a physical electronic entity, such as a large-scale computer device, or a software application that trains the image recognition model when run.
In this embodiment, a training data set including a plurality of pieces of training data is used to train the image recognition model. Each piece of training data includes an original training image, and each original training image has its own resolution.
When training the first image recognition model, the resolution of each original training image of the plurality of pieces of training data is adjusted to the first resolution, yielding the plurality of first training images; the first image recognition model is then trained on these images in combination with the other data in the training data corresponding to each first training image's original training image.
When training the second image recognition model, the resolution of the original training images of the plurality of pieces of training data is first adjusted to the second resolution to obtain a plurality of second training images; in this embodiment, the second resolution is smaller than the first resolution. The second image recognition model is then trained based on the plurality of second training images, the plurality of first training images, and the trained first image recognition model, so that it learns the image recognition capability of the first image recognition model. The second training images and the first training images correspond to the same original training images but differ in resolution: the first training images used to train the first image recognition model have the higher resolution, so richer image content is available during recognition, while the second training images have the lower resolution and relatively less rich content. Nevertheless, by training the second image recognition model jointly with the first training images and the trained first image recognition model, the second model acquires the first model's image recognition capability. A second image recognition model trained in this way can, even on lower-resolution images, match the recognition capability the first model achieves on higher-resolution images, effectively improving recognition accuracy and precision.
The first image recognition model of this embodiment plays the role of a teacher model, and the second image recognition model that of a student model; the training process is analogous to knowledge distillation. That is, the first image recognition model, i.e. the teacher model, is first trained on the first training images of the first resolution together with the other data in the corresponding training data. Then the second image recognition model, i.e. the student model, is trained on the second training images of the second resolution together with the information learned by the trained first image recognition model on the first training images, transferring the features learned by the teacher model to the student model so that the student model learns the teacher model's image recognition capability. This transfer process is the knowledge distillation process.
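In common knowledge-distillation practice (a standard ingredient not explicitly specified in this application), the teacher's class probabilities are often softened with a temperature parameter before being transferred to the student, so the student can also learn the teacher's relative confidences across classes. A minimal sketch of such a softened softmax, assuming plain real-valued logits:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax. temperature > 1 flattens the
    distribution, exposing the teacher's relative class similarities."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

At temperature 1.0 this is the ordinary softmax; raising the temperature lowers the peak probability without changing which class is most likely.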
In existing training or knowledge distillation pipelines, the teacher model and the student model always use images of the same resolution: the teacher model is trained at a specific resolution, and the student model uses that same resolution during the transfer process, so the upper limit of the teacher model cannot be effectively exploited. In the knowledge distillation process of this embodiment, the student model can use smaller-resolution images, while the teacher model obtains more image features by learning from larger-resolution images; these features are transferred to the student model through the knowledge distillation technique. Since in practical application the student network is the model ultimately obtained by training, this effectively improves model accuracy and improves the model's recognition speed.
Secondly, in existing distillation techniques the teacher model generally has a larger parameter count than the student model, so the whole distillation process is relatively slow. In the scheme provided in this embodiment, the teacher model and the student model may be the same model; that is, the teacher model may itself be a relatively small, fast model whose input resolution is simply increased to obtain more image features. The technical scheme of this embodiment can therefore accelerate the whole knowledge distillation and improve the speed of the trained student model. Of course, optionally, this embodiment also applies when the teacher model's parameter count is larger than the student model's.
According to the training method of the image recognition model of this embodiment, with the above technical scheme, the second image recognition model is trained by knowledge distillation: the first image recognition model, trained on higher-resolution images, obtains more image features, which can be well transferred to the second image recognition model through the knowledge distillation technique. The second image recognition model thus learns the recognition capability of the first image recognition model while using only lower-resolution images, effectively improving both its recognition accuracy and its recognition speed.
FIG. 2 is a schematic diagram according to a second embodiment of the present application; the training method of the image recognition model of the present embodiment further introduces the technical solution of the present application in more detail on the basis of the technical solution of the embodiment shown in fig. 1. As shown in fig. 2, the training method of the image recognition model of this embodiment may specifically include the following steps:
s201, collecting a plurality of pieces of training data, wherein each piece of training data comprises an original training image and a corresponding label type;
in this embodiment, the collected pieces of training data are used to train the first image recognition model, and the training process is supervised training. Therefore, when each piece of training data is collected, the labeling category corresponding to each piece of training data needs to be labeled to identify the category to which the original training image in the piece of training data should belong when the original training image is subjected to image recognition.
Further optionally, this step may include image preprocessing of the original training images in the pieces of training data, for example random cropping and random flipping, which augment the data and enrich the amount of training data. Such preprocessing does not change the corresponding label category.
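The random crop and random flip preprocessing mentioned above can be sketched as follows for a 2-D image stored as a list of rows. This is a minimal illustration, not the application's actual preprocessing code:

```python
import random

def random_crop(image, out_h, out_w, rng=random):
    """Take a random out_h x out_w sub-window of the image."""
    h, w = len(image), len(image[0])
    top = rng.randrange(h - out_h + 1)
    left = rng.randrange(w - out_w + 1)
    return [row[left:left + out_w] for row in image[top:top + out_h]]

def random_hflip(image, p=0.5, rng=random):
    """Mirror the image horizontally with probability p;
    the label category is unchanged by either transform."""
    return [row[::-1] for row in image] if rng.random() < p else image
```

Because both transforms preserve the depicted content, the original label category can be reused for every augmented copy.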
S202, adjusting the resolution of the original training images of the plurality of pieces of training data to be a first resolution to obtain a plurality of first training images;
the step is to adjust (resize) the resolution of the original training image in each piece of training data acquired in step S201 to a first resolution, so as to obtain a plurality of first training images. Optionally, the first resolution may be a larger resolution, so that the first image recognition model can extract more features of the image during training.
S203, training a first image recognition model based on the plurality of first training images and the label category corresponding to each first training image in the plurality of pieces of training data;
Specifically, during training, each first training image is input into the first image recognition model, which outputs predicted image category features: a one-dimensional vector concatenating the probabilities of the first training image over each known image category identified by the model. For example, with 100 known image categories, if the probability of the first category is 0.05, the second 0.1, and the third 0.4, the vector corresponding to the image category features can be represented as (0.05, 0.1, 0.4, ...). The category with the highest probability in the image category features is taken as the predicted category of the corresponding first training image and compared with the label category; if they differ, the parameters of the first image recognition model are adjusted to bring the predicted and label categories into agreement. Training continues in this manner over the plurality of first training images and the label category of each until the predicted and label categories consistently agree; at that point the parameters of the first image recognition model are determined, the model itself is determined, and its training ends. A first image recognition model trained in this way has reliable recognition accuracy.
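The image category features and the argmax prediction step described above can be sketched as follows; mapping raw model outputs (logits) to probabilities via softmax is an assumption for illustration, since the application only specifies the probability vector itself:

```python
import math

def class_features(logits):
    """One-dimensional vector of per-class probabilities, as described
    for the image category features (softmax over assumed logits)."""
    m = max(logits)                      # stability shift
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predicted_class(features):
    """Index of the highest-probability category, used as the
    predicted category to compare against the label category."""
    return max(range(len(features)), key=lambda i: features[i])
```

During supervised training, `predicted_class(class_features(logits))` would be compared with the annotated label index, and a mismatch would drive a parameter update.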
S204, adjusting the resolution of the original training images of the plurality of pieces of training data to a second resolution to obtain a plurality of second training images; wherein the second resolution is less than the first resolution;
s205, acquiring the target image category characteristics correspondingly recognized by the trained first image recognition model based on each first training image;
that is, after the first image recognition model is trained in the above manner, each first training image is input to the first image recognition model, and then the first image recognition model outputs the target category feature. The class with the highest probability in the target class features is necessarily equal to the labeling class to which the original training image corresponding to the first training image in the training data belongs.
S206, training the second image recognition model based on the second training images and the target image category characteristics corresponding to the first training images corresponding to the second training images, so that the second image recognition model learns the image recognition capability of the first image recognition model.
The target image category features corresponding to a second training image are obtained by inputting the first training image that shares the same original training image with that second training image into the trained first image recognition model.
In this embodiment, the images used when training the second image recognition model have a smaller resolution than those used when training the first. The training must nevertheless ensure that the recognition accuracy of the second image recognition model is not reduced; that is, the second model learns the image recognition capability of the first. Specifically, the second image recognition model learns the features the first model recognizes in an image and can produce the same recognition results as the first model. When the second image recognition model has learned the target category features recognized by the first model, it can be said to have the same image recognition capability as the first model. A second image recognition model trained in this way can accurately recognize the category of an image even from lower-resolution input, and thus has high image recognition accuracy.
For example, the step S206 may specifically include the following steps:
(a) for each second training image, inputting the second training image into a second image recognition model, and acquiring predicted image category characteristics output by the second image recognition model;
(b) constructing a loss function based on the predicted image category characteristics and target image category characteristics corresponding to a first training image which has the same original training image as a second training image;
(c) judging whether the loss function is converged; if not, executing step (d); if yes, executing step (e);
(d) adjusting parameters of the second image identification model to enable the predicted image category characteristics to tend to be consistent with the target image category characteristics; returning to the step (a) and continuing training by adopting the next second training image;
In this step, the adjustment of the parameters of the second image recognition model may be regarded as backpropagation-based: parameters are adjusted in the direction of convergence of the loss function, so that the predicted image category features approach the target image category features.
(e) Judge whether the loss function has remained converged over a preset number of consecutive training rounds. If so, determine the parameters of the second image recognition model, thereby determining the model, and finish training; otherwise, return to step (a) and continue training with the next second training image.
Steps (a)-(d) above cover parameter adjustment during training; step (e) is the training cutoff.
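The loss of step (b) and the cutoff of step (e) can be sketched as follows. The application only says a loss function is constructed from the predicted and target image category features; cross-entropy is one common choice assumed here, and the convergence test is likewise an illustrative formulation:

```python
import math

def distill_loss(student_probs, teacher_probs, eps=1e-12):
    """Cross-entropy of the student's predicted class features against
    the teacher's target class features (one possible loss for step (b))."""
    return -sum(t * math.log(max(s, eps))
                for s, t in zip(student_probs, teacher_probs))

def training_finished(recent_losses, rounds, tol=1e-4):
    """Step (e): stop once the loss change has stayed below tol for a
    preset number of consecutive rounds."""
    if len(recent_losses) < rounds + 1:
        return False
    tail = recent_losses[-(rounds + 1):]
    return all(abs(b - a) < tol for a, b in zip(tail, tail[1:]))
```

When the student's distribution matches the teacher's, the loss reduces to the teacher distribution's entropy, which is its minimum for a fixed teacher.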
Similarly, referring to the description of the embodiment shown in fig. 1, the first image recognition model of this embodiment may serve as the teacher model and the second as the student model, and the whole training process can be regarded as knowledge distillation. Because higher-resolution images contain more basic image features, the teacher model can learn basic features that cannot be effectively extracted from smaller images; once the teacher model has learned them, these features can be better transferred to the student model, which through training ultimately achieves higher recognition accuracy.
Fig. 3 is an architecture diagram of a training method of an image recognition model provided in the present application. Specifically, the training method of the image recognition model shown in fig. 2 may be adopted, and the training of the image recognition model is realized in the framework shown in fig. 3. Wherein the teacher model corresponds to the first image recognition model, and the student model corresponds to the second image recognition model.
As shown in fig. 3, the first row corresponds to steps S201 to S203; for convenience of description, fig. 3 shows only the training-image portion of the training data. Before training the first image recognition model, i.e. the teacher model, a plurality of pieces of training data are collected, each including an original training image and a corresponding label category; the resolution of each original training image is then adjusted to the first resolution to obtain the first training images. The teacher model is then trained on the first training images; see the specific implementation of step S203 for details.
As shown in fig. 3, the second and third rows transfer the features learned by the teacher model into the student model. The teacher model used in the second row is the trained teacher model of the first row. Similarly, step S204 yields the second training images of the third row, and step S205 yields the target image category features recognized by the trained teacher model for each first training image. The student model is then trained continuously in the manner of steps (a)-(d) until the training cutoff condition of step (e) is reached, producing the trained student model.
According to the training method of the image recognition model of this embodiment, with the above technical scheme, the first image recognition model is trained on higher-resolution images and obtains more image features, which can be well transferred to the second image recognition model through the knowledge distillation technique. The second image recognition model can thus learn the recognition capability of the first model while using only smaller-resolution images, effectively improving both its recognition accuracy and its recognition speed.
FIG. 4 is a schematic illustration according to a third embodiment of the present application; as shown in fig. 4, the present embodiment provides an apparatus 400 for training an image recognition model, including:
an adjusting module 401, configured to adjust a resolution of an original training image of a plurality of pieces of training data to a first resolution, to obtain a plurality of first training images;
a first training module 402, configured to train a first image recognition model based on a plurality of first training images and a plurality of pieces of training data;
the adjusting module 401 is further configured to adjust the resolution of the original training images of the plurality of pieces of training data to a second resolution, so as to obtain a plurality of second training images; the second resolution is less than the first resolution;
a second training module 404, configured to train the second image recognition model based on the second training images, the first training images, and the trained first image recognition model, so that the second image recognition model learns the image recognition capability of the first image recognition model.
The implementation principle and technical effect of the training of the image recognition model implemented by the above modules of the training apparatus 400 of the image recognition model of this embodiment are the same as the implementation of the related method embodiment, and reference may be made to the description of the related method embodiment in detail, which is not repeated herein.
FIG. 5 is a schematic illustration according to a fourth embodiment of the present application; as shown in fig. 5, the training apparatus 400 for an image recognition model according to the present embodiment further describes the technical solution of the present application in more detail based on the technical solution of the embodiment shown in fig. 4.
In the training apparatus 400 for an image recognition model according to this embodiment, the first training module 402 is specifically configured to:
and training the first image recognition model based on the plurality of first training images and the labeling categories corresponding to the first training images in the plurality of pieces of training data.
Further optionally, as shown in fig. 5, in the training apparatus 400 for an image recognition model of the present embodiment, the second training module 404 includes:
an obtaining unit 4041, configured to obtain, based on each first training image, a target image category feature that is identified by the trained first image identification model;
the training unit 4042 is configured to train the second image recognition model based on the target image category features corresponding to the second training images and the first training images corresponding to the second training images, so that the second image recognition model learns the image recognition capability of the first image recognition model.
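The obtaining unit 4041 amounts to one inference pass of the trained teacher over the first training images. The sketch below assumes a hypothetical `teacher` callable with a toy linear map in its place; both are illustrative names, not part of the patent.

```python
import numpy as np

def collect_target_features(teacher, first_training_images):
    """Run the trained first image recognition model once over every first
    (higher-resolution) training image and cache the category features it
    outputs; these become the distillation targets for the student."""
    return [teacher(img) for img in first_training_images]

# Toy stand-in teacher: flattens the image and applies a fixed linear map.
rng = np.random.default_rng(0)
W = rng.standard_normal((32 * 32, 5)) * 0.01
teacher = lambda img: img.reshape(-1) @ W

first_training_images = [rng.random((32, 32)) for _ in range(3)]
target_features = collect_target_features(teacher, first_training_images)
```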
Further optionally, the training unit 4042 is configured to:
for each second training image, inputting the second training image into the second image recognition model, and acquiring the predicted image category features output by the second image recognition model;
constructing a loss function based on the predicted image category features and the target image category features corresponding to the first training image that is obtained from the same original training image as the second training image;
determining whether the loss function has converged;
if the loss function has not converged, the parameters of the second image recognition model are adjusted so that the predicted image category features tend to be consistent with the target image category features.
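The loss construction and convergence judgment of the training unit 4042 can be sketched as follows. The patent fixes neither a particular loss function nor a convergence criterion, so the mean-squared error and the plateau test below are assumptions for illustration.

```python
import numpy as np

def distillation_loss(predicted, target):
    """Mean-squared error between the student's predicted category features
    and the teacher's target category features; one plausible choice of loss."""
    return float(np.mean((np.asarray(predicted) - np.asarray(target)) ** 2))

def converged(loss_history, tol=1e-4, window=3):
    """Treat the loss as converged once it changes by less than `tol`
    over the last `window` evaluations (one possible convergence test)."""
    if len(loss_history) < window + 1:
        return False
    recent = loss_history[-(window + 1):]
    return max(recent) - min(recent) < tol
```

When `converged` returns False, the student's parameters are updated (e.g. by gradient descent on `distillation_loss`) and another iteration is run, matching the loop described above.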
The implementation principle and technical effect of the training of the image recognition model implemented by the above modules of the training apparatus 400 of the image recognition model of this embodiment are the same as the implementation of the related method embodiment, and reference may be made to the description of the related method embodiment in detail, which is not repeated herein.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 6 is a block diagram of an electronic device implementing a training method for an image recognition model according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 6, the electronic device includes: one or more processors 601, a memory 602, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, as desired, along with multiple memories. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 6, one processor 601 is taken as an example.
The memory 602 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by at least one processor to cause the at least one processor to perform the method for training an image recognition model provided herein. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the training method of the image recognition model provided by the present application.
The memory 602, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules (e.g., the relevant modules shown in fig. 4 and 5) corresponding to the training method of the image recognition model in the embodiments of the present application. The processor 601 executes various functional applications of the server and data processing by running non-transitory software programs, instructions and modules stored in the memory 602, namely, implements the training method of the image recognition model in the above method embodiment.
The memory 602 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created from use of an electronic device that implements a training method of an image recognition model, and the like. Further, the memory 602 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 602 optionally includes memory located remotely from the processor 601, and these remote memories may be connected over a network to an electronic device implementing the training method of the image recognition model. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device implementing the training method of the image recognition model may further include: an input device 603 and an output device 604. The processor 601, the memory 602, the input device 603 and the output device 604 may be connected by a bus or other means, and fig. 6 illustrates the connection by a bus as an example.
The input device 603 may receive input numeric or character information and generate key signal inputs related to user settings and function control of an electronic apparatus implementing the training method of the image recognition model, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, a joystick, or the like. The output devices 604 may include a display device, auxiliary lighting devices (e.g., LEDs), and tactile feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), the internet, and blockchain networks.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical solutions of the embodiments of the present application, the second image recognition model is trained through the knowledge distillation technique. Because the first image recognition model is trained on images of higher resolution, it can capture more image features, and these features can be transferred effectively to the second image recognition model through knowledge distillation, so that the second image recognition model can learn the recognition capability of the first image recognition model using only images of lower resolution. This effectively improves both the recognition accuracy and the recognition speed of the second image recognition model.
It should be understood that the various forms of flow shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in a different order, and no limitation is made herein as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. A method of training an image recognition model, wherein the method comprises:
adjusting the resolution of an original training image of a plurality of pieces of training data to a first resolution to obtain a plurality of first training images;
training a first image recognition model based on the plurality of first training images and the plurality of pieces of training data;
adjusting the resolution of the original training images of the plurality of pieces of training data to a second resolution to obtain a plurality of second training images; the second resolution is less than the first resolution;
training a second image recognition model based on the second training images, the first training images and the trained first image recognition model, so that the second image recognition model learns the image recognition capability of the first image recognition model.
2. The method of claim 1, wherein training a first image recognition model based on the plurality of first training images and the plurality of pieces of training data comprises:
and training a first image recognition model based on the plurality of first training images and the labeling category corresponding to each first training image in the plurality of pieces of training data.
3. The method according to claim 1 or 2, wherein training a second image recognition model based on the plurality of second training images, the plurality of first training images and the trained first image recognition model such that the second image recognition model learns the image recognition capability of the first image recognition model comprises:
acquiring the target image category characteristics correspondingly recognized by the trained first image recognition model based on each first training image;
and training the second image recognition model based on the second training images and the target image category characteristics corresponding to the first training images corresponding to the second training images, so that the second image recognition model learns the image recognition capability of the first image recognition model.
4. The method of claim 3, wherein training the second image recognition model based on the second training images and the target image category features corresponding to the first training images corresponding to the second training images such that the second image recognition model learns the image recognition capability of the first image recognition model comprises:
for each second training image, inputting the second training image into the second image recognition model, and acquiring predicted image category features output by the second image recognition model;
constructing a loss function based on the predicted image category features and the target image category features corresponding to the first training image that is obtained from the same original training image as the second training image;
judging whether the loss function is converged;
and if the loss function is not converged, adjusting parameters of the second image recognition model so that the predicted image category features tend to be consistent with the target image category features.
5. An apparatus for training an image recognition model, wherein the apparatus comprises:
the adjusting module is used for adjusting the resolution of the original training images of the plurality of pieces of training data to a first resolution to obtain a plurality of first training images;
a first training module for training a first image recognition model based on the plurality of first training images and the plurality of pieces of training data;
the adjusting module is further configured to adjust the resolution of the original training images of the plurality of pieces of training data to a second resolution to obtain a plurality of second training images; the second resolution is less than the first resolution;
and the second training module is used for training a second image recognition model based on the plurality of second training images, the plurality of first training images and the trained first image recognition model, so that the second image recognition model learns the image recognition capability of the first image recognition model.
6. The apparatus of claim 5, wherein the first training module is to:
and training a first image recognition model based on the plurality of first training images and the labeling category corresponding to each first training image in the plurality of pieces of training data.
7. The apparatus of claim 5 or 6, wherein the second training module comprises:
the acquisition unit is used for acquiring the target image category characteristics which are correspondingly recognized by the trained first image recognition model based on each first training image;
a training unit, configured to train the second image recognition model based on the second training images and the target image category features corresponding to the first training images corresponding to the second training images, so that the second image recognition model learns the image recognition capability of the first image recognition model.
8. The apparatus of claim 7, wherein the training unit is to:
for each second training image, inputting the second training image into the second image recognition model, and acquiring predicted image category features output by the second image recognition model;
constructing a loss function based on the predicted image category features and the target image category features corresponding to the first training image that is obtained from the same original training image as the second training image;
judging whether the loss function is converged;
and if the loss function is not converged, adjusting parameters of the second image recognition model so that the predicted image category features tend to be consistent with the target image category features.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-4.
10. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-4.
CN202011023804.5A 2020-09-25 2020-09-25 Training method and device for image recognition model, electronic equipment and storage medium Active CN112149741B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011023804.5A CN112149741B (en) 2020-09-25 2020-09-25 Training method and device for image recognition model, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112149741A true CN112149741A (en) 2020-12-29
CN112149741B CN112149741B (en) 2024-04-16

Family

ID=73897304




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant