CN117746125A - Training method and device of image processing model and electronic equipment - Google Patents

Training method and device of image processing model and electronic equipment

Info

Publication number
CN117746125A
Authority
CN
China
Prior art keywords
image
teacher
feature
student
processing model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311759639.3A
Other languages
Chinese (zh)
Inventor
舒茂 (Shu Mao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202311759639.3A priority Critical patent/CN117746125A/en
Publication of CN117746125A publication Critical patent/CN117746125A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The disclosure provides a training method and device for an image processing model and an electronic device, and relates to the technical field of artificial intelligence, in particular to the technical fields of deep learning, natural language processing, computer vision and the like. The specific implementation scheme is as follows: acquiring a teacher image processing model and a student image processing model having different structures, as well as a training data set of the student image processing model; for each sample image in the training data set, respectively inputting the sample image into the teacher image processing model and the student image processing model to obtain teacher image features and student image features; performing feature conversion processing on the teacher image features according to the feature space of the student image features, and training the student image processing model according to the resulting teacher conversion image features and the student image features. The method is thus suitable for knowledge transfer between models with different structures, and improves the image processing accuracy of the trained student image processing model.

Description

Training method and device of image processing model and electronic equipment
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to the technical fields of deep learning, natural language processing, computer vision and the like, and specifically to a training method and device for an image processing model and an electronic device.
Background
At present, model feature distillation methods are mainly based on feature distillation between isomorphic models, realizing distillation by minimizing the distance between the features of the two models. However, such schemes are mainly suitable for knowledge transfer between models having the same structure; they are difficult to apply to knowledge transfer between models having different structures, and their feature distillation efficiency is poor.
Disclosure of Invention
The disclosure provides a training method and device for an image processing model and electronic equipment.
According to an aspect of the present disclosure, there is provided a training method of an image processing model, the method including: acquiring a teacher image processing model, a student image processing model and a training data set of the student image processing model; the teacher image processing model is different from the student image processing model in structure; for each sample image in the training data set, respectively inputting the sample image into the teacher image processing model and the student image processing model to obtain teacher image characteristics output by the teacher image processing model and student image characteristics output by the student image processing model; performing feature conversion processing on the teacher image features according to the feature space of the student image features to obtain teacher conversion image features in the feature space; and training the student image processing model according to the teacher conversion image characteristics and the student image characteristics to obtain a trained student image processing model.
According to another aspect of the present disclosure, there is provided a training apparatus of an image processing model, the apparatus including: the first acquisition module is used for acquiring a teacher image processing model, a student image processing model and a training data set of the student image processing model; the teacher image processing model is different from the student image processing model in structure; the second acquisition module is used for inputting the sample images into the teacher image processing model and the student image processing model respectively for each sample image in the training data set, and acquiring the teacher image characteristics output by the teacher image processing model and the student image characteristics output by the student image processing model; the conversion processing module is used for carrying out characteristic conversion processing on the teacher image characteristics according to the characteristic space of the student image characteristics to obtain teacher conversion image characteristics in the characteristic space; and the training processing module is used for training the student image processing model according to the teacher conversion image characteristics and the student image characteristics to obtain a trained student image processing model.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the training method of the image processing model set forth above in the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the training method of the image processing model proposed above in the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the steps of the training method of the image processing model proposed above in the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure;
FIG. 2 is a schematic diagram according to a second embodiment of the present disclosure;
FIG. 3 is a schematic block diagram of a training framework for an image processing model;
FIG. 4 is a schematic diagram according to a third embodiment of the present disclosure;
fig. 5 is a block diagram of an electronic device for implementing a training method of an image processing model of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
At present, model feature distillation methods are mainly based on feature distillation between isomorphic models, realizing distillation by minimizing the distance between the features of the two models. However, such schemes are mainly suitable for knowledge transfer between models having the same structure; they are difficult to apply to knowledge transfer between models having different structures, and their feature distillation efficiency is poor.
To address the above problems, the present disclosure provides a training method and device for an image processing model and an electronic device.
Fig. 1 is a schematic diagram of a first embodiment of the present disclosure, and it should be noted that the training method of the image processing model according to the embodiment of the present disclosure may be applied to a training apparatus of the image processing model, where the apparatus may be disposed in an electronic device, so that the electronic device may perform a training function of the image processing model.
The electronic device may be any device with computing capability, for example a personal computer (PC), a mobile terminal, a server, etc. The mobile terminal may be, for example, a vehicle-mounted device, a mobile phone, a tablet computer, a personal digital assistant, a wearable device or a smart speaker, and may include hardware such as a touch screen and/or a display screen together with an operating system. In the following embodiments, an electronic device is taken as an example of the execution body.
As shown in fig. 1, the training method of the image processing model may include the following steps:
Step 101, acquiring a teacher image processing model, a student image processing model, and a training data set of the student image processing model; the teacher image processing model is different in structure from the student image processing model.
In the disclosed embodiment, the number of parameters of the teacher image processing model is much larger than that of the student image processing model. The teacher image processing model is trained on training data sets for a plurality of tasks; the training data set of the student image processing model is the training data set for a subset of those tasks.
The teacher image processing model may be trained as follows: acquire training data sets for a plurality of tasks, each training data set including a plurality of sample images and labels for the sample images, the labels being those of the corresponding task; for each training data set, input a sample image from the training data set into an initial teacher image processing model and obtain the image recognition result output by the model; determine the value of the loss function according to the image recognition result, the label of the sample image, and the loss function; and perform parameter adjustment on the teacher image processing model according to the value of the loss function to obtain the trained teacher image processing model.
The tasks, such as an image classification task, an instance segmentation task, a target detection task, and the like, may be set according to actual needs. Wherein the labels under different tasks are different.
Because the teacher image processing model is trained on training data sets for a plurality of tasks while the training data set of the student image processing model covers only a subset of those tasks, the teacher image processing model can learn knowledge from the plurality of tasks and pass it on to the student image processing model, improving the accuracy of the trained student image processing model.
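The teacher training loop described above can be sketched in code. Everything concrete here is an illustrative assumption rather than the patent's architecture: the "teacher" is a tiny shared linear body plus one linear head per task, the loss is mean squared error, and the sample images are random vectors.

```python
import numpy as np

# Hedged sketch of multi-task teacher training: one training data set per
# task, each with its own labels; the shared body and the task head are
# adjusted according to the value of the loss function.
rng = np.random.default_rng(0)
n, d, n_tasks, lr, steps = 16, 8, 3, 0.05, 300

W = rng.normal(size=(d, d)) * 0.1                       # shared teacher body
heads = [rng.normal(size=(d, 1)) * 0.1 for _ in range(n_tasks)]

# One synthetic training data set per task: sample "images" plus labels
# produced by a task-specific labelling rule.
datasets = []
for _ in range(n_tasks):
    x = rng.normal(size=(n, d))
    y = x @ rng.normal(size=(d, 1))
    datasets.append((x, y))

def task_loss(t):
    x, y = datasets[t]
    return float(np.mean((x @ W @ heads[t] - y) ** 2))

initial_losses = [task_loss(t) for t in range(n_tasks)]
for t, (x, y) in enumerate(datasets):
    for _ in range(steps):
        feats = x @ W                                   # teacher image features
        err = feats @ heads[t] - y                      # recognition error
        heads[t] -= lr * feats.T @ err / n              # adjust task head
        W -= lr * x.T @ (err @ heads[t].T) / n          # adjust shared body
final_losses = [task_loss(t) for t in range(n_tasks)]
print(initial_losses, final_losses)
```

Training the tasks sequentially, as here, is only one possible schedule; interleaving the tasks' batches would be equally consistent with the description above.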
Step 102, for each sample image in the training data set, respectively inputting the sample image into a teacher image processing model and a student image processing model, and obtaining the teacher image characteristics output by the teacher image processing model and the student image characteristics output by the student image processing model.
In the embodiment of the present disclosure, the electronic device may perform step 102 by, for example, inputting each sample image in the training data set into the teacher image processing model and obtaining the teacher image features output by the last layer of the feature extraction network in the teacher image processing model, and inputting the sample image into the student image processing model and obtaining the student image features output by the last layer of the feature extraction network in the student image processing model.
The teacher image processing model may be a Transformer model, and the student image processing model may be a convolutional neural network (CNN) model. The Transformer model may include an encoding network and a decoding network, with the encoding network serving as the feature extraction network. The CNN model may include a feature extraction network and a feature prediction network.
The Transformer model has a large number of parameters and correspondingly rich learned knowledge; performing feature distillation from the Transformer model to the CNN model enables the CNN model to learn more of that knowledge, improving the accuracy of the trained CNN model.
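The structural mismatch between the two feature extractors can be made concrete with shapes. The sizes below are assumptions chosen for illustration (a 224x224 image, 16x16 patches and 768-dimensional tokens for the teacher, a stride-32 map with 512 channels for the student), not values taken from the patent:

```python
import numpy as np

# A ViT-style teacher outputs a sequence of patch tokens, while a CNN
# student outputs a spatial feature map, so the two last-layer feature
# spaces differ in both layout and dimensionality.
batch, img = 2, 224
t_patch, t_dim = 16, 768        # teacher: 16x16 patches, 768-d tokens
s_stride, s_dim = 32, 512       # student: stride-32 map, 512 channels

n_tokens = (img // t_patch) ** 2                      # 14 * 14 = 196 tokens
teacher_feat = np.zeros((batch, n_tokens, t_dim))     # shape (2, 196, 768)
student_feat = np.zeros((batch, s_dim, img // s_stride, img // s_stride))
print(teacher_feat.shape, student_feat.shape)         # (2, 196, 768) (2, 512, 7, 7)
```

This mismatch is exactly why a direct feature-distance loss between the two models is awkward, and why the feature conversion processing of step 103 is needed.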
And step 103, performing feature conversion processing on the teacher image features according to the feature space of the student image features to obtain teacher conversion image features in that feature space.
In the embodiment of the disclosure, the electronic device may use a feature adapter to perform the feature conversion processing on the teacher image features. A feature adapter is a component for converting raw features into feature representations suited to a particular model or task; in machine learning and deep learning, feature adapters are typically used to normalize, encode, or otherwise transform original features to fit the needs of different models.
The convolution kernel used by the feature adapter for the feature conversion processing is determined according to the conversion relation between the feature space of the teacher image features and the feature space of the student image features. The feature space may characterize, for example, the dimensionality of the image features.
By determining the convolution kernel in the feature adapter according to the conversion relation between the two feature spaces and then applying it in the feature conversion processing, teacher conversion image features lying in the student feature space can be obtained. This guarantees consistency of the feature space between the teacher conversion image features and the student image features, which in turn makes the training processing of the student image processing model possible.
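A minimal sketch of such an adapter, under the assumption that the conversion is a learned 1x1 convolution mapping the teacher's channel count to the student's (the channel sizes and kernel initialisation below are hypothetical):

```python
import numpy as np

def adapt_teacher_features(teacher_feat, kernel):
    """Hypothetical feature adapter: a 1x1 convolution that maps teacher
    channels to the student's channel count. kernel has shape
    (c_student, c_teacher); teacher_feat has shape (c_teacher, h, w)."""
    c_t, h, w = teacher_feat.shape
    flat = teacher_feat.reshape(c_t, h * w)   # 1x1 conv == per-pixel matmul
    return (kernel @ flat).reshape(kernel.shape[0], h, w)

rng = np.random.default_rng(0)
teacher_feat = rng.normal(size=(768, 7, 7))   # teacher feature, own space
kernel = rng.normal(size=(512, 768)) * 0.01   # learned conversion kernel
converted = adapt_teacher_features(teacher_feat, kernel)
print(converted.shape)                        # now in the student feature space
```

In a real system the kernel would be trained jointly with the distillation loss rather than fixed; here it only demonstrates the shape conversion.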
And step 104, training the student image processing model according to the teacher conversion image features and the student image features to obtain a trained student image processing model.
In the embodiment of the disclosure, the number of teacher conversion image features is one, and the number of student image features is one. Correspondingly, the electronic device may perform step 104 by, for example, determining the feature similarity between the teacher conversion image feature and the student image feature, and performing parameter adjustment on the student image processing model according to the feature similarity, thereby realizing the training processing.
The electronic device may construct a loss function according to the feature similarity and perform parameter adjustment on the student image processing model according to the value of the loss function. The loss function may be, for example, the inverse of the feature similarity.
Performing parameter adjustment on the student image processing model according to the feature similarity between the teacher conversion image feature and the student image feature lets the student image processing model learn the knowledge in the teacher image processing model, improving the accuracy of the trained student image processing model.
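One concrete reading of the loss construction above: take cosine similarity between the flattened features as the feature similarity, and use its negative as the loss (treating "the inverse of the feature similarity" as negation is an assumption; a reciprocal or an L2 distance would be alternative readings).

```python
import numpy as np

def distill_loss(teacher_conv_feat, student_feat):
    """Sketch of a similarity-based distillation loss: the negative cosine
    similarity between the flattened teacher conversion image feature and
    the student image feature. Minimising it pulls the features together."""
    t = teacher_conv_feat.ravel()
    s = student_feat.ravel()
    cos = float(t @ s / (np.linalg.norm(t) * np.linalg.norm(s)))
    return -cos

rng = np.random.default_rng(1)
t = rng.normal(size=(512, 7, 7))
print(distill_loss(t, t))     # identical features: loss close to -1
print(distill_loss(t, -t))    # opposite features: loss close to +1
```

The gradient of this loss with respect to the student features would then drive the parameter adjustment of the student image processing model.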
In an embodiment of the present disclosure, the training data set of the student image processing model may further include a label corresponding to each sample image. Correspondingly, the electronic device may also fine-tune the trained student image processing model according to the sample images in the training data set and their corresponding labels.
The student image processing model may include a feature extraction network and a feature prediction network, where the feature extraction network has already learned knowledge from the teacher image processing model and the feature prediction network may still be in its initial state. Correspondingly, the electronic device may input a sample image from the training data set into the student image processing model to obtain the predicted label output by the feature prediction network; determine the value of the loss function of the student image processing model according to the predicted label, the label corresponding to the sample image, and the loss function; and, according to that value, fine-tune the feature extraction network and adjust the feature prediction network in the student image processing model.
Fine-tuning the trained student image processing model according to the sample images in the training data set and their corresponding labels improves the image processing accuracy when the model is used for the task corresponding to that training data set.
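The split between gently adjusting the distilled feature extraction network and fully training the fresh feature prediction network can be sketched with two learning rates. All sizes, the logistic head, and the labelling rule below are assumptions for illustration only:

```python
import numpy as np

# Toy fine-tuning sketch: a "distilled" linear feature extractor W is
# updated with a small learning rate, while a freshly initialised
# prediction head H is trained from scratch with a larger one.
rng = np.random.default_rng(0)
n, d, f = 32, 10, 6
x = rng.normal(size=(n, d))                 # sample images (flattened)
y = (x[:, 0] > 0).astype(float)             # labels from the training data set

W = rng.normal(size=(d, f)) * 0.3           # distilled feature extraction network
H = np.zeros(f)                             # initial feature prediction network

lr_head, lr_body = 0.5, 0.01                # head trained fully, body fine-tuned
for _ in range(300):
    feats = x @ W
    p = 1.0 / (1.0 + np.exp(-(feats @ H)))  # predicted label probability
    g = (p - y) / n                         # cross-entropy gradient
    H -= lr_head * (feats.T @ g)
    W -= lr_body * (x.T @ np.outer(g, H))   # small update: fine-tune only

p = 1.0 / (1.0 + np.exp(-((x @ W) @ H)))
final_loss = float(np.mean(-y * np.log(p + 1e-9) - (1 - y) * np.log(1 - p + 1e-9)))
print(final_loss)
```

The unequal learning rates are the point of the sketch: they preserve the knowledge distilled into the feature extractor while adapting the model to the downstream task.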
According to the training method of the image processing model of the present disclosure, a teacher image processing model, a student image processing model, and a training data set of the student image processing model are acquired, the teacher image processing model being different in structure from the student image processing model; for each sample image in the training data set, the sample image is respectively input into the teacher image processing model and the student image processing model to obtain the teacher image features output by the teacher image processing model and the student image features output by the student image processing model; feature conversion processing is performed on the teacher image features according to the feature space of the student image features to obtain teacher conversion image features in that feature space; and the student image processing model is trained according to the teacher conversion image features and the student image features to obtain a trained student image processing model. The method is thereby suitable for knowledge transfer between models with different structures: the student image processing model can learn the knowledge of a teacher image processing model with a different structure, which improves feature distillation efficiency and the image processing accuracy of the student image processing model.
In order to enable the student image processing model to learn more of the knowledge in the teacher image processing model, the electronic device may combine a plurality of student image features with one teacher image feature to obtain a plurality of teacher conversion image features, and then train the student image processing model, further improving its accuracy. As shown in fig. 2, which is a schematic diagram of a second embodiment of the present disclosure, the embodiment may include the following steps:
Step 201, acquiring a teacher image processing model, a student image processing model, and a training data set of the student image processing model; the teacher image processing model is different in structure from the student image processing model.
Step 202, for each sample image in the training data set, respectively inputting the sample image into the teacher image processing model and the student image processing model, and obtaining the teacher image features output by the teacher image processing model and the student image features output by the student image processing model; the number of teacher image features is one, the number of student image features is plural, and the feature spaces of the plurality of student image features are different.
Step 203, acquiring the first student image feature having the smallest feature space among the plurality of student image features, and the second student image features other than the first student image feature.
In the embodiment of the disclosure, the feature space of a student image feature is represented by its spatial dimensions, i.e., the height and width of the student image feature, both expressed as numbers of feature points: the height is the number of feature points in the height direction, and the width is the number of feature points in the width direction.
Since the sample image likewise has a certain size, it is also represented by a height and a width: the height of the sample image is its number of pixels in the height direction, and the width is its number of pixels in the width direction. The feature space of a student image feature may therefore be expressed relative to the sample image; for example, the feature space may be 1/4, 1/8, 1/16, or 1/32 of the sample image.
Taking feature spaces of 1/4, 1/8, 1/16 and 1/32 of the sample image as an example, the first student image feature may be the student image feature whose feature space is 1/32 of the sample image, and the second student image features may be those whose feature spaces are 1/4, 1/8 and 1/16 of the sample image.
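The scale bookkeeping above is simple arithmetic. For a hypothetical 224x224 sample image (the image size is an assumption), a feature space of 1/k means a feature map of height and width 224/k:

```python
# Spatial sizes of the student feature maps at each feature-space scale;
# the smallest feature space identifies the first student image feature.
img_h = img_w = 224
scales = [4, 8, 16, 32]
sizes = {f"1/{k}": (img_h // k, img_w // k) for k in scales}
print(sizes)     # {'1/4': (56, 56), '1/8': (28, 28), '1/16': (14, 14), '1/32': (7, 7)}

smallest = min(sizes.values())
print(smallest)  # (7, 7) -> the 1/32 map is the first student image feature
```

The remaining maps (1/4, 1/8, 1/16) are the second student image features, which is why the teacher conversion image feature must later be up-sampled to match them.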
And step 204, performing feature conversion processing on the teacher image feature according to the feature space of the first student image feature to obtain one teacher conversion image feature.
And step 205, performing up-sampling processing on the teacher conversion image feature according to the feature spaces of the second student image features to obtain processed image features.
In the embodiment of the disclosure, the feature space of the teacher conversion image feature is the same as that of the first student image feature, e.g., 1/32 of the sample image. In one example, up-sampling the teacher conversion image feature may mean performing feature-point interpolation processing on it, where the value of an interpolated feature point may be determined from the values of the feature points around the position to be interpolated.
When the feature space of the teacher conversion image feature is 1/32 of the sample image, interpolation processing can produce processed image features whose feature spaces are 1/16, 1/8 and 1/4 of the sample image. These processed image features are then also taken as teacher conversion image features.
In another example, up-sampling the teacher conversion image feature may mean performing deconvolution processing on it.
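The interpolation variant can be sketched with nearest-neighbour up-sampling, which is one simple instance of the feature-point interpolation described above (bilinear interpolation or deconvolution are equally valid choices; the channel count of 512 is an assumption):

```python
import numpy as np

def upsample_nearest(feat, factor):
    """Nearest-neighbour up-sampling of a (c, h, w) feature map: each
    feature point is repeated factor times along height and width."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)

rng = np.random.default_rng(0)
teacher_conv = rng.normal(size=(512, 7, 7))   # 1/32-scale teacher conversion feature
up_16 = upsample_nearest(teacher_conv, 2)     # -> 1/16 scale (14x14)
up_8 = upsample_nearest(teacher_conv, 4)      # -> 1/8  scale (28x28)
up_4 = upsample_nearest(teacher_conv, 8)      # -> 1/4  scale (56x56)
print(up_16.shape, up_8.shape, up_4.shape)
```

Each up-sampled map then serves as the teacher conversion image feature matched to the second student image feature of the same scale.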
And step 206, taking the processed image features as teacher conversion image features.
Step 207, training the student image processing model according to the teacher conversion image features and the student image features to obtain a trained student image processing model.
In the embodiment of the disclosure, the number of teacher conversion image features is plural, and the number of student image features is plural. Correspondingly, the electronic device may perform step 207 by, for example, acquiring, for each student image feature, the teacher conversion image feature corresponding to it from among the plurality of teacher conversion image features, the feature space of the student image feature being the same as that of its corresponding teacher conversion image feature; determining the feature similarity between each student image feature and its corresponding teacher conversion image feature; and performing parameter adjustment on the student image processing model according to the plurality of feature similarities, thereby realizing the training processing.
Taking feature spaces of 1/4, 1/8, 1/16 and 1/32 of the sample image as an example: the teacher conversion image feature corresponding to the student image feature whose feature space is 1/4 of the sample image also has a feature space of 1/4 of the sample image, and likewise for the 1/8, 1/16 and 1/32 student image features.
By determining the feature similarity between each student image feature and its corresponding teacher conversion image feature, constructing a loss function from the plurality of feature similarities, and performing parameter adjustment on the student image processing model accordingly, the student image processing model can learn more of the knowledge in the teacher image processing model, further improving the accuracy of the trained model.
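The multi-scale loss construction above can be sketched by pairing each student feature with the teacher conversion feature of the same scale and summing a per-pair loss. The negative-cosine per-pair loss and the channel count of 512 are assumptions carried over from the single-scale sketch:

```python
import numpy as np

def cosine_loss(t, s):
    """Negative cosine similarity between flattened feature maps."""
    t, s = t.ravel(), s.ravel()
    return -float(t @ s / (np.linalg.norm(t) * np.linalg.norm(s)))

rng = np.random.default_rng(2)
# One matched (teacher conversion feature, student feature) pair per scale.
pairs = {}
for k in (4, 8, 16, 32):
    hw = 224 // k
    pairs[k] = (rng.normal(size=(512, hw, hw)), rng.normal(size=(512, hw, hw)))

# Total distillation loss: sum over scales of the per-pair feature loss,
# so the student is pulled toward the teacher at every feature-map scale.
total = sum(cosine_loss(t, s) for t, s in pairs.values())
print(total)
```

Weighting the scales unequally would be a natural variant; the sketch uses a plain sum for simplicity.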
It should be noted that, the detailed descriptions of step 201 to step 202 may refer to the detailed descriptions of step 101 to step 102 in the embodiment of fig. 1, and will not be described in detail here.
According to the training method of the image processing model of the present disclosure, a teacher image processing model, a student image processing model, and a training data set of the student image processing model are acquired, the teacher image processing model being different in structure from the student image processing model; for each sample image in the training data set, the sample image is respectively input into the teacher image processing model and the student image processing model to obtain the teacher image features output by the teacher image processing model and the student image features output by the student image processing model, where the number of teacher image features is one, the number of student image features is plural, and the feature spaces of the plurality of student image features are different; the first student image feature with the smallest feature space and the second student image features other than the first student image feature are acquired; feature conversion processing is performed on the teacher image feature according to the feature space of the first student image feature to obtain one teacher conversion image feature; up-sampling processing is performed on that feature according to the feature spaces of the second student image features to obtain processed image features, which are also taken as teacher conversion image features; and the student image processing model is trained according to the teacher conversion image features and the student image features to obtain a trained student image processing model. The method is thereby suitable for knowledge transfer between models with different structures: the student image processing model can learn the knowledge of a teacher image processing model with a different structure, which improves feature distillation efficiency and the image processing accuracy of the student image processing model.
The following example is illustrative. FIG. 3 is a block diagram of a training framework for an image processing model. In fig. 3, a sample Image is input into a Transformer model (ViT, i.e., the teacher image processing model) to obtain the output teacher image features; the teacher image features are passed to a feature Adapter for feature conversion processing and up-sampling processing, yielding teacher conversion image features whose feature spaces are 1/4, 1/8, 1/16 and 1/32 of the sample image. The sample Image is also input into a CNN model (i.e., the student image processing model) to obtain student image features whose feature spaces are 1/4, 1/8, 1/16 and 1/32 of the sample image. The value of the loss function (Distill Loss) is then determined according to the plurality of teacher conversion image features and the plurality of student image features, and parameter adjustment is performed on the CNN model to realize the training processing.
In order to achieve the above embodiments, the present disclosure further provides a training apparatus for an image processing model. As shown in fig. 4, fig. 4 is a schematic diagram according to a third embodiment of the present disclosure. The training device 40 for an image processing model may include: a first acquisition module 401, a second acquisition module 402, a conversion processing module 403, and a training processing module 404.
The first acquiring module 401 is configured to acquire a teacher image processing model, a student image processing model, and a training data set of the student image processing model; the teacher image processing model is different from the student image processing model in structure; a second obtaining module 402, configured to input, for each sample image in the training data set, the sample image into the teacher image processing model and the student image processing model, respectively, to obtain a teacher image feature output by the teacher image processing model and a student image feature output by the student image processing model; the conversion processing module 403 is configured to perform feature conversion processing on the teacher image feature according to the feature space of the student image feature, so as to obtain a teacher converted image feature in the feature space; and the training processing module 404 is configured to perform training processing on the student image processing model according to the teacher converted image feature and the student image feature, so as to obtain a trained student image processing model.
As one possible implementation manner of the embodiment of the present disclosure, the number of the teacher image features is one; the number of the student image features is a plurality; the feature spaces of the plurality of student image features are different; and the number of the teacher converted image features is a plurality. The conversion processing module 403 is specifically configured to: acquire a first student image feature having a minimum feature space among the plurality of student image features, and the second student image features other than the first student image feature; perform feature conversion processing on the teacher image feature according to the feature space of the first student image feature to obtain a teacher converted image feature; perform up-sampling processing on the teacher converted image feature according to the feature spaces of the second student image features to obtain processed image features; and take the processed image features as teacher converted image features.
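The selection step performed by the conversion processing module can be illustrated as follows. This sketch assumes "minimum feature space" means the fewest spatial positions, and the (height, width, name) tuples are hypothetical stand-ins for real feature maps.

```python
def split_student_features(student_feats):
    """Return the student feature with the minimum feature space (fewest
    spatial positions) plus the remaining (second) student features."""
    ranked = sorted(student_feats, key=lambda f: f[0] * f[1])
    return ranked[0], ranked[1:]

# hypothetical student features at 1/4, 1/32, 1/8 and 1/16 of a 224x224 image
feats = [(56, 56, "s4"), (7, 7, "s32"), (28, 28, "s8"), (14, 14, "s16")]
first, second = split_student_features(feats)
print(first)  # (7, 7, 's32') - the coarsest map has the smallest feature space
```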
As one possible implementation manner of the embodiment of the present disclosure, the number of the teacher converted image features is one, and the number of the student image features is one. The training processing module 404 is specifically configured to determine a feature similarity between the teacher converted image feature and the student image feature, and perform parameter adjustment processing on the student image processing model according to the feature similarity to realize training.
As one possible implementation manner of the embodiment of the present disclosure, the number of the teacher converted image features is a plurality, and the number of the student image features is a plurality. The training processing module 404 is specifically configured to: for each student image feature, acquire the teacher converted image feature corresponding to the student image feature from among the plurality of teacher converted image features, where the feature space of the student image feature is the same as the feature space of its corresponding teacher converted image feature; determine a feature similarity between the student image feature and its corresponding teacher converted image feature; and perform parameter adjustment processing on the student image processing model according to the feature similarities so as to realize training.
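The per-scale matching and similarity computation may be sketched as below. The disclosure does not fix a particular similarity measure, so cosine similarity is assumed here purely for illustration; features are represented as flattened lists keyed by scale, with matching keys standing in for matching feature spaces.

```python
import math

def cosine_distill_loss(student_feats, teacher_feats):
    """Sum of (1 - cosine similarity) over the scales whose feature spaces
    match; a hypothetical stand-in for the Distill Loss in fig. 3."""
    total = 0.0
    for scale, s in student_feats.items():
        t = teacher_feats[scale]  # teacher feature with the same feature space
        dot = sum(a * b for a, b in zip(s, t))
        ns = math.sqrt(sum(a * a for a in s))
        nt = math.sqrt(sum(b * b for b in t))
        total += 1.0 - dot / (ns * nt + 1e-12)
    return total

# toy flattened features at two scales; identical features give loss ~0
student = {4: [1.0, 2.0, 3.0], 8: [0.5, 0.5]}
teacher = {4: [1.0, 2.0, 3.0], 8: [0.5, 0.5]}
print(round(cosine_distill_loss(student, teacher), 6))  # 0.0
```

In training, the gradient of this loss with respect to the student features would drive the parameter adjustment of the student model; the teacher model stays frozen.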
As one possible implementation of the embodiments of the present disclosure, the training dataset further includes: a label corresponding to the sample image; the apparatus further comprises: and the fine adjustment processing module is used for carrying out fine adjustment processing on the trained student image processing model according to the sample image in the training data set and the label corresponding to the sample image.
As one possible implementation manner of the embodiment of the present disclosure, a feature adapter performs the feature conversion processing on the teacher image feature, and the convolution kernel used for the feature conversion processing in the feature adapter is determined according to the conversion relation between the feature space of the teacher image feature and the feature space of the student image feature.
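How a convolution kernel might be chosen from the conversion relation between the two feature spaces can be illustrated schematically. The rule below (a strided convolution to shrink, a transposed convolution to enlarge, with kernel and stride equal to the scale factor) is an assumption for illustration only; the disclosure states just that the kernel is determined by the conversion relation.

```python
def adapter_conv_params(teacher_stride, student_stride):
    """Map the spatial relation between teacher and student feature spaces
    to a (layer type, kernel, stride) choice for the adapter.
    Hypothetical rule: equal strides -> 1x1 conv; teacher map larger ->
    strided conv; teacher map smaller -> transposed conv."""
    if teacher_stride == student_stride:
        return ("conv", 1, 1)
    if teacher_stride < student_stride:        # teacher map has more positions
        factor = student_stride // teacher_stride
        return ("strided_conv", factor, factor)
    factor = teacher_stride // student_stride  # teacher map has fewer positions
    return ("transposed_conv", factor, factor)

# a ViT teacher at stride 16 versus CNN student features at strides 32 and 4
print(adapter_conv_params(16, 32))  # ('strided_conv', 2, 2)
print(adapter_conv_params(16, 4))   # ('transposed_conv', 4, 4)
```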
As one possible implementation manner of the embodiment of the present disclosure, the teacher image processing model is determined according to training data sets under multiple tasks; the training data set of the student image processing model is a training data set under part of tasks in a plurality of tasks.
As one possible implementation manner of the embodiment of the present disclosure, the teacher image processing model is a Transformer model; the student image processing model is a convolutional neural network (CNN) model.
The training device of the image processing model acquires a teacher image processing model, a student image processing model, and a training data set of the student image processing model; the teacher image processing model and the student image processing model differ in structure; for each sample image in the training data set, the sample image is input into the teacher image processing model and the student image processing model respectively, to obtain the teacher image feature output by the teacher image processing model and the student image features output by the student image processing model; feature conversion processing is performed on the teacher image feature according to the feature space of the student image features to obtain teacher converted image features in that feature space; and the student image processing model is trained according to the teacher converted image features and the student image features to obtain a trained student image processing model. The device is thus suitable for knowledge transfer between models with different structures: the student image processing model can learn knowledge from a teacher image processing model of a different structure, which improves feature distillation efficiency and the image processing accuracy of the student image processing model.
In the technical scheme of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other processing of the user's personal information are all performed on the premise of obtaining the consent of the user, comply with the provisions of relevant laws and regulations, and do not violate public order and good morals.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 5 illustrates a schematic block diagram of an example electronic device 500 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 5, the electronic device 500 includes a computing unit 501 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 502 or a computer program loaded from a storage unit 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the electronic device 500 may also be stored. The computing unit 501, ROM 502, and RAM 503 are connected to each other by a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
A number of components in electronic device 500 are connected to I/O interface 505, including: an input unit 506 such as a keyboard, a mouse, etc.; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508 such as a magnetic disk, an optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the electronic device 500 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 501 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 501 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 501 performs the respective methods and processes described above, for example, the training method of an image processing model. For example, in some embodiments, the training method of the image processing model may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the training method of the image processing model described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the training method of the image processing model in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs, which may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (19)

1. A method of training an image processing model, the method comprising:
acquiring a teacher image processing model, a student image processing model and a training data set of the student image processing model; the teacher image processing model is different from the student image processing model in structure;
for each sample image in the training data set, respectively inputting the sample image into the teacher image processing model and the student image processing model to obtain teacher image characteristics output by the teacher image processing model and student image characteristics output by the student image processing model;
performing feature conversion processing on the teacher image features according to the feature space of the student image features to obtain teacher converted image features in the feature space;
and training the student image processing model according to the teacher converted image features and the student image features to obtain a trained student image processing model.
2. The method of claim 1, wherein the number of the teacher image features is one; the number of the student image features is a plurality; the feature spaces of the plurality of student image features are different; and the number of the teacher converted image features is a plurality;
the step of performing feature conversion processing on the teacher image feature according to the feature space of the student image feature to obtain the teacher converted image feature in the feature space comprises the following steps:
acquiring a first student image feature with a minimum feature space among the plurality of student image features, and second student image features other than the first student image feature;
performing feature conversion processing on the teacher image features according to the feature space of the first student image features to obtain teacher converted image features;
performing up-sampling processing on the teacher converted image features according to the feature space of the second student image features to obtain processed image features;
and taking the processed image features as teacher converted image features.
3. The method of claim 1, wherein the number of the teacher converted image features is one; the number of the student image features is one; and wherein the training processing performed on the student image processing model according to the teacher converted image features and the student image features to obtain a trained student image processing model comprises the following steps:
determining feature similarity between the teacher converted image features and the student image features;
and carrying out parameter adjustment processing on the student image processing model according to the feature similarity to realize training.
4. The method of claim 1 or 2, wherein the number of the teacher converted image features is a plurality; the number of the student image features is a plurality; and wherein the training processing performed on the student image processing model according to the teacher converted image features and the student image features to obtain a trained student image processing model comprises the following steps:
for each student image feature, acquiring a teacher converted image feature corresponding to the student image feature from among the plurality of teacher converted image features; the feature space of the student image feature being the same as the feature space of the teacher converted image feature corresponding to the student image feature;
determining a feature similarity between the student image feature and the teacher converted image feature corresponding to the student image feature;
and performing parameter adjustment processing on the student image processing model according to the feature similarity so as to realize training.
5. The method of claim 1, wherein the training dataset further comprises: a label corresponding to the sample image;
the method further comprises the steps of:
and carrying out fine adjustment processing on the trained student image processing model according to the sample image in the training data set and the label corresponding to the sample image.
6. The method according to claim 1 or 2, wherein the teacher image feature is subjected to feature conversion processing by a feature adapter;
and the convolution kernel used for carrying out feature conversion processing in the feature adapter is determined according to the conversion relation between the feature space of the teacher image feature and the feature space of the student image feature.
7. The method of claim 1, wherein the teacher image processing model is determined from training data sets under a plurality of tasks;
the training data set of the student image processing model is a training data set under part of tasks in a plurality of tasks.
8. The method of claim 1, wherein the teacher image processing model is a Transformer model; and the student image processing model is a convolutional neural network (CNN) model.
9. A training apparatus for an image processing model, the apparatus comprising:
the first acquisition module is used for acquiring a teacher image processing model, a student image processing model and a training data set of the student image processing model; the teacher image processing model is different from the student image processing model in structure;
the second acquisition module is used for inputting the sample images into the teacher image processing model and the student image processing model respectively for each sample image in the training data set, and acquiring the teacher image characteristics output by the teacher image processing model and the student image characteristics output by the student image processing model;
the conversion processing module is used for carrying out characteristic conversion processing on the teacher image characteristics according to the characteristic space of the student image characteristics to obtain teacher conversion image characteristics in the characteristic space;
And the training processing module is used for training the student image processing model according to the teacher conversion image characteristics and the student image characteristics to obtain a trained student image processing model.
10. The apparatus of claim 9, wherein the number of the teacher image features is one; the number of the student image features is a plurality; the feature spaces of the plurality of student image features are different; and the number of the teacher converted image features is a plurality; the conversion processing module being specifically configured to,
acquiring a first student image feature with a minimum feature space among the plurality of student image features, and second student image features other than the first student image feature;
performing feature conversion processing on the teacher image features according to the feature space of the first student image features to obtain teacher converted image features;
performing up-sampling processing on the teacher converted image features according to the feature space of the second student image features to obtain processed image features;
and taking the processed image features as teacher converted image features.
11. The apparatus of claim 9, wherein the number of the teacher converted image features is one; the number of the student image features is one; the training processing module being specifically configured to,
Determining feature similarity between the teacher converted image features and the student image features;
and carrying out parameter adjustment processing on the student image processing model according to the feature similarity to realize training.
12. The apparatus of claim 9 or 10, wherein the number of the teacher converted image features is a plurality; the number of the student image features is a plurality; the training processing module being specifically configured to,
for each student image feature, acquiring a teacher converted image feature corresponding to the student image feature from among the plurality of teacher converted image features; the feature space of the student image feature being the same as the feature space of the teacher converted image feature corresponding to the student image feature;
determining a feature similarity between the student image feature and the teacher converted image feature corresponding to the student image feature;
and performing parameter adjustment processing on the student image processing model according to the feature similarity so as to realize training.
13. The apparatus of claim 9, wherein the training dataset further comprises: a label corresponding to the sample image; the apparatus further comprises: and the fine adjustment processing module is used for carrying out fine adjustment processing on the trained student image processing model according to the sample image in the training data set and the label corresponding to the sample image.
14. The apparatus according to claim 9 or 10, wherein the teacher image feature is subjected to feature conversion processing by a feature adapter;
and the convolution kernel used for carrying out feature conversion processing in the feature adapter is determined according to the conversion relation between the feature space of the teacher image feature and the feature space of the student image feature.
15. The apparatus of claim 9, wherein the teacher image processing model is determined from training data sets under a plurality of tasks;
the training data set of the student image processing model is a training data set under part of tasks in a plurality of tasks.
16. The apparatus of claim 9, wherein the teacher image processing model is a Transformer model; and the student image processing model is a convolutional neural network (CNN) model.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the training method of the image processing model of any one of claims 1 to 8.
18. A non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the training method of the image processing model according to any one of claims 1 to 8.
19. A computer program product comprising a computer program which, when executed by a processor, implements a method of training an image processing model according to any one of claims 1 to 8.
CN202311759639.3A 2023-12-20 2023-12-20 Training method and device of image processing model and electronic equipment Pending CN117746125A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311759639.3A CN117746125A (en) 2023-12-20 2023-12-20 Training method and device of image processing model and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311759639.3A CN117746125A (en) 2023-12-20 2023-12-20 Training method and device of image processing model and electronic equipment

Publications (1)

Publication Number Publication Date
CN117746125A true CN117746125A (en) 2024-03-22

Family

ID=90258778

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311759639.3A Pending CN117746125A (en) 2023-12-20 2023-12-20 Training method and device of image processing model and electronic equipment

Country Status (1)

Country Link
CN (1) CN117746125A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118015431A (en) * 2024-04-03 2024-05-10 阿里巴巴(中国)有限公司 Image processing method, apparatus, storage medium, and program product


Similar Documents

Publication Publication Date Title
CN113657390A (en) Training method of text detection model, and text detection method, device and equipment
CN115063875B (en) Model training method, image processing method and device and electronic equipment
CN113674421B (en) 3D target detection method, model training method, related device and electronic equipment
CN115082920B (en) Deep learning model training method, image processing method and device
EP3955216A2 (en) Method and apparatus for recognizing image, electronic device and storage medium
CN113642583B (en) Deep learning model training method for text detection and text detection method
CN114494784A (en) Deep learning model training method, image processing method and object recognition method
CN113901907A (en) Image-text matching model training method, image-text matching method and device
CN114187459A (en) Training method and device of target detection model, electronic equipment and storage medium
CN114863437B (en) Text recognition method and device, electronic equipment and storage medium
CN117746125A (en) Training method and device of image processing model and electronic equipment
CN114511743B (en) Detection model training, target detection method, device, equipment, medium and product
CN113468857B (en) Training method and device for style conversion model, electronic equipment and storage medium
CN115311469A (en) Image labeling method, training method, image processing method and electronic equipment
CN114792355A (en) Virtual image generation method and device, electronic equipment and storage medium
CN114840734A (en) Training method of multi-modal representation model, cross-modal retrieval method and device
CN114973333B (en) Character interaction detection method, device, equipment and storage medium
CN114881227B (en) Model compression method, image processing device and electronic equipment
CN114724144B (en) Text recognition method, training device, training equipment and training medium for model
CN116092101A (en) Training method, image recognition method apparatus, device, and readable storage medium
CN113989569B (en) Image processing method, device, electronic equipment and storage medium
CN113610856B (en) Method and device for training image segmentation model and image segmentation
CN112784967B (en) Information processing method and device and electronic equipment
CN114707638A (en) Model training method, model training device, object recognition method, object recognition device, object recognition medium and product
CN114817476A (en) Language model training method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination