WO2024036847A1 - Image processing method and apparatus, and electronic device and storage medium - Google Patents

Image processing method and apparatus, and electronic device and storage medium Download PDF

Info

Publication number
WO2024036847A1
Authority
WO
WIPO (PCT)
Prior art keywords
loss function
model
weight
output
label
Prior art date
Application number
PCT/CN2022/139730
Other languages
French (fr)
Chinese (zh)
Inventor
郭若愚
杜宇宁
赖宝华
于佃海
马艳军
Original Assignee
北京百度网讯科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京百度网讯科技有限公司
Publication of WO2024036847A1 publication Critical patent/WO2024036847A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition

Definitions

  • the present disclosure relates to the field of computer technology, and specifically to image processing methods and apparatuses, electronic devices, storage media, computer program products, and computer programs.
  • in the related art, model training methods based on knowledge distillation suffer from low model training accuracy.
  • the present disclosure provides image processing methods and devices, electronic devices, storage media, computer program products, and computer programs.
  • an image processing method including:
  • the image to be processed is input into a target image model, and the target image model outputs the processing result of the image to be processed, wherein the target image model is trained through the following steps:
  • the weight includes a first weight and a second weight corresponding to the teacher model
  • determining the weight corresponding to the training sample based on the label of the training sample and n second outputs includes:
  • the first weight is determined based on the target quantity, wherein the first weight is positively related to the target quantity,
  • determining the weight corresponding to the training sample based on the label of the training sample and n second outputs includes:
  • a second weight corresponding to the i-th teacher model is determined, where the second weight is positively related to the target number.
  • an image processing device including:
  • an acquisition module configured to obtain the image to be processed
  • a processing module configured to input the image to be processed into a target image model, where the target image model outputs the processing result of the image to be processed, and the target image model is trained through the following modules: a first acquisition module, configured to input training samples into the student model and n teacher models respectively, obtain the first output of the student model, and obtain the second outputs of the n teacher models, where n is a positive integer;
  • a determination module configured to determine the weight corresponding to the training sample based on the label of the training sample and n second outputs
  • a second acquisition module configured to acquire the total loss function of the student model based on the first output and the weight
  • a training module configured to update the model parameters of the student model based on the total loss function to obtain the trained target model
  • the weight includes a first weight and a second weight corresponding to the teacher model
  • the second acquisition module is also used for:
  • the determination module is also used to:
  • the first weight is determined, wherein the first weight is positively related to the target quantity
  • the determination module is also used to:
  • a second weight corresponding to the i-th teacher model is determined, where the second weight is positively related to the target number.
  • an electronic device including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor, so that the at least one processor can execute the image processing method described in any embodiment of the foregoing aspect.
  • a non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause the computer to execute the image processing method described in any embodiment of the foregoing aspect.
  • a computer program product including a computer program that, when executed by a processor, implements the image processing method described in any embodiment of the foregoing aspect.
  • a computer program including computer program code.
  • when the computer program code is run on a computer, it causes the computer to execute the image processing method described in any embodiment of the foregoing aspect.
  • Figure 1 is a schematic flowchart of a model training method according to the first embodiment of the present disclosure
  • Figure 2 is a schematic flowchart of a model training method according to a second embodiment of the present disclosure
  • Figure 3 is a schematic flowchart of a model training method according to a third embodiment of the present disclosure.
  • Figure 4 is a schematic flowchart of an image processing method according to the first embodiment of the present disclosure
  • Figure 5 is a block diagram of a model training device according to the first embodiment of the present disclosure.
  • FIG. 6 is a block diagram of an image processing device according to the first embodiment of the present disclosure.
  • FIG. 7 is a block diagram of an electronic device used to implement the model training method or the image processing method according to the embodiment of the present disclosure.
  • AI (Artificial Intelligence)
  • AI technology has the advantages of high automation, high accuracy, and low cost, and has been widely used.
  • DL (Deep Learning)
  • ML (Machine Learning)
  • Image Processing refers to the technology of using computers to analyze images to achieve the desired results.
  • Image processing generally refers to digital image processing.
  • Digital image refers to a large two-dimensional array obtained by shooting with industrial cameras, video cameras, scanners and other equipment. The elements of the array are called pixels, and their values are called grayscale values.
  • Image processing technology generally includes three parts: image compression, enhancement and restoration, matching, description and recognition.
  • Computer Vision refers to the use of cameras and computers instead of human eyes to perform machine vision tasks such as target recognition, tracking and measurement, and to further perform graphics processing so that the processed images are more suitable for human eyes to observe or for transmission to instruments for detection.
  • Computer vision is a comprehensive discipline, including computer science and engineering, signal processing, physics, applied mathematics and statistics, neurophysiology and cognitive science, etc.
  • Figure 1 is a schematic flowchart of a model training method according to the first embodiment of the present disclosure.
  • the model training method according to the first embodiment of the present disclosure includes: S101-S104.
  • execution subject of the model training method in the embodiment of the present disclosure may be a hardware device with data information processing capabilities and/or the necessary software required to drive the hardware device to work.
  • execution subjects may include workstations, servers, computers, user terminals and other intelligent devices.
  • user terminals include but are not limited to mobile phones, computers, intelligent voice interaction devices, smart home appliances, vehicle-mounted terminals, etc.
  • one student model can correspond to n teacher models, and n is not particularly limited. In one embodiment, n can be 3 or 5.
  • training samples can be input into the student model and the n teacher models respectively; the student model outputs the first output, and the n teacher models output the second outputs. It can be understood that each teacher model outputs one second output, that is, n teacher models can output n second outputs.
  • Training samples have labels.
  • the student model and teacher model are speech models.
  • the speech models include but are not limited to speech recognition models, speech synthesis models, etc.
  • the training samples include sample speech
  • the labels of the training samples include the reference recognition text of the sample speech.
  • the training samples include sample text
  • the labels of the training samples include the reference synthesized speech of the sample text.
  • the student model and teacher model are image models
  • the image models include but are not limited to action recognition models, image classification models, face recognition models, text recognition models, etc.
  • the training samples include sample images
  • the labels of the training samples include reference recognition actions of the sample images.
  • the training samples include sample images
  • the labels of the training samples include the reference general category of the sample image and the reference subcategory of the pixels in the sample image.
  • the training samples include sample images
  • the labels of the training samples include reference face recognition results of the sample images.
  • the reference face recognition results include but are not limited to face position, face width, face height, face number, etc.
  • the training samples include sample images
  • the labels of the training samples include the reference recognition text of the sample images.
  • the student model and teacher model are language models, and language models include but are not limited to text classification models, text segmentation models, etc.
  • the training samples include sample texts
  • the labels of the training samples include reference classification results of the sample texts.
  • the reference classification results include but are not limited to emotion classification results, topic classification results, etc.
  • the training samples include sample texts
  • the labels of the training samples include reference segmentation results of the sample texts.
  • determining the weight corresponding to the training sample based on the label of the training sample and the n second outputs includes obtaining the similarity between the label and the n second outputs, and determining the weight based on the similarity, where the weight is positively related to the similarity. In one embodiment, the higher the similarity between the label and the n second outputs, the more likely the label is correct, and the greater the weight corresponding to the training sample; conversely, the lower the similarity between the label and the n second outputs, the more likely the label is wrong, and the smaller the weight corresponding to the training sample.
  • determining the weight based on the similarity includes identifying the target setting range in which the similarity falls, and determining the setting weight corresponding to the target setting range as the weight corresponding to the training sample. It can be understood that the similarity can be divided into multiple setting ranges in advance, and different setting ranges correspond to different setting weights.
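  • The range-based weighting above can be sketched in Python; the range boundaries and setting weights below are illustrative placeholders, since the disclosure only says they are preset:

```python
def weight_from_similarity(similarity,
                           ranges=((0.0, 0.3, 0.1),
                                   (0.3, 0.7, 0.5),
                                   (0.7, 1.01, 1.0))):
    # Each (low, high, weight) triple is one preset setting range; the
    # boundaries and setting weights are hypothetical examples.
    for low, high, weight in ranges:
        if low <= similarity < high:
            return weight
    return 0.0
```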
  • the categories of the total loss function are not particularly limited.
  • the total loss function includes but is not limited to CE (Cross Entropy), BCE (Binary Cross Entropy), etc.
  • the total loss function is positively related to the weight. In one embodiment, the larger the weight corresponding to the training sample, the larger the total loss function; conversely, the smaller the weight corresponding to the training sample, the smaller the total loss function.
  • when the similarity between the label and the n second outputs is low, the weight corresponding to the training sample is also low, and the resulting total loss function is smaller; this avoids the total loss function becoming large in the case where the label is wrong.
  • a training sample x can be obtained, and the training sample x is input into the student model and the n teacher models respectively to obtain the first output of the student model for the training sample x.
  • the second outputs of the n teacher models for the training sample x are obtained.
  • based on the label of the training sample x and the n second outputs, the weight corresponding to the training sample x is determined; the total loss function of the student model in the xth training is obtained based on the first output of the student model for the training sample x and the weight corresponding to the training sample x.
  • x is a positive integer.
  • a training sample set Ax can be obtained during the xth training of the student model.
  • the training sample set Ax includes training samples 1 to m, and each training sample s in the training sample set Ax is input into the student model and the n teacher models respectively
  • to obtain the first output of the student model for the training sample s and the second outputs of the n teacher models for the training sample s.
  • 1≤s≤m, s and m are positive integers.
  • the student model's loss function for the training sample s is obtained; based on the student model's loss functions for training samples 1 to m, the total loss function of the student model in the xth training is obtained.
  • obtaining the total loss function of the student model in the xth training includes determining the sum or the average value of the loss functions of the student model for training samples 1 to m as the total loss function of the student model in the xth training.
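  • The sum-or-average reduction over the per-sample losses can be sketched as follows (a minimal illustration, not the disclosure's exact implementation):

```python
def total_loss(per_sample_losses, reduction="mean"):
    # Combine the student model's losses for training samples 1..m into
    # the total loss of this training round: either their sum or average.
    total = sum(per_sample_losses)
    return total / len(per_sample_losses) if reduction == "mean" else total
```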
  • updating the model parameters of the student model based on the total loss function includes obtaining gradient information of the total loss function, and updating the model parameters of the student model based on the gradient information.
  • backpropagation can be performed based on gradient information to update model parameters.
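  • A minimal sketch of updating model parameters from gradient information; plain gradient descent is used here as one illustrative update rule, since the disclosure does not fix the optimizer:

```python
def sgd_update(params, grads, lr=0.01):
    # One gradient-descent step: move each parameter against its gradient,
    # as backpropagation of the total loss would prescribe.
    return [p - lr * g for p, g in zip(params, grads)]
```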
  • updating the model parameters of the student model based on the total loss function to obtain the trained target model includes: updating the model parameters of the student model based on the total loss function; if the model training end conditions are not currently met, returning to obtain the next training sample and continuing
  • to update the model parameters of the student model until the model training end conditions are met, and determining the student model obtained in the last training as the target model.
  • the end conditions of model training are not particularly limited.
  • the end conditions of model training include but are not limited to the model accuracy reaching a set accuracy threshold, the number of model iterations reaching a set number threshold, the total loss function reaching a minimum value, etc.
  • the target model is obtained by distillation learning of n teacher models.
  • the target model is small in size, high in accuracy, and requires less computing resources than the teacher model. For example, if a teacher model is deployed on the user terminal, the teacher model deployed on the user terminal can be replaced with the target model, which helps to save storage space and computing resources of the user terminal.
  • the training samples are input into the student model and the n teacher models respectively; the first output of the student model and the second outputs of the n teacher models are obtained; based on the label of the training sample and the n second outputs, the weight corresponding to the training sample is determined; the total loss function of the student model is obtained based on the first output and the weight; and the model parameters of the student model are updated based on the total loss function to obtain the trained target model.
  • the label of the training sample and the second outputs of the teacher models can be comprehensively considered to determine the weight corresponding to the training sample, and the total loss function of the student model can then be obtained based on the first output and the weight. This avoids
  • the problem of an inaccurate total loss function caused by label errors in the training samples, improves the accuracy of the total loss function of the student model, and thereby improves the accuracy of model training.
  • the weight corresponding to the training sample includes a first weight and a second weight corresponding to the teacher model. It should be noted that one training sample can correspond to a first weight, and each teacher model can correspond to a second weight, that is, n teacher models can correspond to n second weights.
  • Figure 2 is a schematic flowchart of a model training method according to the second embodiment of the present disclosure.
  • the model training method according to the second embodiment of the present disclosure includes: S201-S207.
  • For the relevant content of step S201, please refer to the above embodiments; details are not repeated here.
  • the data format of the label and the data format of the second output may be different.
  • the data format of the label may be text composed of natural language, and the data format of the second output may be text composed of non-natural language.
  • the data format of the label may be Chinese text, and the data format of the second output may be a vector.
  • the n second outputs may be converted into n second conversion outputs in the data format of the tag, that is, the data format of the n second conversion outputs is the same as the data format of the tag.
  • the labels of the training samples include “road”
  • the second outputs include (1,0,0), (0,1,0) and (0,0,1), which can be converted into “road”, “landscape” and “building” respectively.
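  • The vector-to-text conversion in this example can be sketched as below; the class-name list is taken from the example above, and argmax is assumed as the conversion rule:

```python
def convert_output(vector, class_names):
    # Map a one-hot (or probability) vector to the class name with the
    # highest score, matching the label's text data format.
    index = max(range(len(vector)), key=lambda i: vector[i])
    return class_names[index]
```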
  • For the relevant content of step S203, please refer to the relevant content of step S102; details are not repeated here.
  • the n second outputs can be data-converted according to the data format of the labels to obtain n second converted outputs, and the weight can be determined based on the label and the n second converted outputs. Since the data formats of the label and the n second converted
  • outputs are the same, the weight is easy to determine.
  • obtaining the first loss function of the student model includes obtaining the first template loss function, and substituting the first output, the label, and the first weight into the first template loss function to obtain the first loss function.
  • the first template loss function is not particularly limited. In one embodiment, the first template loss function can be set in advance.
  • obtaining the first loss function of the student model based on the first output, the label and the first weight includes obtaining the first initial loss function of the student model based on the first output and the label, based on the first initial loss function and the first weight to obtain the first loss function. Therefore, in this method, the first initial loss function can be obtained based on the first output and the label, and the first loss function can be obtained based on the first initial loss function and the first weight.
  • obtaining the first loss function includes obtaining the first product of the first initial loss function and the first weight, and determining the first product as the first loss function.
  • the formula of the first loss function is as follows: Lgt = F1(Os, GTx) × B
  • Lgt is the first loss function
  • Os is the first output
  • GTx is the label
  • F1(Os, GTx) is the first initial loss function
  • B is the first weight.
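  • The first loss function can be sketched as follows; F1 is taken here as cross-entropy against the one-hot label, which is an illustrative assumption since the disclosure does not fix F1:

```python
import math

def first_loss(student_probs, label_index, first_weight):
    # Lgt = F1(Os, GTx) * B, with F1 chosen as cross-entropy between the
    # student's probabilities and the one-hot label (hypothetical choice).
    f1 = -math.log(max(student_probs[label_index], 1e-12))
    return f1 * first_weight
```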
  • S205 Obtain the second loss function of the student model based on the first output, n second outputs and the second weight.
  • obtaining the second loss function of the student model includes obtaining the second template loss function, and substituting the first output, the n second outputs and the second weight into the second template loss function to obtain the second loss function.
  • the second template loss function is not particularly limited. In one embodiment, the second template loss function can be preset.
  • obtaining the second loss function of the student model based on the first output, the n second outputs and the second weight includes: obtaining the i-th third initial loss function of the student model based on the first output and the second output of the i-th teacher model,
  • where 1≤i≤n and i is a positive integer; obtaining the i-th third loss function of the student model based on the i-th third initial loss function and the second weight corresponding to the i-th teacher model; and obtaining the second loss function based on the n third loss functions of the student model.
  • a third initial loss function can be obtained based on the first output and the second output, and a third loss function can be obtained based on the third initial loss function and the second weight, and based on n third loss functions, Get the second loss function.
  • obtaining the i-th third loss function of the student model includes obtaining the second product of the i-th third initial loss function and the second weight corresponding to the i-th teacher model, and determining the second product as the i-th third loss function.
  • obtaining the second loss function based on n third loss functions of the student model includes obtaining an average value of the n third loss functions of the student model, and determining the average value as the second loss function. Therefore, in this method, the average value of n third loss functions can be determined as the second loss function.
  • the formula of the second loss function is as follows: Ldist = (1/n) × Σi F2(Os, Oti) × Ci
  • Ldist is the second loss function
  • Os is the first output
  • Oti is the second output of the i-th teacher model
  • 1≤i≤n, i is a positive integer
  • F2(Os, Oti) is the i-th third initial loss function
  • Ci is the second weight corresponding to the i-th teacher model
  • F2(Os, Oti) × Ci is the i-th third loss function.
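  • The second loss function can be sketched as below; F2 is taken here as mean squared error, an illustrative stand-in since the disclosure does not fix F2:

```python
def second_loss(student_output, teacher_outputs, second_weights):
    # Ldist = (1/n) * sum_i F2(Os, Ot_i) * C_i, with F2 chosen as mean
    # squared error between student and teacher outputs (hypothetical choice).
    def f2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    terms = [f2(student_output, t) * c
             for t, c in zip(teacher_outputs, second_weights)]
    return sum(terms) / len(terms)
```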
  • obtaining the total loss function based on the first loss function and the second loss function may include performing a weighted sum of the first loss function and the second loss function to obtain the total loss function.
  • the weights of the first loss function and the second loss function are not particularly limited. In one embodiment, the weights of the first loss function and the second loss function can be preset. Therefore, in this method, the first loss function and the second loss function can be weighted and summed to obtain the total loss function, which improves the flexibility of the total loss function of the student model.
  • the first loss function or the second loss function can also be directly determined as the total loss function.
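  • The weighted sum of the two loss functions can be sketched as follows; the weights alpha and beta are preset hyperparameters, and the values here are illustrative:

```python
def combined_loss(l_gt, l_dist, alpha=0.5, beta=0.5):
    # Total loss = alpha * (first / label loss) + beta * (second /
    # distillation loss); alpha and beta are hypothetical preset weights.
    return alpha * l_gt + beta * l_dist
```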
  • For the relevant content of step S207, please refer to the above embodiments; details are not repeated here.
  • the first loss function of the student model can be obtained based on the first output, the label and the first weight
  • the second loss function of the student model can be obtained based on the first output, the n second outputs and the second weight
  • and the total loss function can be obtained based on the first loss function and the second loss function. From this, the first loss function of the student model relative to the label and the second loss function of the student model relative to the teacher models can be comprehensively considered to obtain the total loss function of the student model, which improves the accuracy of the total loss function of the student model and thereby improves the accuracy of model training.
  • Figure 3 is a schematic flowchart of a model training method according to the third embodiment of the present disclosure.
  • the model training method according to the third embodiment of the present disclosure includes: S301-S311.
  • S301 Input the training samples into the student model and n teacher models respectively, obtain the first output of the student model, and obtain the second output of the n teacher models, where n is a positive integer.
  • For the relevant content of step S301, please refer to the above embodiments; details are not repeated here.
  • the target quantity is a natural number, and the value range of the target quantity is [0,n].
  • when the target number is 0, it indicates that the label is inconsistent with all of the n second outputs.
  • when the target number is greater than or equal to 1, it indicates that there is a second output consistent with the label among the n second outputs.
  • when the target number is n, it indicates that the label is consistent with all of the n second outputs.
  • S303 Determine the first weight based on the number of targets, where the first weight is positively related to the number of targets.
  • the value range of the first weight is [0,1].
  • the first weight is positively related to the target number; that is, the larger the target number of second outputs consistent with the label, the more likely the label is correct, and the larger the first weight. Conversely, the smaller the target number of second outputs consistent with the label, the less likely the label is correct, and the smaller the first weight.
  • the first weight B = k/n.
  • k is the number of targets
  • n is the number of teacher models (that is, the number of second outputs).
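  • Determining the target number k and the first weight B = k/n can be sketched as follows (assuming the teacher outputs have already been converted to the label's data format):

```python
def first_weight(label, converted_outputs):
    # k = number of teacher outputs (after format conversion) equal to the
    # label; B = k / n, so B lies in [0, 1] and grows with agreement.
    n = len(converted_outputs)
    k = sum(1 for out in converted_outputs if out == label)
    return k / n
```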
  • S304 Obtain the first loss function of the student model based on the first output, label and first weight.
  • For the relevant content of step S304, please refer to the above embodiments; details are not repeated here.
  • the value range of the second weight corresponding to the i-th teacher model is [0,1), that is, the second weight is any number greater than or equal to 0 and less than 1.
  • the second weight corresponding to the i-th teacher model is positively related to the target number; that is, the larger the target number of second outputs consistent with the label, the more likely the label is correct, and the larger the second weight.
  • conversely, the smaller the target number of second outputs consistent with the label, the less likely the label is correct, and the smaller the second weight.
  • the second weight corresponding to the i-th teacher model is Ci = k/n.
  • k is the number of targets
  • n is the number of teacher models (that is, the number of second outputs).
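  • Following the surrounding description (Ci = 1 when the i-th teacher's converted output matches the label, otherwise Ci = k/n), the second weight can be sketched as:

```python
def second_weight(label, converted_outputs, i):
    # C_i = 1 when the i-th teacher's converted output matches the label;
    # otherwise C_i = k / n, the fraction of teachers agreeing with the label.
    n = len(converted_outputs)
    k = sum(1 for out in converted_outputs if out == label)
    return 1.0 if converted_outputs[i] == label else k / n
```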
  • For the relevant content of steps S309-S311, please refer to the above embodiments; details are not repeated here.
  • the first weight can be determined based on the target number of second outputs consistent with the label; in the case where the label is consistent with the second output of the i-th teacher model, the second weight
  • corresponding to the i-th teacher model is determined to be 1; or, in the case where the label is inconsistent with the second output of the i-th teacher model, the second weight corresponding to the i-th teacher model is determined based on the target number of second outputs consistent with the label.
  • the present disclosure also provides an image processing method.
  • FIG. 4 is a schematic flowchart of an image processing method according to the first embodiment of the present disclosure.
  • the image processing method according to the first embodiment of the present disclosure includes: S401-S402.
  • execution subject of the image processing method in the embodiment of the present disclosure may be a hardware device with data information processing capabilities and/or the necessary software required to drive the hardware device to work.
  • execution subjects may include workstations, servers, computers, user terminals and other intelligent devices.
  • user terminals include but are not limited to mobile phones, computers, intelligent voice interaction devices, smart home appliances, vehicle-mounted terminals, etc.
  • the images to be processed include but are not limited to two-dimensional images, three-dimensional images, etc.
  • the user terminal can obtain the image to be processed from its own storage space, and/or by shooting with a camera, and/or from web pages and applications (APPs).
  • S402 Input the image to be processed into the target image model, and the target image model outputs the processing result of the image to be processed, where the target image model is obtained using a model training method.
  • the target image model can be obtained by using the model training method described in Figures 1 to 3, which will not be described again here.
  • the processing results include but are not limited to action recognition results, image classification results, face recognition results, text recognition results, etc.
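The deployment flow of steps S401-S402 can be sketched as below. The model here is a stand-in callable, since the disclosure does not fix a specific network architecture; `toy_model` and its mean-intensity rule are purely illustrative assumptions.

```python
def process_image(image, target_image_model):
    """S401-S402: obtain an image to be processed and run the target image model on it."""
    return target_image_model(image)

# Hypothetical stand-in for a distilled target image model: it "classifies"
# an image (nested list of pixel intensities in [0, 1]) by mean brightness.
def toy_model(image):
    flat = [p for row in image for p in row]
    return "bright" if sum(flat) / len(flat) > 0.5 else "dark"

result = process_image([[0.9, 0.8], [0.7, 1.0]], toy_model)  # -> "bright"
```

In practice the stand-in would be replaced by the distilled target image model obtained with the model training method described above.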
  • the target image model is obtained by distillation learning of n teacher models.
  • the target image model is small in size, high in accuracy, and requires fewer computing resources than the teacher models. For example, if a teacher model is deployed on the user terminal, it can be replaced with the target image model, which helps to save the storage space and computing resources of the user terminal.
  • the image to be processed is input into the target image model, and the target image model outputs the processing result of the image to be processed.
  • the target image model is obtained using a model training method; it is small in size, high in accuracy, and requires few computing resources, which helps improve image processing performance.
  • model training method of the embodiment of the present disclosure can also be applied to speech models, language models, etc.
  • the speech to be processed can be obtained, the speech to be processed is input into a target speech model, and the target speech model outputs the speech processing result of the speech to be processed.
  • the target speech model is obtained using a model training method, which helps to improve speech processing performance.
  • the text to be processed can be obtained, the text to be processed is input into the target language model, and the target language model outputs the text processing result of the text to be processed.
  • the target language model is obtained using a model training method, which helps to improve text processing performance.
  • the collection, storage, use, processing, transmission, provision and disclosure of user personal information are in compliance with relevant laws and regulations and do not violate public order and good customs.
  • the present disclosure also provides a model training device for implementing the above model training method.
  • FIG. 5 is a block diagram of a model training device according to the first embodiment of the present disclosure.
  • the model training device 500 of the embodiment of the present disclosure includes: a first acquisition module 501, a determination module 502, a second acquisition module 503 and a training module 504.
  • the first acquisition module 501 is used to input training samples into the student model and n teacher models respectively, obtain the first output of the student model, and obtain the second outputs of the n teacher models, where n is a positive integer.
  • the determining module 502 is configured to determine the weight corresponding to the training sample based on the label of the training sample and the n second outputs.
  • the second obtaining module 503 is used to obtain the total loss function of the student model based on the first output and the weight.
  • the training module 504 is used to update the model parameters of the student model based on the total loss function to obtain a trained target model.
  • the weight includes a first weight and a second weight corresponding to the teacher model; the second acquisition module 503 is further configured to: obtain the first loss function of the student model based on the first output, the label and the first weight; obtain the second loss function of the student model based on the first output, the n second outputs and the second weight; and obtain the total loss function based on the first loss function and the second loss function.
  • the second acquisition module 503 is further configured to: acquire a first initial loss function of the student model based on the first output and the label, and obtain the first loss function based on the first initial loss function and the first weight.
  • the second acquisition module 503 is further configured to: acquire the i-th third initial loss function of the student model based on the first output and the second output of the i-th teacher model, where 1 ≤ i ≤ n and i is a positive integer; obtain the i-th third loss function of the student model based on the i-th third initial loss function and the second weight corresponding to the i-th teacher model; and obtain the second loss function based on the n third loss functions of the student model.
  • the second acquisition module 503 is also used to: acquire the average value of the n third loss functions of the student model, and determine the average value as the second loss function.
  • the determination module 502 is further configured to: compare the label with the n second outputs to obtain a target number of second outputs consistent with the label; and determine the first weight based on the target number, wherein the first weight is positively related to the target number.
  • the determination module 502 is further configured to: compare the label with the second output of the i-th teacher model; in response to the label being consistent with the second output of the i-th teacher model, determine the second weight corresponding to the i-th teacher model to be 1; or, in response to the label being inconsistent with the second output of the i-th teacher model, obtain the target number of second outputs consistent with the label and determine the second weight corresponding to the i-th teacher model based on the target number, wherein the second weight is positively related to the target number.
  • the determination module 502 is further configured to: convert the n second outputs into n second converted outputs in the data format of the label according to the data format of the label, and determine the weight based on the label and the n second converted outputs.
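The format conversion performed by the determination module can be sketched as an argmax when the label is a class index and each second output is a probability (or logit) vector; both of these representations are assumptions, since the disclosure does not fix the data formats.

```python
def convert_to_label_format(second_outputs):
    """Convert each second output (a score vector over classes) into the
    label's data format (a single class index) by taking the argmax."""
    return [max(range(len(scores)), key=lambda j: scores[j])
            for scores in second_outputs]
```

For example, `convert_to_label_format([[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]])` yields `[1, 0]`, which can then be compared directly with an integer label.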
  • the second acquisition module 503 is further configured to: perform a weighted sum of the first loss function and the second loss function to obtain the total loss function.
  • the model training device of the embodiment of the present disclosure inputs training samples into the student model and n teacher models respectively, obtains the first output of the student model, and obtains the second outputs of the n teacher models; determines the weight corresponding to the training sample based on the label of the training sample and the n second outputs; obtains the total loss function of the student model based on the first output and the weight; and updates the model parameters of the student model based on the total loss function to obtain the trained target model.
  • the label of the training sample and the second outputs of the teacher models can be considered together to determine the weight corresponding to the training sample, and the total loss function of the student model can then be obtained based on the first output and the weight. This avoids an inaccurate total loss function caused by label errors in the training samples, improves the accuracy of the total loss function of the student model, and thereby improves the accuracy of model training.
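Putting the pieces together, the loss computation of one training iteration can be sketched as follows. The squared-error form of the initial losses, the k/n choice of weights, and the fixed balance factor `alpha` are illustrative assumptions; backpropagation of the resulting total loss is omitted.

```python
def total_loss(student_output, label, label_onehot, teacher_outputs, alpha=0.5):
    """Compute the total loss for one training sample.

    student_output  -- first output of the student model (score vector)
    label           -- label of the training sample as a class index
    label_onehot    -- one-hot encoding of that label
    teacher_outputs -- n second outputs of the teacher models (score vectors)
    alpha           -- assumed balance between the first and second loss
    """
    n = len(teacher_outputs)
    preds = [max(range(len(t)), key=lambda j: t[j]) for t in teacher_outputs]
    k = sum(1 for p in preds if p == label)            # target number
    first_w = k / n                                    # first weight
    second_ws = [1.0 if p == label else k / n for p in preds]

    # First loss: first initial loss (student vs. label) scaled by the first weight.
    first_initial = sum((s - y) ** 2 for s, y in zip(student_output, label_onehot))
    first_loss = first_w * first_initial

    # Second loss: average of the n third losses (student vs. each teacher,
    # each scaled by that teacher's second weight).
    thirds = [w * sum((s - t) ** 2 for s, t in zip(student_output, teacher))
              for teacher, w in zip(teacher_outputs, second_ws)]
    second_loss = sum(thirds) / n

    # Total loss: weighted sum of the first and second loss.
    return alpha * first_loss + (1 - alpha) * second_loss
```

With a student output [1.0, 0.0], label 0, and two teachers [[1.0, 0.0], [0.0, 1.0]], only the first teacher agrees with the label, so the disagreeing teacher's contribution is down-weighted and the total loss is 0.25.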
  • the present disclosure also provides an image processing device for implementing the above image processing method.
  • FIG. 6 is a block diagram of an image processing device according to the first embodiment of the present disclosure.
  • the image processing device 600 includes: an acquisition module 601 and a processing module 602 .
  • the acquisition module 601 is used to acquire images to be processed.
  • the processing module 602 is configured to input the image to be processed into a target image model, and the target image model outputs the processing result of the image to be processed, wherein the target image model is obtained using a model training method.
  • the image processing device of the embodiment of the present disclosure inputs the image to be processed into the target image model, and the target image model outputs the processing result of the image to be processed.
  • the target image model is obtained by using a model training method.
  • the target image model is small in size, has high accuracy, and requires few computing resources, which helps improve image processing performance.
  • the present disclosure also provides an electronic device, a readable storage medium, a computer program product, and a computer program.
  • an electronic device includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the image processing method described in any of the preceding embodiments.
  • FIG. 7 illustrates a schematic block diagram of an example electronic device 700 that may be used to implement embodiments of the present disclosure.
  • Electronic devices are intended to refer to various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers.
  • Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices.
  • the components shown herein, their connections and relationships, and their functions are examples only and are not intended to limit implementations of the disclosure described and/or claimed herein.
  • the electronic device 700 includes a computing unit 701 that can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 702 or loaded from a storage unit 708 into a random access memory (RAM) 703. The RAM 703 can also store various programs and data required for the operation of the electronic device 700.
  • Computing unit 701, ROM 702 and RAM 703 are connected to each other via bus 704.
  • An input/output (I/O) interface 705 is also connected to bus 704.
  • the following components are connected to the I/O interface 705: an input unit 706, such as a keyboard, a mouse, etc.; an output unit 707, such as various types of displays, speakers, etc.; a storage unit 708, such as a magnetic disk, an optical disk, etc.; and a communication unit 709, such as a network card, a modem, a wireless communication transceiver, etc.
  • the communication unit 709 allows the electronic device 700 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunications networks.
  • the computing unit 701 may be any of various general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any appropriate processor, controller, microcontroller, etc.
  • the computing unit 701 performs the various methods and processes described above, such as the model training method described in FIGS. 1 to 3 and the image processing method described in FIG. 4.
  • the model training method or the image processing method may be implemented as a computer software program, which is tangibly included in a machine-readable medium, such as the storage unit 708.
  • part or all of the computer program may be loaded and/or installed onto electronic device 700 via ROM 702 and/or communication unit 709 .
  • when the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the model training method described above, or one or more steps of the image processing method described above, may be performed.
  • the computing unit 701 may be configured to perform the model training method in any other suitable manner (eg, by means of firmware), or configured to perform the image processing method.
  • various implementations of the systems and techniques described above may be implemented in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof.
  • these various embodiments may include implementation in one or more computer programs executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special purpose or general purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
  • program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing device, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowcharts and/or block diagrams to be implemented.
  • the program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
  • the storage medium is a non-transitory computer-readable storage medium storing computer instructions, and the computer instructions are used to cause the computer to execute the image processing method described in any of the foregoing embodiments.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing.
  • more specific examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer program product includes a computer program that, when executed by a processor, implements the image processing method described in any of the foregoing embodiments.
  • the computer program includes computer program code.
  • when the computer program code is run on a computer, it causes the computer to execute the image processing method described in any of the foregoing embodiments.
  • the systems and techniques described herein may be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and pointing device (e.g., a mouse or a trackball) through which the user can provide input to the computer.
  • other kinds of devices may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form (including acoustic input, voice input, or tactile input).
  • the systems and techniques described herein may be implemented in a computing system that includes back-end components (e.g., as a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes front-end components (e.g., a user computer having a graphical user interface or web browser through which the user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components.
  • the components of the system may be interconnected by any form or medium of digital data communication (eg, a communications network). Examples of communication networks include: local area network (LAN), wide area network (WAN), and the Internet.
  • Computer systems may include clients and servers. Clients and servers are generally remote from each other and typically interact over a communications network. The relationship of client and server is created by computer programs running on corresponding computers and having a client-server relationship with each other.
  • the server can be a cloud server, also known as a cloud computing server or cloud host, which is a host product in the cloud computing service system that overcomes the defects of difficult management and weak business scalability found in traditional physical host and VPS ("Virtual Private Server") services.
  • the server can also be a distributed system server or a server combined with a blockchain.


Abstract

Provided are an image processing method and apparatus, and an electronic device, a storage medium, a computer program product and a computer program. The image processing method comprises: acquiring an image to be processed; and inputting said image into a target image model, such that the target image model outputs a processing result of said image.

Description

Image processing method and apparatus, electronic device and storage medium
Cross-references to related applications
This application claims priority to Chinese Patent Application No. 202210981983.6, filed in China on August 16, 2022, the entire content of which is incorporated herein by reference.
Technical field
The present disclosure relates to the field of computer technology, and specifically to image processing methods and apparatuses, electronic devices, storage media, computer program products and computer programs.
Background
At present, with the continuous development of artificial intelligence technology, models have been widely used in fields such as images, text, and speech, and have the advantages of a high degree of automation and low labor costs. To meet prediction requirements, models are often large; to compress a model, it can be trained based on knowledge distillation. However, in the related art, model training methods based on knowledge distillation suffer from low training accuracy.
Summary
The present disclosure provides image processing methods and apparatuses, electronic devices, storage media, computer program products, and computer programs.
According to an embodiment of one aspect of the present disclosure, an image processing method is provided, including:
acquiring an image to be processed; and
inputting the image to be processed into a target image model, the target image model outputting a processing result of the image to be processed, wherein the target image model is trained through the following steps:
inputting training samples into a student model and n teacher models respectively, obtaining a first output of the student model, and obtaining second outputs of the n teacher models, where n is a positive integer;
determining a weight corresponding to the training sample based on a label of the training sample and the n second outputs;
obtaining a total loss function of the student model based on the first output and the weight; and
updating model parameters of the student model based on the total loss function to obtain a trained target model,
wherein the weight includes a first weight and a second weight corresponding to the teacher model;
wherein obtaining the total loss function of the student model based on the first output and the weight includes:
obtaining a first loss function of the student model based on the first output, the label and the first weight;
obtaining a second loss function of the student model based on the first output, the n second outputs and the second weight; and
obtaining the total loss function based on the first loss function and the second loss function,
wherein determining the weight corresponding to the training sample based on the label of the training sample and the n second outputs includes:
comparing the label with the n second outputs to obtain a target number of second outputs consistent with the label; and
determining the first weight based on the target number, wherein the first weight is positively related to the target number,
wherein determining the weight corresponding to the training sample based on the label of the training sample and the n second outputs includes:
comparing the label with the second output of the i-th teacher model;
in response to the label being consistent with the second output of the i-th teacher model, determining the second weight corresponding to the i-th teacher model to be 1; or,
in response to the label being inconsistent with the second output of the i-th teacher model, obtaining the target number of second outputs consistent with the label;
determining the second weight corresponding to the i-th teacher model based on the target number, wherein the second weight is positively related to the target number.
According to an embodiment of another aspect of the present disclosure, an image processing apparatus is provided, including:
an acquisition module, configured to acquire an image to be processed; and
a processing module, configured to input the image to be processed into a target image model, the target image model outputting a processing result of the image to be processed, wherein the target image model is trained through the following modules:
a first acquisition module, configured to input training samples into a student model and n teacher models respectively, obtain a first output of the student model, and obtain second outputs of the n teacher models, where n is a positive integer;
a determination module, configured to determine a weight corresponding to the training sample based on a label of the training sample and the n second outputs;
a second acquisition module, configured to obtain a total loss function of the student model based on the first output and the weight; and
a training module, configured to update model parameters of the student model based on the total loss function to obtain a trained target model,
wherein the weight includes a first weight and a second weight corresponding to the teacher model;
wherein the second acquisition module is further configured to:
obtain a first loss function of the student model based on the first output, the label and the first weight;
obtain a second loss function of the student model based on the first output, the n second outputs and the second weight; and
obtain the total loss function based on the first loss function and the second loss function;
wherein the determination module is further configured to:
compare the label with the n second outputs to obtain a target number of second outputs consistent with the label; and
determine the first weight based on the target number, wherein the first weight is positively related to the target number;
wherein the determination module is further configured to:
compare the label with the second output of the i-th teacher model;
in response to the label being consistent with the second output of the i-th teacher model, determine the second weight corresponding to the i-th teacher model to be 1; or, in response to the label being inconsistent with the second output of the i-th teacher model, obtain the target number of second outputs consistent with the label; and
determine the second weight corresponding to the i-th teacher model based on the target number, wherein the second weight is positively related to the target number.
According to an embodiment of another aspect of the present disclosure, an electronic device is provided, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the image processing method described in any embodiment of the foregoing aspect.
According to an embodiment of another aspect of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions is provided, the computer instructions being used to cause a computer to perform the image processing method described in any embodiment of the foregoing aspect.
According to an embodiment of another aspect of the present disclosure, a computer program product is provided, including a computer program that, when executed by a processor, implements the image processing method described in any embodiment of the foregoing aspect.
According to an embodiment of another aspect of the present disclosure, a computer program is provided, including computer program code that, when run on a computer, causes the computer to perform the image processing method described in any embodiment of the foregoing aspect.
It should be understood that what is described in this section is not intended to identify key or important features of the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will become readily understood from the following description.
Description of the drawings
The accompanying drawings are used for a better understanding of the present solution and do not constitute a limitation of the present disclosure, wherein:
FIG. 1 is a schematic flowchart of a model training method according to the first embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of a model training method according to a second embodiment of the present disclosure;
FIG. 3 is a schematic flowchart of a model training method according to a third embodiment of the present disclosure;
FIG. 4 is a schematic flowchart of an image processing method according to the first embodiment of the present disclosure;
FIG. 5 is a block diagram of a model training device according to the first embodiment of the present disclosure;
FIG. 6 is a block diagram of an image processing device according to the first embodiment of the present disclosure;
FIG. 7 is a block diagram of an electronic device used to implement the model training method or the image processing method of the embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, including various details of the embodiments of the present disclosure to facilitate understanding; they should be considered as exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Likewise, descriptions of well-known functions and structures are omitted from the following description for clarity and conciseness.
AI (Artificial Intelligence) is a technical science that studies and develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligence. AI technology has the advantages of a high degree of automation, high accuracy, and low cost, and has been widely applied.
DL (Deep Learning) is a new research direction in the field of ML (Machine Learning). It learns the inherent laws and representation levels of sample data so that machines can analyze and learn like humans and can recognize data such as text, images, and sounds; it is widely applied in speech and image recognition.
Image processing refers to the technology of analyzing images with a computer to achieve a desired result. Image processing generally refers to digital image processing. A digital image is a large two-dimensional array captured by a device such as an industrial camera, video camera, or scanner; the elements of the array are called pixels, and their values are called grayscale values. Image processing technology generally includes three parts: image compression; enhancement and restoration; and matching, description, and recognition.
Computer vision refers to machine vision in which cameras and computers replace human eyes to recognize, track, and measure targets, with further graphics processing so that the computer produces images more suitable for human observation or for transmission to instruments for detection. Computer vision is a comprehensive discipline spanning computer science and engineering, signal processing, physics, applied mathematics and statistics, neurophysiology, cognitive science, and other fields.
Figure 1 is a schematic flowchart of a model training method according to a first embodiment of the present disclosure.
As shown in Figure 1, the model training method of the first embodiment of the present disclosure includes steps S101 to S104.
S101: input a training sample into a student model and into n teacher models respectively, obtain a first output of the student model, and obtain second outputs of the n teacher models, where n is a positive integer.
It should be noted that the execution subject of the model training method of the embodiments of the present disclosure may be a hardware device with data processing capability and/or the software necessary to drive the hardware device. In some embodiments, the execution subject may include a workstation, a server, a computer, a user terminal, or another intelligent device. The user terminal includes, but is not limited to, a mobile phone, a computer, an intelligent voice interaction device, a smart home appliance, a vehicle-mounted terminal, and the like.
In the embodiments of the present disclosure, one student model may correspond to n teacher models, and n is not unduly limited; in one embodiment, n may be 3 or 5.
In the embodiments of the present disclosure, the training sample may be input into the student model and the n teacher models respectively; the student model outputs the first output, and the n teacher models output the second outputs. It can be understood that each teacher model outputs one second output; that is, the n teacher models may output n second outputs.
It should be noted that the training sample, the student model, and the teacher models are not unduly limited. The training sample has a label.
In one embodiment, the student model and the teacher models are speech models, which include, but are not limited to, a speech recognition model, a speech synthesis model, and the like.
In one embodiment, when the student model and the teacher models are speech recognition models, the training sample includes a sample speech, and the label of the training sample includes a reference recognized text of the sample speech.
In one embodiment, when the student model and the teacher models are speech synthesis models, the training sample includes a sample text, and the label of the training sample includes a reference synthesized speech of the sample text.
In one embodiment, the student model and the teacher models are image models, which include, but are not limited to, an action recognition model, an image classification model, a face recognition model, a text recognition model, and the like.
In one embodiment, when the student model and the teacher models are action recognition models, the training sample includes a sample image, and the label of the training sample includes a reference recognized action of the sample image.
In one embodiment, when the student model and the teacher models are image classification models, the training sample includes a sample image, and the label of the training sample includes a reference overall category of the sample image and reference subcategories of the pixels in the sample image.
In one embodiment, when the student model and the teacher models are face recognition models, the training sample includes a sample image, and the label of the training sample includes a reference face recognition result of the sample image. The reference face recognition result includes, but is not limited to, a face position, a face width, a face height, a number of faces, and the like.
In one embodiment, when the student model and the teacher models are text recognition models, the training sample includes a sample image, and the label of the training sample includes a reference recognized text of the sample image.
In one embodiment, the student model and the teacher models are language models, which include, but are not limited to, a text classification model, a text segmentation model, and the like.
In one embodiment, when the student model and the teacher models are text classification models, the training sample includes a sample text, and the label of the training sample includes a reference classification result of the sample text. The reference classification result includes, but is not limited to, a sentiment classification result, a topic classification result, and the like.
In one embodiment, when the student model and the teacher models are text segmentation models, the training sample includes a sample text, and the label of the training sample includes a reference segmentation result of the sample text.
S102: determine a weight corresponding to the training sample based on the label of the training sample and the n second outputs.
It can be understood that different training samples may correspond to different weights. The weight takes values in the range [0, 1].
In one implementation, determining the weight corresponding to the training sample based on the label of the training sample and the n second outputs includes obtaining a similarity between the label and the n second outputs, and determining the weight based on the similarity, where the weight is positively correlated with the similarity. In one embodiment, the higher the similarity between the label and the n second outputs, the more likely the label is correct and the larger the weight corresponding to the training sample; conversely, the lower the similarity, the more likely the label is wrong and the smaller the weight corresponding to the training sample.
In one implementation, determining the weight based on the similarity includes identifying a target setting range in which the similarity falls, and determining the setting weight corresponding to the target setting range as the weight corresponding to the training sample. It can be understood that the similarity may be divided into a plurality of setting ranges in advance, with different setting ranges corresponding to different setting weights.
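The range-based scheme above can be sketched as follows. The similarity interval [0, 1] is partitioned into preset ranges, each mapped to a preset weight; the specific thresholds and weights below are illustrative assumptions, since the disclosure only requires that the weight increase with the similarity.

```python
def weight_from_similarity(similarity: float) -> float:
    """Map a label/teacher-output similarity in [0, 1] to a sample weight.

    The (lower_bound, weight) pairs are hypothetical preset ranges; any
    partition where a higher similarity yields a higher weight would do.
    """
    preset_ranges = [(0.8, 1.0), (0.5, 0.6), (0.2, 0.3), (0.0, 0.0)]
    for lower_bound, weight in preset_ranges:
        if similarity >= lower_bound:
            return weight
    return 0.0
```

For example, a similarity of 0.9 falls in the top range and yields the full weight 1.0, while a similarity of 0.1 yields weight 0.0, effectively discarding a sample whose label disagrees with every teacher.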
S103: obtain a total loss function of the student model based on the first output and the weight.
It should be noted that the category of the total loss function is not unduly limited; in one embodiment, the total loss function includes, but is not limited to, CE (Cross Entropy), BCE (Binary Cross Entropy), and the like.
In one implementation, the total loss function is positively correlated with the weight. In one embodiment, the larger the weight corresponding to the training sample, the larger the total loss function; conversely, the smaller the weight corresponding to the training sample, the smaller the total loss function.
In one embodiment, when the label is wrong, the similarity between the label and the n second outputs is low and the weight corresponding to the training sample is also low, so the resulting total loss function is small, which avoids the problem of the total loss function becoming large when the label is wrong.
In one implementation, during the x-th training of the student model, a training sample x may be obtained and input into the student model and the n teacher models respectively; the first output of the student model for the training sample x and the second outputs of the n teacher models for the training sample x are obtained; the weight corresponding to the training sample x is determined based on the label y of the training sample x and the second outputs of the n teacher models for the training sample x; and the total loss function of the student model at the x-th training is obtained based on the first output of the student model for the training sample x and the weight corresponding to the training sample x, where x is a positive integer.
In one implementation, during the x-th training of the student model, a training sample set Ax may be obtained, the training sample set Ax including training samples 1 to m. A training sample s in the training sample set Ax is input into the student model and the n teacher models respectively, and the first output of the student model for the training sample s and the second outputs of the n teacher models for the training sample s are obtained, where 1≤s≤m, and s and m are positive integers.
The weight corresponding to the training sample s is determined based on the label ys of the training sample s and the second outputs of the n teacher models for the training sample s; the loss function of the student model for the training sample s is obtained based on the first output of the student model for the training sample s and the weight corresponding to the training sample s; and the total loss function of the student model at the x-th training is obtained based on the loss functions of the student model for the training samples 1 to m.
In one embodiment, obtaining the total loss function of the student model at the x-th training based on the loss functions of the student model for the training samples 1 to m includes determining the sum or the average of the loss functions of the student model for the training samples 1 to m as the total loss function of the student model at the x-th training.
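The sum-or-average reduction over the per-sample weighted losses can be sketched as follows; the function name and interface are illustrative, not from the disclosure.

```python
def batch_total_loss(sample_losses: list, reduction: str = "mean") -> float:
    """Combine per-sample weighted losses (for samples 1..m) into the
    total loss at the x-th training; the disclosure allows either the
    sum or the average of the per-sample losses."""
    total = sum(sample_losses)
    if reduction == "sum":
        return total
    if reduction == "mean":
        return total / len(sample_losses)
    raise ValueError(f"unknown reduction: {reduction!r}")
```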
S104: update model parameters of the student model based on the total loss function to obtain a trained target model.
In one implementation, updating the model parameters of the student model based on the total loss function includes obtaining gradient information of the total loss function, and updating the model parameters of the student model according to the gradient information. In one embodiment, backpropagation may be performed according to the gradient information to update the model parameters.
In one implementation, there may be a plurality of training samples. Updating the model parameters of the student model based on the total loss function to obtain the trained target model includes: updating the model parameters of the student model based on the total loss function; if it is identified that a model training end condition is not yet met, returning to continue updating the model parameters of the parameter-adjusted student model with the next training sample until the model training end condition is met; and determining the student model obtained in the last training as the target model. It should be noted that the model training end condition is not unduly limited; in one embodiment, the model training end condition includes, but is not limited to, the model accuracy reaching a set accuracy threshold, the number of model iterations reaching a set count threshold, the total loss function reaching a minimum value, and the like.
In the embodiments of the present disclosure, the target model is obtained by distillation learning from the n teacher models; the target model is small in size and high in accuracy, and requires fewer computing resources than the teacher models. For example, when a teacher model is deployed on a user terminal, the teacher model deployed on the user terminal may be replaced with the target model, which helps save the storage space and computing resources of the user terminal.
To sum up, according to the model training method of the embodiments of the present disclosure, the training sample is input into the student model and the n teacher models respectively; the first output of the student model and the second outputs of the n teacher models are obtained; the weight corresponding to the training sample is determined based on the label of the training sample and the n second outputs; the total loss function of the student model is obtained based on the first output and the weight; and the model parameters of the student model are updated based on the total loss function to obtain the trained target model. In this way, the label of the training sample and the second outputs of the teacher models are comprehensively considered to determine the weight corresponding to the training sample, and the total loss function of the student model is then obtained based on the first output of the student model and the weight, which avoids an inaccurate total loss function when the label of the training sample is wrong, improves the accuracy of the total loss function of the student model, and thereby improves the accuracy of model training.
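The update-until-done flow above can be sketched schematically. Here `run_step` is a hypothetical callable that performs one parameter update and returns the total loss; the two end conditions shown (a loss threshold and an iteration cap) are taken from the examples in the disclosure.

```python
def train_until_done(run_step, max_iterations: int = 100,
                     loss_threshold: float = 1e-3) -> float:
    """Repeat parameter updates until a model-training end condition is
    met: the total loss reaches the threshold, or the iteration count
    reaches its cap. Returns the last total loss observed."""
    last_loss = float("inf")
    for _ in range(max_iterations):
        last_loss = run_step()  # one update of the student's parameters
        if last_loss <= loss_threshold:
            break
    return last_loss
```

The student model as it stands after the final `run_step` call is the trained target model.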
On the basis of any of the above embodiments, the weight corresponding to the training sample includes a first weight and second weights corresponding to the teacher models. It should be noted that one training sample may correspond to one first weight, and each teacher model may correspond to one second weight; that is, the n teacher models may correspond to n second weights.
Figure 2 is a schematic flowchart of a model training method according to a second embodiment of the present disclosure.
As shown in Figure 2, the model training method of the second embodiment of the present disclosure includes steps S201 to S207.
S201: input the training sample into the student model and the n teacher models respectively, obtain the first output of the student model, and obtain the second outputs of the n teacher models, where n is a positive integer.
For the details of step S201, reference may be made to the foregoing embodiments, which will not be repeated here.
S202: convert, according to the data format of the label of the training sample, the n second outputs into n second converted outputs in the data format of the label.
It can be understood that the data format of the label and the data format of the second outputs may differ. In one embodiment, the data format of the label may be text in a natural language, and the data format of the second outputs may be text not in a natural language. In one embodiment, the data format of the label may be Chinese text, and the data format of the second outputs may be a vector.
In the embodiments of the present disclosure, the n second outputs may be converted into n second converted outputs in the data format of the label; that is, the data format of the n second converted outputs is the same as the data format of the label.
In one embodiment, when the student model and the teacher models are image classification models, the label of the training sample includes "road", and the second outputs include (1,0,0), (0,1,0), and (0,0,1); then (1,0,0), (0,1,0), and (0,0,1) may be converted into "road", "landscape", and "building", respectively.
S203: determine the weight of the training sample based on the label and the n second converted outputs.
For the details of step S203, reference may be made to the description of step S102, which will not be repeated here.
Thus, in this method, data conversion can be performed on the n second outputs according to the data format of the label to obtain the n second converted outputs, and the weight is determined based on the label and the n second converted outputs; since the label and the n second converted outputs share the same data format, the weight is easy to determine.
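The format conversion in this example (a one-hot style vector to the label's text format) can be sketched as follows; the class-name table is taken from the example above, and the argmax rule is an assumption about how a vector maps to a class name.

```python
def to_label_format(one_hot, class_names):
    """Convert a teacher's vector output into the label's text format by
    taking the class with the largest component, e.g. (1,0,0) -> "road"."""
    best_index = max(range(len(one_hot)), key=lambda i: one_hot[i])
    return class_names[best_index]
```

For instance, `to_label_format((0, 1, 0), ["road", "landscape", "building"])` yields "landscape", so the converted output can be compared directly against the text label.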
S204: obtain a first loss function of the student model based on the first output, the label, and the first weight.
In one implementation, obtaining the first loss function of the student model based on the first output, the label, and the first weight includes obtaining a first template loss function, and substituting the first output, the label, and the first weight into the first template loss function to obtain the first loss function. It should be noted that the first template loss function is not unduly limited; in one embodiment, the first template loss function may be preset.
In one implementation, obtaining the first loss function of the student model based on the first output, the label, and the first weight includes obtaining a first initial loss function of the student model based on the first output and the label, and obtaining the first loss function based on the first initial loss function and the first weight. Thus, in this method, the first initial loss function can be obtained based on the first output and the label, and the first loss function can be obtained based on the first initial loss function and the first weight.
In one implementation, obtaining the first loss function based on the first initial loss function and the first weight includes obtaining a first product of the first initial loss function and the first weight, and determining the first product as the first loss function.
In one embodiment, the formula of the first loss function is as follows:
Lgt = F1(Os, GTx)·B
where Lgt is the first loss function, Os is the first output, GTx is the label, F1(Os, GTx) is the first initial loss function, and B is the first weight.
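As a sketch of Lgt = F1(Os, GTx)·B, cross entropy is used below to stand in for F1; this choice is an assumption, since the disclosure leaves F1 open (it mentions CE and BCE as options).

```python
import math

def cross_entropy(probabilities, target_index: int) -> float:
    """F1: per-sample cross-entropy loss for a predicted distribution."""
    return -math.log(probabilities[target_index])

def first_loss(probabilities, target_index: int, first_weight: float) -> float:
    """Lgt = F1(Os, GTx) * B, where B is the sample's first weight."""
    return cross_entropy(probabilities, target_index) * first_weight
```

A first weight of 0 (label contradicted by every teacher) zeroes out the label term entirely, while a weight of 1 leaves the ordinary cross-entropy loss.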
S205: obtain a second loss function of the student model based on the first output, the n second outputs, and the second weights.
In one implementation, obtaining the second loss function of the student model based on the first output, the n second outputs, and the second weights includes obtaining a second template loss function, and substituting the first output, the n second outputs, and the second weights into the second template loss function to obtain the second loss function. It should be noted that the second template loss function is not unduly limited; in one embodiment, the second template loss function may be preset.
In one implementation, obtaining the second loss function of the student model based on the first output, the n second outputs, and the second weights includes: obtaining an i-th third initial loss function of the student model based on the first output and the second output of the i-th teacher model, where 1≤i≤n and i is a positive integer; obtaining an i-th third loss function of the student model based on the i-th third initial loss function and the second weight corresponding to the i-th teacher model; and obtaining the second loss function based on the n third loss functions of the student model. Thus, in this method, a third initial loss function can be obtained based on the first output and a second output, a third loss function can be obtained based on the third initial loss function and a second weight, and the second loss function can be obtained based on the n third loss functions.
In one implementation, obtaining the i-th third loss function of the student model based on the i-th third initial loss function and the second weight corresponding to the i-th teacher model includes obtaining a second product of the i-th third initial loss function and the second weight corresponding to the i-th teacher model, and determining the second product as the i-th third loss function.
In one implementation, obtaining the second loss function based on the n third loss functions of the student model includes obtaining an average of the n third loss functions of the student model, and determining the average as the second loss function. Thus, in this method, the average of the n third loss functions can be determined as the second loss function.
In one embodiment, the formula of the second loss function is as follows:
Ldist = (1/n)·∑_{i=1}^{n} F2(Os, Oti)·C_i
where Ldist is the second loss function, Os is the first output, Oti is the second output of the i-th teacher model with 1≤i≤n, F2(Os, Oti) is the i-th third initial loss function, C_i is the second weight corresponding to the i-th teacher model, and F2(Os, Oti)·C_i is the i-th third loss function.
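The averaged, weighted distillation loss can be sketched as follows, with mean squared error standing in for F2; this is an assumption, since the disclosure leaves F2 open.

```python
def mse(student_out, teacher_out) -> float:
    """F2: mean squared error between a student and a teacher output."""
    return sum((s - t) ** 2
               for s, t in zip(student_out, teacher_out)) / len(student_out)

def second_loss(student_out, teacher_outs, second_weights) -> float:
    """Ldist = (1/n) * sum_i F2(Os, Oti) * C_i over the n teachers."""
    n = len(teacher_outs)
    return sum(mse(student_out, teacher_out) * weight
               for teacher_out, weight in zip(teacher_outs, second_weights)) / n
```

Each teacher's contribution is scaled by its own second weight C_i before the average over the n teachers is taken.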
S206: obtain the total loss function based on the first loss function and the second loss function.
In one implementation, obtaining the total loss function based on the first loss function and the second loss function may include performing a weighted summation of the first loss function and the second loss function to obtain the total loss function. It should be noted that the weights of the first loss function and the second loss function are not unduly limited; in one embodiment, these weights may be preset. Thus, in this method, the first loss function and the second loss function can be weighted and summed to obtain the total loss function, which improves the flexibility of the total loss function of the student model.
As another possible implementation, the first loss function or the second loss function may also be directly determined as the total loss function.
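The weighted summation can be sketched directly; the default weight values below are placeholders, since the disclosure only says the weights may be preset.

```python
def weighted_total_loss(l_gt: float, l_dist: float,
                        w_gt: float = 0.5, w_dist: float = 0.5) -> float:
    """Total loss as a weighted sum of the first (label) loss Lgt and the
    second (distillation) loss Ldist."""
    return w_gt * l_gt + w_dist * l_dist
```

Setting one weight to zero recovers the alternative of using only the first or only the second loss function as the total loss.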
S207: update the model parameters of the student model based on the total loss function to obtain the trained target model.
For the details of step S207, reference may be made to the foregoing embodiments, which will not be repeated here.
To sum up, according to the model training method of the embodiments of the present disclosure, the first loss function of the student model can be obtained based on the first output, the label, and the first weight; the second loss function of the student model can be obtained based on the first output, the n second outputs, and the second weights; and the total loss function can be obtained based on the first loss function and the second loss function. In this way, the first loss function of the student model relative to the label and the second loss function of the student model relative to the teacher models are comprehensively considered to obtain the total loss function of the student model, which improves the accuracy of the total loss function of the student model and thereby improves the accuracy of model training.
Figure 3 is a schematic flowchart of a model training method according to a third embodiment of the present disclosure.
As shown in Figure 3, the model training method of the third embodiment of the present disclosure includes steps S301 to S311.
S301: input the training sample into the student model and the n teacher models respectively, obtain the first output of the student model, and obtain the second outputs of the n teacher models, where n is a positive integer.
For the details of step S301, reference may be made to the foregoing embodiments, which will not be repeated here.
S302: Compare the label with the n second outputs to obtain a target number of second outputs consistent with the label.

It can be understood that the target number is a natural number in the range [0, n]. In one embodiment, a target number of 0 indicates that none of the n second outputs is consistent with the label; a target number greater than or equal to 1 indicates that at least one of the n second outputs is consistent with the label; and a target number of n indicates that all n second outputs are consistent with the label.

S303: Determine a first weight based on the target number, where the first weight is positively correlated with the target number.

It should be noted that the first weight takes values in the range [0, 1].

In the embodiments of the present disclosure, the first weight is positively correlated with the target number: the larger the target number of second outputs consistent with the label, the more likely the label is correct, and the larger the first weight; conversely, the smaller the target number, the less likely the label is correct, and the smaller the first weight.

In one embodiment, the first weight B = k/n, where k is the target number and n is the number of teacher models (that is, the number of second outputs).
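As a minimal sketch of steps S302 and S303 with the rule B = k/n (the function and variable names are illustrative and not taken from the disclosure; outputs are assumed to already be in the label's data format so that equality comparison is meaningful):

```python
def first_weight(label, teacher_outputs):
    """Weight the ground-truth loss by how many teachers agree with the label.

    k is the target number of teacher outputs consistent with the label
    (step S302); n is the number of teacher models. The weight B = k / n
    lies in [0, 1] and is positively correlated with k (step S303).
    """
    n = len(teacher_outputs)
    k = sum(1 for out in teacher_outputs if out == label)  # target number
    return k / n
```

For example, if two of three teachers predict the label, the label loss is kept at weight 2/3; if no teacher agrees, the (likely wrong) label contributes nothing.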
S304: Obtain a first loss function of the student model based on the first output, the label, and the first weight.

For details of step S304, refer to the foregoing embodiments; they are not repeated here.

S305: Compare the label with the second output of the i-th teacher model.

S306: In response to the label being consistent with the second output of the i-th teacher model, determine that the second weight corresponding to the i-th teacher model is 1.

In the embodiments of the present disclosure, when the label is consistent with the second output of the i-th teacher model, the second weight corresponding to the i-th teacher model can be determined as Ci = 1.

S307: In response to the label being inconsistent with the second output of the i-th teacher model, obtain the target number of second outputs consistent with the label.

S308: Determine the second weight corresponding to the i-th teacher model based on the target number, where the second weight is positively correlated with the target number.

It should be noted that for details of the target number, refer to the foregoing embodiments; they are not repeated here.

In the embodiments of the present disclosure, when the label is inconsistent with the second output of the i-th teacher model, the second weight corresponding to the i-th teacher model takes values in the range [0, 1); that is, the second weight is any value less than 1 and greater than or equal to 0.

In the embodiments of the present disclosure, when the label is inconsistent with the second output of the i-th teacher model, the second weight corresponding to the i-th teacher model is positively correlated with the target number: the larger the target number of second outputs consistent with the label, the more likely the label is correct, and the larger the second weight; conversely, the smaller the target number, the less likely the label is correct, and the smaller the second weight.

In one embodiment, when the label is inconsistent with the second output of the i-th teacher model, the second weight corresponding to the i-th teacher model is Ci = k/n, where k is the target number and n is the number of teacher models (that is, the number of second outputs).
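Steps S305 to S308 can be sketched in the same illustrative style (names are assumptions, not from the disclosure): teacher i gets weight 1 when its output matches the label, and k/n otherwise.

```python
def second_weights(label, teacher_outputs):
    """Per-teacher weights Ci for the distillation loss (steps S305-S308).

    A teacher whose second output is consistent with the label gets
    Ci = 1; an inconsistent teacher gets Ci = k / n, where k is the
    target number of teachers that do match the label, so the weight
    stays in [0, 1) and is positively correlated with k.
    """
    n = len(teacher_outputs)
    k = sum(1 for out in teacher_outputs if out == label)  # target number
    return [1.0 if out == label else k / n for out in teacher_outputs]
```

With this rule, a teacher that disagrees with a label that most other teachers confirm is down-weighted only mildly, while a teacher that disagrees with a label no one confirms is suppressed entirely.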
S309: Obtain a second loss function of the student model based on the first output, the n second outputs, and the second weights.

S310: Obtain a total loss function based on the first loss function and the second loss function.

S311: Update the model parameters of the student model based on the total loss function to obtain the trained target model.

For details of steps S309 to S311, refer to the foregoing embodiments; they are not repeated here.
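Putting steps S304, S309, and S310 together, one possible sketch is shown below. The concrete loss terms and the coefficients a and b of the weighted sum are illustrative assumptions; the disclosure only fixes the structure (weighted label loss, mean of the weighted per-teacher losses, weighted sum of the two):

```python
def total_loss(first_initial_loss, third_initial_losses,
               first_weight, second_weights, a=1.0, b=1.0):
    """Combine the label loss and the distillation losses.

    first_initial_loss: student-vs-label loss before weighting (S304).
    third_initial_losses: one student-vs-teacher loss per teacher (S309).
    The first loss is the first initial loss scaled by the first weight;
    the second loss is the average of the per-teacher losses scaled by
    their second weights; the total loss is a weighted sum of the two
    (S310), with assumed coefficients a and b.
    """
    first_loss = first_weight * first_initial_loss
    third_losses = [w * l for w, l in zip(second_weights, third_initial_losses)]
    second_loss = sum(third_losses) / len(third_losses)
    return a * first_loss + b * second_loss
```

The scalar returned here would then drive an ordinary gradient update of the student parameters in step S311.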
In summary, according to the model training method of the embodiments of the present disclosure, the first weight can be determined based on the target number of second outputs consistent with the label; when the label is consistent with the second output of the i-th teacher model, the second weight corresponding to the i-th teacher model is determined to be 1, or, when the label is inconsistent with the second output of the i-th teacher model, the second weight corresponding to the i-th teacher model is determined based on the target number of second outputs consistent with the label.
According to embodiments of the present disclosure, the present disclosure further provides an image processing method.

FIG. 4 is a schematic flowchart of an image processing method according to a first embodiment of the present disclosure.

As shown in FIG. 4, the image processing method according to the first embodiment of the present disclosure includes steps S401 and S402.

S401: Obtain an image to be processed.

It should be noted that the execution subject of the image processing method in the embodiments of the present disclosure may be a hardware device with data-processing capabilities and/or the software necessary to drive that hardware device. In some embodiments, the execution subject may include workstations, servers, computers, user terminals, and other intelligent devices, where user terminals include but are not limited to mobile phones, computers, intelligent voice interaction devices, smart home appliances, and vehicle-mounted terminals.

It should be noted that the image to be processed is not particularly limited. In one embodiment, the image to be processed includes but is not limited to a two-dimensional image, a three-dimensional image, and the like.

In one implementation, taking a user terminal as the execution subject, the user terminal may obtain the image to be processed from its own storage space, and/or capture the image to be processed with a camera, and/or obtain the image to be processed from a web page or an application (APP).

S402: Input the image to be processed into a target image model, and obtain the processing result of the image to be processed output by the target image model, where the target image model is obtained using the model training method described above.

It should be noted that the target image model can be obtained using the model training method described with reference to FIG. 1 to FIG. 3, which is not repeated here.

It should be noted that the processing result is not particularly limited. In one embodiment, the processing result includes but is not limited to an action recognition result, an image classification result, a face recognition result, a text recognition result, and the like.

In the embodiments of the present disclosure, the target image model is obtained by distillation learning from the n teacher models; it is small in size, high in accuracy, and requires fewer computing resources than a teacher model. For example, if a teacher model is deployed on a user terminal, it can be replaced with the target image model, which helps save the storage space and computing resources of the user terminal.
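As an illustrative sketch only of step S402 for the image-classification case (the model object and its call signature are placeholders, not a concrete framework's interface; a dummy model stands in for a trained target model):

```python
def process_image(model, image):
    """Run the distilled target image model on one preprocessed image and
    return the index of the highest-scoring class as the processing result."""
    scores = model(image)  # the model's output: one score per class
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical stand-in for a trained target model over 3 classes.
def dummy_model(image):
    return [0.1, 0.7, 0.2]
```

In a real deployment the `model` would be the distilled student loaded on the terminal, and `image` would come from the terminal's storage, camera, or a web page/APP as described in step S401.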
In summary, according to the image processing method of the embodiments of the present disclosure, the image to be processed is input into the target image model, and the target image model outputs the processing result of the image to be processed. The target image model, obtained using the model training method described above, is small in size, high in accuracy, and requires few computing resources, which helps improve image processing performance.

It should be noted that the model training method of the embodiments of the present disclosure can also be applied to speech models, language models, and the like.

In one embodiment, speech to be processed can be obtained and input into a target speech model, and the target speech model outputs a speech processing result of the speech to be processed. The target speech model is obtained using the model training method described above, which helps improve speech processing performance.

In one embodiment, text to be processed can be obtained and input into a target language model, and the target language model outputs a text processing result of the text to be processed. The target language model is obtained using the model training method described above, which helps improve text processing performance.

In the technical solutions of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of the user personal information involved all comply with relevant laws and regulations and do not violate public order and good customs.
According to embodiments of the present disclosure, the present disclosure further provides a model training apparatus for implementing the above model training method.

FIG. 5 is a block diagram of a model training apparatus according to a first embodiment of the present disclosure.

As shown in FIG. 5, the model training apparatus 500 of the embodiment of the present disclosure includes a first acquisition module 501, a determination module 502, a second acquisition module 503, and a training module 504.

The first acquisition module 501 is configured to input a training sample into a student model and n teacher models respectively, obtain a first output of the student model, and obtain second outputs of the n teacher models, where n is a positive integer.

The determination module 502 is configured to determine a weight corresponding to the training sample based on the label of the training sample and the n second outputs.

The second acquisition module 503 is configured to obtain a total loss function of the student model based on the first output and the weight.

The training module 504 is configured to update the model parameters of the student model based on the total loss function to obtain a trained target model.
In one embodiment of the present disclosure, the weight includes a first weight and a second weight corresponding to each teacher model, and the second acquisition module 503 is further configured to: obtain a first loss function of the student model based on the first output, the label, and the first weight; obtain a second loss function of the student model based on the first output, the n second outputs, and the second weights; and obtain the total loss function based on the first loss function and the second loss function.

In one embodiment of the present disclosure, the second acquisition module 503 is further configured to: obtain a first initial loss function of the student model based on the first output and the label; and obtain the first loss function based on the first initial loss function and the first weight.

In one embodiment of the present disclosure, the second acquisition module 503 is further configured to: obtain an i-th third initial loss function of the student model based on the first output and the second output of the i-th teacher model, where 1 ≤ i ≤ n and i is a positive integer; obtain an i-th third loss function of the student model based on the i-th third initial loss function and the second weight corresponding to the i-th teacher model; and obtain the second loss function based on the n third loss functions of the student model.

In one embodiment of the present disclosure, the second acquisition module 503 is further configured to obtain the average of the n third loss functions of the student model and determine the average as the second loss function.

In one embodiment of the present disclosure, the determination module 502 is further configured to: compare the label with the n second outputs to obtain a target number of second outputs consistent with the label; and determine the first weight based on the target number, where the first weight is positively correlated with the target number.

In one embodiment of the present disclosure, the determination module 502 is further configured to: compare the label with the second output of the i-th teacher model; in response to the label being consistent with the second output of the i-th teacher model, determine that the second weight corresponding to the i-th teacher model is 1; or, in response to the label being inconsistent with the second output of the i-th teacher model, obtain the target number of second outputs consistent with the label, and determine the second weight corresponding to the i-th teacher model based on the target number, where the second weight is positively correlated with the target number.

In one embodiment of the present disclosure, the determination module 502 is further configured to: convert the n second outputs into n second converted outputs in the data format of the label according to the data format of the label; and determine the weight based on the label and the n second converted outputs.

In one embodiment of the present disclosure, the second acquisition module 503 is further configured to perform a weighted sum of the first loss function and the second loss function to obtain the total loss function.

In summary, the model training apparatus of the embodiments of the present disclosure inputs a training sample into a student model and n teacher models respectively, obtains a first output of the student model and second outputs of the n teacher models, determines a weight corresponding to the training sample based on the label of the training sample and the n second outputs, obtains a total loss function of the student model based on the first output and the weight, and updates the model parameters of the student model based on the total loss function to obtain a trained target model. The label of the training sample and the second outputs of the teacher models can thus be jointly considered to determine the weight corresponding to the training sample, and the total loss function of the student model can then be obtained based on the first output of the student model and the weight. This avoids an inaccurate total loss function when the label of the training sample is wrong, improves the accuracy of the total loss function of the student model, and in turn improves the accuracy of model training.
According to embodiments of the present disclosure, the present disclosure further provides an image processing apparatus for implementing the above image processing method.

FIG. 6 is a block diagram of an image processing apparatus according to a first embodiment of the present disclosure.

As shown in FIG. 6, the image processing apparatus 600 of the embodiment of the present disclosure includes an acquisition module 601 and a processing module 602.

The acquisition module 601 is configured to obtain an image to be processed.

The processing module 602 is configured to input the image to be processed into a target image model, and obtain the processing result of the image to be processed output by the target image model, where the target image model is obtained using the model training method described above.

In summary, the image processing apparatus of the embodiments of the present disclosure inputs the image to be processed into the target image model, and the target image model outputs the processing result of the image to be processed. The target image model, obtained using the model training method described above, is small in size, high in accuracy, and requires few computing resources, which helps improve image processing performance.
According to embodiments of the present disclosure, the present disclosure further provides an electronic device, a readable storage medium, a computer program product, and a computer program.

According to an embodiment of the present disclosure, an electronic device includes: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the image processing method described in any of the foregoing embodiments.
FIG. 7 shows a schematic block diagram of an example electronic device 700 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are examples only and are not intended to limit the implementations of the present disclosure described and/or claimed herein.

As shown in FIG. 7, the electronic device 700 includes a computing unit 701, which can perform various appropriate actions and processing according to a computer program stored in a read-only memory (ROM) 702 or a computer program loaded from a storage unit 708 into a random access memory (RAM) 703. The RAM 703 can also store various programs and data required for the operation of the electronic device 700. The computing unit 701, the ROM 702, and the RAM 703 are connected to one another via a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.

Multiple components in the electronic device 700 are connected to the I/O interface 705, including: an input unit 706, such as a keyboard or a mouse; an output unit 707, such as various types of displays and speakers; a storage unit 708, such as a magnetic disk or an optical disc; and a communication unit 709, such as a network card, a modem, or a wireless communication transceiver. The communication unit 709 allows the electronic device 700 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunications networks.

The computing unit 701 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, a digital signal processor (DSP), and any appropriate processor, controller, or microcontroller. The computing unit 701 performs the methods and processing described above, such as the model training method described with reference to FIG. 1 to FIG. 3 and the image processing method described with reference to FIG. 4. For example, in some embodiments, the model training method or the image processing method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 700 via the ROM 702 and/or the communication unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the model training method described above, or one or more steps of the image processing method described above, can be performed. Alternatively, in other embodiments, the computing unit 701 may be configured, in any other appropriate manner (for example, by means of firmware), to perform the model training method or the image processing method.
Various implementations of the systems and techniques described above may be realized in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, so that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on a machine, partly on a machine, partly on a machine and partly on a remote machine as a stand-alone software package, or entirely on a remote machine or server.
According to an embodiment of the present disclosure, the storage medium is a non-transitory computer-readable storage medium storing computer instructions, where the computer instructions are used to cause a computer to perform the image processing method described in any of the foregoing embodiments.

In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing. More specific examples of the machine-readable storage medium include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

According to an embodiment of the present disclosure, a computer program product includes a computer program that, when executed by a processor, implements the image processing method described in any of the foregoing embodiments.

According to an embodiment of the present disclosure, a computer program includes computer program code that, when run on a computer, causes the computer to perform the image processing method described in any of the foregoing embodiments.

It should be noted that the foregoing explanations of the image processing method embodiments also apply to the image processing apparatus, storage medium, electronic device, computer program product, and computer program of the embodiments, and are not repeated here.
为了提供与用户的交互,可以在计算机上实施此处描述的系统和技术,该计算机具有:用于向用户显示信息的显示装置(例如,CRT(阴极射线管)或者LCD(液晶显示器)监视器);以及键盘和指向装置(例如,鼠标或者轨迹球),用户可以通过该键盘和该指向装置来将输入提供给计算机。其它种类的装置还可以用于提供与用户的交互;例如,提供给用户的反馈可以是任何形式的传感反馈(例如,视觉反馈、听觉反馈、或者触觉反馈);并且可以用任何形式(包括声输入、语音输入或者、触觉输入)来接收来自用户的输入。To provide interaction with a user, the systems and techniques described herein may be implemented on a computer having a display device (eg, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user ); and a keyboard and pointing device (eg, a mouse or a trackball) through which a user can provide input to the computer. Other kinds of devices may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and may be provided in any form, including Acoustic input, voice input or tactile input) to receive input from the user.
可以将此处描述的系统和技术实施在包括后台部件的计算系统(例如,作为数据服务器)、或者包括中间件部件的计算系统(例如,应用服务器)、或者包括前端部件的计算系统(例如,具有图形用户界面或者网络浏览器的用户计算机,用户可以通过该图形用户界面或者该网络浏览器来与此处描述的系统和技术的实施方式交互)、或者包括这种后台部件、中间件部件、或者前端部件的任何组合的计算系统中。可以通过任何形式或者介质的数字数据通信(例如,通信网络)来将系统的部件相互连接。通信网络的示例包括:局域网(LAN)、广域网(WAN)和互联网。The systems and techniques described herein may be implemented in a computing system that includes back-end components (e.g., as a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes front-end components (e.g., A user's computer having a graphical user interface or web browser through which the user can interact with implementations of the systems and technologies described herein), or including such backend components, middleware components, or any combination of front-end components in a computing system. The components of the system may be interconnected by any form or medium of digital data communication (eg, a communications network). Examples of communication networks include: local area network (LAN), wide area network (WAN), and the Internet.
计算机系统可以包括客户端和服务器。客户端和服务器一般远离彼此并且通常通过通信网络进行交互。通过在相应的计算机上运行并且彼此具有客户端-服务器关系的计算机程序来产生客户端和服务器的关系。服务器可以是云服务器，又称为云计算服务器或云主机，是云计算服务体系中的一项主机产品，以解决传统物理主机与VPS服务（"Virtual Private Server"，或简称"VPS"）中存在的管理难度大、业务扩展性弱的缺陷。服务器也可以为分布式系统的服务器，或者是结合了区块链的服务器。A computer system may include a client and a server. The client and server are generally remote from each other and typically interact over a communication network. The client-server relationship arises from computer programs running on the respective computers and having a client-server relationship with each other. The server may be a cloud server, also known as a cloud computing server or cloud host, which is a host product in the cloud computing service system that addresses the defects of difficult management and weak business scalability found in traditional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system, or a server combined with a blockchain.
应该理解，可以使用上面所示的各种形式的流程，重新排序、增加或删除步骤。例如，本公开中记载的各步骤可以并行地执行、也可以顺序地执行、也可以以不同的次序执行，只要能够实现本公开的技术方案所期望的结果，本文在此不进行限制。It should be understood that the various forms of processes shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order; as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, no limitation is imposed herein.
上述具体实施方式，并不构成对本公开保护范围的限制。本领域技术人员应该明白的是，根据设计要求和其他因素，可以进行各种修改、组合、子组合和替代。任何在本公开的精神和原则之内所作的修改、等同替换和改进等，均应包含在本公开保护范围之内。The specific embodiments described above do not constitute a limitation on the protection scope of the present disclosure. Those skilled in the art will understand that various modifications, combinations, sub-combinations, and substitutions are possible depending on design requirements and other factors. Any modifications, equivalent substitutions, improvements, and the like made within the spirit and principles of the present disclosure shall be included within the protection scope of the present disclosure.
本公开所有实施例均可以单独被执行，也可以与其他实施例相结合被执行，均视为本公开要求的保护范围。Each embodiment of the present disclosure may be implemented alone or in combination with other embodiments, all of which are considered within the protection scope claimed by the present disclosure.

Claims (16)

  1. 一种图像处理方法，包括：An image processing method, comprising:
    获取待处理图像;和acquiring an image to be processed; and
    将所述待处理图像输入目标图像模型中，由所述目标图像模型输出所述待处理图像的处理结果，inputting the image to be processed into a target image model, and outputting, by the target image model, a processing result of the image to be processed,
    其中，通过下述步骤训练所述目标图像模型：wherein the target image model is trained through the following steps:
    将训练样本分别输入至学生模型和n个教师模型中，获取所述学生模型的第一输出，并获取n个所述教师模型的第二输出，其中，n为正整数；inputting a training sample into a student model and n teacher models respectively, acquiring a first output of the student model, and acquiring second outputs of the n teacher models, where n is a positive integer;
    基于所述训练样本的标签和n个所述第二输出，确定所述训练样本对应的权重；determining, based on a label of the training sample and the n second outputs, a weight corresponding to the training sample;
    基于所述第一输出和所述权重,获取所述学生模型的总损失函数;和Obtaining a total loss function of the student model based on the first output and the weight; and
    基于所述总损失函数对所述学生模型的模型参数进行更新，得到训练后的目标模型，updating model parameters of the student model based on the total loss function to obtain the trained target model,
    其中,所述权重包括第一权重和所述教师模型对应的第二权重;Wherein, the weight includes a first weight and a second weight corresponding to the teacher model;
    其中,所述基于所述第一输出和所述权重,获取所述学生模型的总损失函数,包括:Wherein, obtaining the total loss function of the student model based on the first output and the weight includes:
    基于所述第一输出、所述标签和所述第一权重,获取所述学生模型的第一损失函数;Obtain a first loss function of the student model based on the first output, the label, and the first weight;
    基于所述第一输出、n个所述第二输出和所述第二权重,获取所述学生模型的第二损失函数;和Obtaining a second loss function of the student model based on the first output, n second outputs and the second weight; and
    基于所述第一损失函数和所述第二损失函数,获取所述总损失函数,Based on the first loss function and the second loss function, the total loss function is obtained,
    其中,所述基于所述训练样本的标签和n个所述第二输出,确定所述训练样本对应的权重,包括:Wherein, determining the weight corresponding to the training sample based on the label of the training sample and n second outputs includes:
    将所述标签和n个所述第二输出进行比对,获取与所述标签一致的第二输出的目标数量;和Compare the label with n second outputs to obtain a target number of second outputs consistent with the label; and
    基于所述目标数量,确定所述第一权重,其中,所述第一权重与所述目标数量正相关,The first weight is determined based on the target quantity, wherein the first weight is positively related to the target quantity,
    其中,所述基于所述训练样本的标签和n个所述第二输出,确定所述训练样本对应的权重,包括:Wherein, determining the weight corresponding to the training sample based on the label of the training sample and n second outputs includes:
    将所述标签和第i个教师模型的第二输出进行比对;Compare the label with the second output of the i-th teacher model;
    响应于所述标签与所述第i个教师模型的第二输出一致，确定所述第i个教师模型对应的第二权重为1；或者，In response to the label being consistent with the second output of the i-th teacher model, it is determined that the second weight corresponding to the i-th teacher model is 1; or,
    响应于所述标签与所述第i个教师模型的第二输出不一致,获取与所述标签一致的第二输出的目标数量;In response to the label being inconsistent with the second output of the i-th teacher model, obtaining a target number of second outputs that are consistent with the label;
    基于所述目标数量,确定所述第i个教师模型对应的第二权重,其中,所述第二权重与所述目标数量正相关。Based on the target number, a second weight corresponding to the i-th teacher model is determined, where the second weight is positively related to the target number.
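The weighting scheme in claim 1 — counting how many teacher outputs agree with the label, then deriving a shared first weight and per-teacher second weights — can be illustrated with a minimal sketch. The claim only requires the weights to be positively related to the agreement count; the ratio m/n used below is an assumption for illustration, not part of the claim:

```python
def sample_weights(label, teacher_preds):
    """Sketch of the claim-1 weighting for one training sample.

    label: ground-truth class index of the sample.
    teacher_preds: predicted class index from each of the n teachers.
    Returns (first_weight, second_weights).
    """
    n = len(teacher_preds)
    # "Target number": how many second outputs are consistent with the label.
    m = sum(1 for p in teacher_preds if p == label)
    # First weight: positively related to the agreement count (m/n is one choice).
    first_weight = m / n
    # Second weight: 1 when the teacher agrees with the label, otherwise a
    # value positively related to the agreement count.
    second_weights = [1.0 if p == label else m / n for p in teacher_preds]
    return first_weight, second_weights
```

A teacher that disagrees with the label on a sample where most teachers agree is thus down-weighted only mildly, while on a sample where few teachers agree its influence shrinks toward zero.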
  2. 根据权利要求1所述的方法,其中,所述基于所述第一输出、所述标签和所述第一权重,获取所述学生模型的第一损失函数,包括:The method according to claim 1, wherein said obtaining the first loss function of the student model based on the first output, the label and the first weight includes:
    基于所述第一输出和所述标签,获取所述学生模型的第一初始损失函数;和Obtaining a first initial loss function for the student model based on the first output and the label; and
    基于所述第一初始损失函数和所述第一权重,获取所述第一损失函数。The first loss function is obtained based on the first initial loss function and the first weight.
  3. 根据权利要求1或2所述的方法,其中,所述基于所述第一输出、n个所述第二输出和所述第二权重,获取所述学生模型的第二损失函数,包括:The method according to claim 1 or 2, wherein obtaining the second loss function of the student model based on the first output, n second outputs and the second weight includes:
    基于所述第一输出和第i个教师模型的第二输出,获取所述学生模型的第i个第三初始损失函数,其中,1≤i≤n,i为正整数;Based on the first output and the second output of the i-th teacher model, obtain the i-th third initial loss function of the student model, where 1≤i≤n, i is a positive integer;
    基于所述第i个第三初始损失函数和所述第i个教师模型对应的第二权重,获取所述学生模型的第i个第三损失函数;和Based on the i-th third initial loss function and the second weight corresponding to the i-th teacher model, obtain the i-th third loss function of the student model; and
    基于所述学生模型的n个第三损失函数,获取所述第二损失函数。The second loss function is obtained based on n third loss functions of the student model.
  4. 根据权利要求3所述的方法,其中,所述基于所述学生模型的n个第三损失函数,获取所述第二损失函数,包括:The method according to claim 3, wherein said obtaining the second loss function based on n third loss functions of the student model includes:
    获取所述学生模型的n个第三损失函数的平均值,并将所述平均值确定为所述第二损失函数。Obtain the average value of n third loss functions of the student model, and determine the average value as the second loss function.
  5. 根据权利要求1至4中任一项所述的方法,其中,所述基于所述训练样本的标签和n个所述第二输出,确定所述训练样本对应的权重,包括:The method according to any one of claims 1 to 4, wherein determining the weight corresponding to the training sample based on the label of the training sample and n second outputs includes:
    按照所述标签的数据格式，将n个所述第二输出转换为所述标签的数据格式的n个第二转换输出；converting the n second outputs into n second converted outputs in the data format of the label, according to the data format of the label;
    基于所述标签和n个所述第二转换输出，确定所述权重。determining the weight based on the label and the n second converted outputs.
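The format conversion in claim 5 can be sketched as below. The claim fixes only the target format (that of the label); the specific assumption here — that each second output is a probability vector and the label a class index, so conversion is an argmax — goes beyond the claim text:

```python
def to_label_format(second_outputs):
    """Claim-5 sketch: convert each teacher's raw second output into the
    label's data format before comparison.

    Assumed for illustration: each output is a probability vector and the
    label format is a class index, so conversion is an argmax."""
    return [max(range(len(probs)), key=probs.__getitem__)
            for probs in second_outputs]
```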
  6. 根据权利要求1至5中任一项所述的方法,其中,所述基于所述第一损失函数和所述第二损失函数,获取所述总损失函数,包括:The method according to any one of claims 1 to 5, wherein said obtaining the total loss function based on the first loss function and the second loss function includes:
    对所述第一损失函数和所述第二损失函数进行加权求和,获取所述总损失函数。Perform a weighted sum of the first loss function and the second loss function to obtain the total loss function.
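Claims 2 through 6 together define how the total loss is assembled: a weighted hard-label loss, per-teacher distillation losses scaled by the second weights and averaged, and a weighted sum of the two. A hedged sketch, assuming cross-entropy for the hard-label loss, KL divergence for the per-teacher losses, and a blending factor alpha — none of which are fixed by the claims:

```python
import math

def total_loss(student_probs, label, teacher_probs_list,
               first_weight, second_weights, alpha=0.5):
    """Sketch of claims 2-6 for one sample.

    student_probs: student's softmax distribution over classes.
    teacher_probs_list: one distribution per teacher (n entries).
    alpha and the choice of cross-entropy/KL are illustrative assumptions.
    """
    # Claim 2: first loss = first weight * initial hard-label loss.
    first_loss = first_weight * -math.log(student_probs[label])
    # Claims 3-4: i-th third loss = second weight_i * initial distillation
    # loss against teacher i; the second loss is their average.
    third_losses = []
    for w, t in zip(second_weights, teacher_probs_list):
        kl = sum(tp * math.log(tp / sp)
                 for tp, sp in zip(t, student_probs) if tp > 0)
        third_losses.append(w * kl)
    second_loss = sum(third_losses) / len(third_losses)
    # Claim 6: total loss = weighted sum of the first and second losses.
    return alpha * first_loss + (1 - alpha) * second_loss
```

When a teacher's distribution matches the student's, its KL term vanishes and only the hard-label term contributes.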
  7. 一种图像处理装置,包括:An image processing device, including:
    获取模块,用于获取待处理图像;Acquisition module, used to obtain images to be processed;
    处理模块，用于将所述待处理图像输入目标图像模型中，由所述目标图像模型输出所述待处理图像的处理结果，其中，通过以下模块训练所述目标图像模型：第一获取模块，用于将训练样本分别输入至学生模型和n个教师模型中，获取所述学生模型的第一输出，并获取n个所述教师模型的第二输出，其中，n为正整数；a processing module, configured to input the image to be processed into a target image model, the target image model outputting a processing result of the image to be processed, wherein the target image model is trained through the following modules: a first acquisition module, configured to input a training sample into a student model and n teacher models respectively, acquire a first output of the student model, and acquire second outputs of the n teacher models, where n is a positive integer;
    确定模块,用于基于所述训练样本的标签和n个所述第二输出,确定所述训练样本对应的权重;A determination module configured to determine the weight corresponding to the training sample based on the label of the training sample and n second outputs;
    第二获取模块,用于基于所述第一输出和所述权重,获取所述学生模型的总损失函数;和A second acquisition module, configured to acquire the total loss function of the student model based on the first output and the weight; and
    训练模块,用于基于所述总损失函数对所述学生模型的模型参数进行更新,得到训练后的目标模型,A training module, used to update the model parameters of the student model based on the total loss function to obtain the trained target model,
    其中,所述权重包括第一权重和所述教师模型对应的第二权重;Wherein, the weight includes a first weight and a second weight corresponding to the teacher model;
    其中,所述第二获取模块,还用于:Wherein, the second acquisition module is also used for:
    基于所述第一输出、所述标签和所述第一权重,获取所述学生模型的第一损失函数;Obtain a first loss function of the student model based on the first output, the label, and the first weight;
    基于所述第一输出、n个所述第二输出和所述第二权重,获取所述学生模型的第二损失函数;Obtain a second loss function of the student model based on the first output, n second outputs and the second weight;
    基于所述第一损失函数和所述第二损失函数,获取所述总损失函数;Based on the first loss function and the second loss function, obtain the total loss function;
    其中,所述确定模块,还用于:Among them, the determination module is also used to:
    将所述标签和n个所述第二输出进行比对,获取与所述标签一致的第二输出的目标数量;Compare the label with n second outputs to obtain a target number of second outputs consistent with the label;
    基于所述目标数量,确定所述第一权重,其中,所述第一权重与所述目标数量正相关;Based on the target quantity, the first weight is determined, wherein the first weight is positively related to the target quantity;
    其中,所述确定模块,还用于:Among them, the determination module is also used to:
    将所述标签和第i个教师模型的第二输出进行比对;Compare the label with the second output of the i-th teacher model;
    响应于所述标签与所述第i个教师模型的第二输出一致,确定所述第i个教师模型对应的第二权重为1;或者,In response to the label being consistent with the second output of the i-th teacher model, determine the second weight corresponding to the i-th teacher model to be 1; or,
    响应于所述标签与所述第i个教师模型的第二输出不一致，获取与所述标签一致的第二输出的目标数量；In response to the label being inconsistent with the second output of the i-th teacher model, obtaining a target number of second outputs consistent with the label;
    基于所述目标数量,确定所述第i个教师模型对应的第二权重,其中,所述第二权重与所述目标数量正相关。Based on the target number, a second weight corresponding to the i-th teacher model is determined, where the second weight is positively related to the target number.
  8. 根据权利要求7所述的装置,其中,所述第二获取模块,还用于:The device according to claim 7, wherein the second acquisition module is also used for:
    基于所述第一输出和所述标签,获取所述学生模型的第一初始损失函数;Obtaining a first initial loss function of the student model based on the first output and the label;
    基于所述第一初始损失函数和所述第一权重,获取所述第一损失函数。The first loss function is obtained based on the first initial loss function and the first weight.
  9. 根据权利要求7或8所述的装置,其中,所述第二获取模块,还用于:The device according to claim 7 or 8, wherein the second acquisition module is also used for:
    基于所述第一输出和第i个教师模型的第二输出,获取所述学生模型的第i个第三初始损失函数,其中,1≤i≤n,i为正整数;Based on the first output and the second output of the i-th teacher model, obtain the i-th third initial loss function of the student model, where 1≤i≤n, i is a positive integer;
    基于所述第i个第三初始损失函数和所述第i个教师模型对应的第二权重,获取所述学生模型的第i个第三损失函数;和Based on the i-th third initial loss function and the second weight corresponding to the i-th teacher model, obtain the i-th third loss function of the student model; and
    基于所述学生模型的n个第三损失函数,获取所述第二损失函数。The second loss function is obtained based on n third loss functions of the student model.
  10. 根据权利要求9所述的装置,其中,所述第二获取模块,还用于:The device according to claim 9, wherein the second acquisition module is also used for:
    获取所述学生模型的n个第三损失函数的平均值,并将所述平均值确定为所述第二损失函数。Obtain the average value of n third loss functions of the student model, and determine the average value as the second loss function.
  11. 根据权利要求7至10中任一项所述的装置,其中,所述确定模块,还用于:The device according to any one of claims 7 to 10, wherein the determining module is also used to:
    按照所述标签的数据格式，将n个所述第二输出转换为所述标签的数据格式的n个第二转换输出；和converting the n second outputs into n second converted outputs in the data format of the label, according to the data format of the label; and
    基于所述标签和n个所述第二转换输出，确定所述权重。determining the weight based on the label and the n second converted outputs.
  12. 根据权利要求7至11中任一项所述的装置,其中,所述第二获取模块,还用于:The device according to any one of claims 7 to 11, wherein the second acquisition module is also used to:
    对所述第一损失函数和所述第二损失函数进行加权求和,获取所述总损失函数。Perform a weighted sum of the first loss function and the second loss function to obtain the total loss function.
  13. 一种电子设备,包括:An electronic device including:
    至少一个处理器;以及at least one processor; and
    与所述至少一个处理器通信连接的存储器;其中,a memory communicatively connected to the at least one processor; wherein,
    所述存储器存储有可被所述至少一个处理器执行的指令，所述指令被所述至少一个处理器执行，以使所述至少一个处理器能够执行如权利要求1-6中任一项所述的图像处理方法。The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the image processing method according to any one of claims 1-6.
  14. 一种存储有计算机指令的非瞬时计算机可读存储介质,其中,所述计算机指令用于使所述计算机执行如权利要求1-6中任一项所述的图像处理方法。A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause the computer to execute the image processing method according to any one of claims 1-6.
  15. 一种计算机程序产品,包括计算机程序,所述计算机程序在被处理器执行时实现如权利要求1-6中任一项所述的图像处理方法。A computer program product, including a computer program that implements the image processing method according to any one of claims 1-6 when executed by a processor.
  16. 一种计算机程序,包括计算机程序代码,当所述计算机程序代码在计算机上运行时,使得所述计算机执行如权利要求1-6中任一项所述的图像处理方法。A computer program, including computer program code, when the computer program code is run on a computer, causes the computer to perform the image processing method according to any one of claims 1-6.
PCT/CN2022/139730 2022-08-16 2022-12-16 Image processing method and apparatus, and electronic device and storage medium WO2024036847A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210981983.6A CN115063875B (en) 2022-08-16 2022-08-16 Model training method, image processing method and device and electronic equipment
CN202210981983.6 2022-08-16

Publications (1)

Publication Number Publication Date
WO2024036847A1 true WO2024036847A1 (en) 2024-02-22

Family

ID=83207480

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/139730 WO2024036847A1 (en) 2022-08-16 2022-12-16 Image processing method and apparatus, and electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN115063875B (en)
WO (1) WO2024036847A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115063875B (en) * 2022-08-16 2022-12-16 北京百度网讯科技有限公司 Model training method, image processing method and device and electronic equipment
CN115578614B (en) * 2022-10-21 2024-03-12 北京百度网讯科技有限公司 Training method of image processing model, image processing method and device
CN116416500B (en) * 2023-03-24 2024-04-05 北京百度网讯科技有限公司 Image recognition model training method, image recognition device and electronic equipment
CN116361658A (en) * 2023-04-07 2023-06-30 北京百度网讯科技有限公司 Model training method, task processing method, device, electronic equipment and medium
CN117350354B (en) * 2023-09-21 2024-06-18 摩尔线程智能科技(北京)有限责任公司 Training method and device for large model, electronic equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111126573A (en) * 2019-12-27 2020-05-08 深圳力维智联技术有限公司 Model distillation improvement method and device based on individual learning and storage medium
CN111582500A (en) * 2020-05-07 2020-08-25 支付宝(杭州)信息技术有限公司 Method and system for improving model training effect
CN112749728A (en) * 2020-08-13 2021-05-04 腾讯科技(深圳)有限公司 Student model training method and device, computer equipment and storage medium
US20210158126A1 (en) * 2019-11-25 2021-05-27 Beijing Xiaomi Intelligent Technology Co., Ltd. Method and device for compressing a neural network model for machine translation and storage medium
CN113705362A (en) * 2021-08-03 2021-11-26 北京百度网讯科技有限公司 Training method and device of image detection model, electronic equipment and storage medium
CN114037052A (en) * 2021-10-29 2022-02-11 北京百度网讯科技有限公司 Training method and device for detection model, electronic equipment and storage medium
CN115063875A (en) * 2022-08-16 2022-09-16 北京百度网讯科技有限公司 Model training method, image processing method, device and electronic equipment

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20200128938A (en) * 2019-05-07 2020-11-17 삼성전자주식회사 Model training method and apparatus, and data recognizing method
CN110826344B (en) * 2019-10-24 2022-03-01 北京小米智能科技有限公司 Neural network model compression method, corpus translation method and apparatus thereof
CN111090756B (en) * 2020-03-24 2020-07-17 腾讯科技(深圳)有限公司 Artificial intelligence-based multi-target recommendation model training method and device
CN113361572B (en) * 2021-05-25 2023-06-27 北京百度网讯科技有限公司 Training method and device for image processing model, electronic equipment and storage medium


Also Published As

Publication number Publication date
CN115063875A (en) 2022-09-16
CN115063875B (en) 2022-12-16

Similar Documents

Publication Publication Date Title
JP7331171B2 (en) Methods and apparatus for training image recognition models, methods and apparatus for recognizing images, electronic devices, storage media, and computer programs
WO2024036847A1 (en) Image processing method and apparatus, and electronic device and storage medium
US20220253631A1 (en) Image processing method, electronic device and storage medium
US20220415072A1 (en) Image processing method, text recognition method and apparatus
EP3933708A2 (en) Model training method, identification method, device, storage medium and program product
EP4116861A2 (en) Method and apparatus for pre-training semantic representation model and electronic device
US20220391587A1 (en) Method of training image-text retrieval model, method of multimodal image retrieval, electronic device and medium
EP3955216A2 (en) Method and apparatus for recognizing image, electronic device and storage medium
CN114494784A (en) Deep learning model training method, image processing method and object recognition method
US20230177326A1 (en) Method and apparatus for compressing neural network model
WO2023093014A1 (en) Bill recognition method and apparatus, and device and storage medium
WO2022227759A1 (en) Image category recognition method and apparatus and electronic device
CN117746125A (en) Training method and device of image processing model and electronic equipment
US20230215203A1 (en) Character recognition model training method and apparatus, character recognition method and apparatus, device and storage medium
CN113657411A (en) Neural network model training method, image feature extraction method and related device
US20230081015A1 (en) Method and apparatus for acquiring information, electronic device and storage medium
US20220343662A1 (en) Method and apparatus for recognizing text, device and storage medium
KR20230133808A (en) Method and apparatus for training roi detection model, method and apparatus for detecting roi, device, and medium
US20230070966A1 (en) Method for processing question, electronic device and storage medium
CN114881227B (en) Model compression method, image processing device and electronic equipment
WO2023087667A1 (en) Sorting model training method and apparatus for intelligent recommendation, and intelligent recommendation method and apparatus
WO2023159819A1 (en) Visual processing and model training methods, device, storage medium and program product
CN114817476A (en) Language model training method and device, electronic equipment and storage medium
CN114707638A (en) Model training method, model training device, object recognition method, object recognition device, object recognition medium and product
CN115497112B (en) Form recognition method, form recognition device, form recognition equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22955614

Country of ref document: EP

Kind code of ref document: A1