CN115170919A - Image processing model training method, image processing device, image processing equipment and storage medium - Google Patents


Info

Publication number
CN115170919A
CN115170919A (application CN202210759709.4A)
Authority
CN
China
Prior art keywords
image
probability distribution
model
student model
student
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210759709.4A
Other languages
Chinese (zh)
Other versions
CN115170919B (en)
Inventor
Yang Fukui (杨馥魁)
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority: CN202210759709.4A
Publication of application: CN115170919A
Application granted; publication of grant: CN115170919B
Legal status: Active

Classifications

    • G06V 10/82: Arrangements for image or video recognition or understanding using neural networks
    • G06N 3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N 3/084: Backpropagation, e.g. using gradient descent
    • G06V 10/764: Arrangements for image or video recognition or understanding using classification, e.g. of video objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides an image processing model training method, an image processing method, and corresponding apparatuses, devices, and storage media, and relates to the technical field of artificial intelligence, in particular to deep learning, image processing, and computer vision. The image processing model training method comprises the following steps: converting first image features output by a teacher model into a first probability distribution; converting second image features output by a student model into a second probability distribution; constructing a loss function based on a prior probability distribution of the student model together with the first and second probability distributions; and adjusting model parameters of the student model based on the loss function. The present disclosure can improve the accuracy of the trained student model.

Description

Image processing model training method, image processing device, image processing equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence, in particular to deep learning, image processing, and computer vision, and specifically to an image processing model training method, an image processing method, and corresponding apparatuses, devices, and storage media.
Background
Knowledge distillation (knowledge distillation) is a common method for model compression. Unlike pruning and quantization, knowledge distillation constructs a lightweight small model and trains it using the supervision information of a larger, better-performing model, so that the small model achieves better performance and accuracy. The large model is called the teacher model, and the small model is called the student model. The supervision information output by the teacher model is called knowledge, and the process by which the student model learns the supervision information from the teacher model is called distillation.
Disclosure of Invention
The disclosure provides an image processing model training method and apparatus, an image processing method and apparatus, a device, and a storage medium.
According to an aspect of the present disclosure, there is provided an image processing model training method, including: converting first image features output by the teacher model into a first probability distribution; converting second image features output by the student model into second probability distribution; constructing a loss function based on the prior probability distribution of the student model, and the first and second probability distributions; based on the loss function, model parameters of the student model are adjusted.
According to another aspect of the present disclosure, there is provided an image processing method, including: acquiring an image to be processed; extracting image features of the image to be processed by using an image feature extraction model; and obtaining an image processing result of the image to be processed based on the image features; wherein the image feature extraction model is a student model trained by the method of any one of the above aspects.
According to another aspect of the present disclosure, there is provided an image processing model training apparatus, including: a first conversion module configured to convert the first image features output by the teacher model into a first probability distribution; a second conversion module configured to convert the second image features output by the student model into a second probability distribution; a construction module configured to construct a loss function based on the prior probability distribution of the student model and the first and second probability distributions; and an adjusting module configured to adjust the model parameters of the student model based on the loss function.
According to another aspect of the present disclosure, there is provided an image processing apparatus, including: an acquisition module configured to acquire an image to be processed; an extraction module configured to extract image features of the image to be processed by using an image feature extraction model; and a determining module configured to obtain an image processing result of the image to be processed based on the image features; wherein the image feature extraction model is a student model trained by the method of any one of the above aspects.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the above aspects.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method according to any one of the above aspects.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of the above aspects.
According to the technical solution of the present disclosure, the accuracy of the trained student model can be improved.
It should be understood that the statements in this section are not intended to identify key or critical features of the embodiments of the present disclosure, nor are they intended to limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flowchart of an image processing model training method provided by an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of an application scenario for implementing an image processing model training method or an image processing method according to an embodiment of the present disclosure;
FIG. 3 is a diagram of an image processing model training architecture provided by an embodiment of the present disclosure;
FIG. 4 is a flow chart of another image processing model training method provided by the embodiments of the present disclosure;
fig. 5 is a flowchart of an image processing method provided by an embodiment of the present disclosure;
FIG. 6 is a block diagram of an image processing model training apparatus according to an embodiment of the present disclosure;
fig. 7 is a block diagram of an image processing apparatus provided in an embodiment of the present disclosure;
fig. 8 is a schematic diagram of an electronic device for implementing an image processing model training method or an image processing method according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of embodiments of the present disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the related art, knowledge distillation directly supervises the image features output by the student model. For example, an L2 loss function is constructed from the image features output by the teacher model and the image features output by the student model, and the model parameters of the student model are adjusted based on this L2 loss, so that the image features output by the student model are as close as possible to those output by the teacher model.
However, when the structures of the teacher model and the student model differ greatly, directly supervising the image features output by the student model may result in poor accuracy of the trained student model.
To improve the model accuracy, the present disclosure provides the following embodiments.
Fig. 1 is a flowchart of an image processing model training method provided in an embodiment of the present disclosure, and as shown in fig. 1, the method includes:
101. Convert the first image features output by the teacher model into a first probability distribution.
102. Convert the second image features output by the student model into a second probability distribution.
103. Construct a loss function based on the prior probability distribution of the student model and the first and second probability distributions.
104. Adjust the model parameters of the student model based on the loss function.
Both the teacher model and the student model are deep neural network models. Compared with the student model, the teacher model has a more complex structure and higher performance. Through knowledge distillation, the student model, as the small model, can learn the knowledge of the teacher model, as the large model, thereby improving the performance of the student model.
In the field of image processing, the teacher model and the student model may both be referred to as image processing models. Further, both may be models for extracting image features, and may also be referred to as image feature extraction models. Specifically, the model structure may be a Convolutional Neural Network (CNN): the teacher model may be a larger-scale CNN, and the student model a smaller-scale CNN. More specifically, the teacher model is, for example, a ResNet model, and the student model is, for example, a MobileNet model.
In the field of image processing, the inputs of the teacher model and the student model are both images, and the outputs are image features. For distinction, the image features output by the teacher model are called first image features, and those output by the student model are called second image features.
Unlike the related art, in which the loss function is constructed directly from the first and second image features, in this embodiment the first image features and the second image features may be converted into the first probability distribution and the second probability distribution, respectively, and the loss function may be constructed based on these probability distributions.
A normalization function, for example the softmax function, may be employed to convert the image features into a probability distribution.
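As a minimal NumPy sketch of this conversion step (the feature shapes and values are illustrative, not from the patent):

```python
import numpy as np

def to_probability(features, axis=-1):
    """Convert raw image features (logits) to a probability
    distribution with a numerically stable softmax."""
    shifted = features - features.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

# Toy feature outputs with shape (N, C): N images, C feature channels.
first_features = np.array([[2.0, 1.0, 0.1]])   # e.g. from the teacher model
second_features = np.array([[1.5, 0.5, 0.2]])  # e.g. from the student model

t_logit = to_probability(first_features)   # first probability distribution
s_logit = to_probability(second_features)  # second probability distribution
```

Each row of the result sums to one, so the features become valid probability distributions while preserving their relative ordering.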
Steps 101 and 102 above have no fixed execution order.
When the loss function is constructed, the prior probability distribution of the student model can be determined, and the loss function is constructed based on the prior probability distribution, the first probability distribution and the second probability distribution.
The prior probability distribution is a probability distribution that does not depend on observed data, and it can express the randomness of the student model. Taking the prior probability distribution of the student model into account improves flexibility and further improves the accuracy of the trained student model.
After the loss function is obtained, the model parameters of the student model may be adjusted using a general parameter-update algorithm, such as the Back Propagation (BP) algorithm.
In this embodiment, the image features are converted into probability distributions, and the loss function is constructed from these distributions rather than directly from the image features. Because probability distributions are used, the student model can learn more knowledge, improving its accuracy. In addition, the prior probability distribution of the student model is considered when constructing the loss function, which improves flexibility and further improves the accuracy of the student model.
For better understanding of the embodiments of the present disclosure, an application scenario to which the embodiments of the present disclosure are applicable is described below. The present embodiment takes the image processing field as an example. The image processing includes, for example: face recognition, target detection, target classification, and the like.
Taking face recognition as an example, the final student model can be used for face recognition. For example, as shown in fig. 2, a user may install an Application (APP) capable of performing face recognition on a mobile device (e.g., a mobile phone), where the APP may collect a face image through a face collecting device (e.g., a camera) on the mobile device, and then, if the mobile device itself has a face recognition capability, for example, the APP configures a student model for face recognition locally on the mobile device, the student model may be used to perform face recognition on the collected face image locally on the mobile device.
In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other processing of the personal information of the related user are all in accordance with the regulations of related laws and regulations and do not violate the good customs of the public order.
It can be understood that the above takes face recognition on the mobile device side as an example; the student model may also be deployed on the server side. In this case, the APP sends the acquired face image to the server, and the server performs face recognition on the received image using its configured student model.
The left side of fig. 2 shows the application process of the student model, for example, face recognition using the student model.
In order to apply the student model, the student model needs to be obtained first, and the student model can be obtained based on knowledge distillation training.
As shown in the right side of fig. 2, the training process may be performed by the server 202, that is, the training of the student model is completed in the server, and the server may send the trained student model to the mobile device, so as to perform face recognition locally on the mobile device by using the student model.
The knowledge distillation framework can comprise a teacher model and a student model, wherein the teacher model is a trained model with a larger scale, and the student model is a model to be trained with a smaller scale.
Taking face recognition as an example, the teacher model may perform feature extraction processing on the image sample to output a first image feature, and the student model may also perform feature extraction processing on the image sample to output a second image feature. The image feature may be embodied as a feature map (feature map).
The image samples may be from an existing sample set, such as ImageNet.
In addition, the image sample corresponding to the teacher model may be referred to as a first image, the image sample corresponding to the student model may be referred to as a second image, and the first image and the second image may be from the same image, for example, obtained by performing different data enhancement processing methods on the same image.
In this embodiment, the first image and the second image come from the same image. By exploiting the property that homologous data has maximal mutual information, minimizing the mutual information loss function described below improves the performance of the student model.
Since the number of samples in the sample set is limited, in order to obtain a larger sample size, in this embodiment, different data enhancement processing may be performed on the same image sample to obtain a first image and a second image, where the first image is used as an input of the teacher model and the second image is used as an input of the student model.
As shown in fig. 3, the same image sample may be referred to as an original image, and a first data enhancement process may be performed on the original image to obtain a first image, and a second data enhancement process may be performed on the original image to obtain a second image, where the first data enhancement process is different from the second data enhancement process.
The data enhancement processing includes, for example: cropping, rotating, blocking, size transforming, modifying brightness, etc.
The first data enhancement processing and the second data enhancement processing may be of the same kind of processing, e.g. both being rotation, or both being brightness modification.
In addition, the processing intensity of the second data enhancement processing is greater than the processing intensity of the first data enhancement processing. Taking the rotation as an example, the rotation angle of the second data enhancement processing is larger than that of the first data enhancement processing.
Because the first data enhancement processing corresponds to the teacher model, the second to the student model, and the teacher model is stronger than the student model, the teacher model can still transfer ample knowledge even with the lower-strength processing, while the higher-strength processing applied to the student model lets it learn more knowledge and improves its accuracy.
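A hedged sketch of the two-strength augmentation idea, using 90-degree rotations as a stand-in for the arbitrary-angle rotations described above (the function names and the toy image are illustrative):

```python
import numpy as np

def rotate90(img, times):
    """Rotate an image array by `times` * 90 degrees (a stand-in for an
    arbitrary-angle rotation in a real augmentation pipeline)."""
    return np.rot90(img, k=times)

def weak_augment(img):
    # First data enhancement (teacher input): same kind, lower strength.
    return rotate90(img, 1)

def strong_augment(img):
    # Second data enhancement (student input): same kind, higher strength.
    return rotate90(img, 2)

original = np.arange(12).reshape(3, 4)   # toy "original image"
first_image = weak_augment(original)     # fed to the teacher model
second_image = strong_augment(original)  # fed to the student model
```

Both augmentations are the same kind of transform (rotation), differing only in strength, matching the constraint stated above.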
After the first image is input to the teacher model, the teacher model outputs the first image features; after the second image is input to the student model, the student model outputs the second image features. The image features can then be converted into probability distributions using the softmax function. As shown in fig. 3, the first probability distribution corresponding to the teacher model is denoted t_logit, and the second probability distribution corresponding to the student model is denoted s_logit.
Wherein a joint probability distribution can be constructed based on the first probability distribution (t_logit) and the second probability distribution (s_logit).
In addition, a random Gaussian distribution conforming to N(0, 1) can be obtained, and based on it a prior Gaussian distribution conforming to N(m, d) can be constructed, where m and d are learnable parameters.
Then, a mutual information loss function based on a Gaussian prior can be constructed from the joint probability distribution, the prior Gaussian distribution conforming to N(m, d), and the posterior probability distribution (i.e., s_logit).
After the loss function is obtained, the BP algorithm can be used to update the model parameters of the student model and the learnable parameters m and d until a preset number of iterations is reached, at which point training ends and the final student model is obtained.
In conjunction with the architecture shown in fig. 3, the present disclosure also provides a model training method.
Fig. 4 is a flowchart of another image processing model training method provided by an embodiment of the present disclosure. The method includes:
401. Perform first data enhancement processing on the original image to obtain a first image.
402. Use the teacher model to perform feature extraction on the input first image and output the first image features.
403. Convert the first image features into the first probability distribution.
404. Perform second data enhancement processing on the original image to obtain a second image.
405. Use the student model to perform feature extraction on the input second image and output the second image features.
406. Convert the second image features into the second probability distribution.
The original image may be an image obtained from an existing sample set.
The first data enhancement processing is different from the second data enhancement processing, so two different images, the first image and the second image, can be obtained based on the same original image.
The first data enhancement processing and the second data enhancement processing may be the same kind of data enhancement, e.g., both rotation operations.
In addition, the processing intensity of the second data enhancement processing may be greater than that of the first data enhancement processing; for example, the rotation angle of the second data enhancement processing is greater than that of the first.
Because the first data enhancement processing corresponds to the teacher model, the second to the student model, and the teacher model is stronger than the student model, the teacher model can still transfer ample knowledge even with the lower-strength processing, while the higher-strength processing applied to the student model lets it learn more knowledge and improves its accuracy.
In this embodiment, the first image and the second image are obtained by applying different data enhancement processing to the original image, so both can be obtained even from a small sample size. This improves the accuracy of the student model and can also improve its robustness.
The teacher model and the student models may both be deep neural network models for extracting image features, the teacher model being a trained model, and the student models being models to be trained.
The teacher model is, for example, a ResNet model, and the student model is, for example, a MobileNet model.
The inputs of the teacher model and the student model are images, and the outputs are image features.
The image features can be converted to a probability distribution using the softmax function.
Steps 401-403 and 404-406 have no fixed execution order.
407. Constructing a joint probability distribution based on the first probability distribution and the second probability distribution.
The loss function used in this embodiment is a mutual information loss function, expressed as:
Mutual_loss = p(s,t) * (log p(s) + log p(t) - log p(s,t))    (1)
where Mutual_loss is the mutual information loss function, p(s,t) is the joint probability distribution of the teacher model and the student model, p(s) is the prior probability distribution of the student model, and p(t) is the prior probability distribution of the teacher model.
According to the Bayesian formula, p(s,t) = p(t) * p(s|t); substituting into equation (1) yields:
Mutual_loss = p(s,t) * (log p(s) - log p(s|t))    (2)
where p(s|t) is the posterior probability distribution of the student model, i.e., the second probability distribution; in conjunction with fig. 3, p(s|t) = s_logit.
therefore, a loss function as shown in equation (2) can be constructed.
Since the loss function involves the joint probability distribution of the teacher and student models, the prior probability distribution of the student model, and the posterior probability distribution, the joint probability distribution must be calculated.
The first probability distribution is denoted t_logit and the second probability distribution s_logit; both have dimension (N, C), where N is the number of first or second images and C is the feature dimension of a single image, such as the number of classes.
The calculation formula for the joint probability distribution may be:
p(s,t) = t_logit^T * s_logit
where the superscript T denotes the transpose operation.
Thus, the dimension of p(s,t) is (C, C).
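A NumPy sketch of this simple joint-distribution computation (the sizes N = 8 and C = 4 and the random probability rows are illustrative):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
N, C = 8, 4  # N images, C classes (illustrative sizes)

t_logit = softmax(rng.normal(size=(N, C)))  # first probability distribution
s_logit = softmax(rng.normal(size=(N, C)))  # second probability distribution

# p(s,t) = t_logit^T @ s_logit, aggregating over the N samples -> shape (C, C)
p_st = t_logit.T @ s_logit
```

Since every row of t_logit and s_logit sums to one, the entries of p_st sum to N, so dividing by N would give a normalized joint distribution.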
Further, to obtain a more accurate joint probability distribution, the calculation formula of the joint probability distribution may be:
p1 = (s_logit.unsqueeze(2) * t_logit.unsqueeze(1)).sum(dim=0)
p2 = (p1 + p1.t()) / 2
p(s,t) = p2 / p2.sum(dim=0)
where p1 is the similarity of t_logit and s_logit; unsqueeze() is a dimension-expanding operation, the dimension of s_logit.unsqueeze(2) is (N, C, 1), the dimension of t_logit.unsqueeze(1) is (N, 1, C), and sum(dim=0) sums over the first dimension, so the dimension of p1 is (C, C). p2 is a symmetrizing mean operation on p1, where p1.t() is the transpose of p1, so the dimension of p2 is also (C, C). p(s,t) normalizes p2, and its dimension is also (C, C).
Therefore, the joint probability distribution p(s,t) can also be calculated using the above formulas for p1, p2, and p(s,t).
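The PyTorch-style formulas above translate directly to NumPy; this sketch uses illustrative sizes (N = 8, C = 4) and follows the column-wise normalization written in the text:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(1)
N, C = 8, 4
t_logit = softmax(rng.normal(size=(N, C)))
s_logit = softmax(rng.normal(size=(N, C)))

# p1: (N, C, 1) * (N, 1, C) -> (N, C, C), then sum over the sample dimension.
p1 = (s_logit[:, :, None] * t_logit[:, None, :]).sum(axis=0)
# p2: symmetrize p1.
p2 = (p1 + p1.T) / 2
# p(s,t): normalize p2 along the first dimension, as in p2 / p2.sum(dim=0).
p_st = p2 / p2.sum(axis=0)
```

After symmetrization p2 equals its own transpose, and the final normalization makes each column of p(s,t) sum to one.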
408. An initial probability distribution of the student model is determined.
The initial probability distribution of the student model can be obtained by random initialization with a Gaussian distribution.
For example, as shown in fig. 3, the initial probability distribution is a randomly initialized Gaussian distribution function conforming to N(0, 1).
It can be understood that the initial probability distribution may also be assumed to follow another distribution function, such as a Laplacian distribution.
The Gaussian distribution is a common distribution function that can cover essentially all variable distributions, so using a Gaussian distribution to determine the initial probability distribution of the student model can match the true prior distribution of the student model and improve its accuracy.
409. Determining a prior probability distribution of the student model based on the learnable distribution parameter and the initial probability distribution.
The calculation formula of the prior probability distribution of the student model may be:
p(s) = m + d * a
where p(s) is the prior probability distribution of the student model, m and d are the learnable distribution parameters, and a is the initial probability distribution, such as a Gaussian distribution function conforming to N(0, 1).
It will be appreciated that the prior probability distribution described above may be determined based on one or more sets of learnable distribution parameters.
For example, p(s) = Σ_i (m_i + d_i * a)
where (m_i, d_i) is the i-th group of distribution parameters (i = 1, 2, ..., n) and n is the number of groups of distribution parameters. In addition, the functions corresponding to the multiple groups of distribution parameters may be combined by weighted addition.
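A small sketch of this reparameterized prior; the initial values of (m, d) and the parameter groups are illustrative placeholders, and note that p(s) here is the patent's parameterization of the prior rather than an explicitly normalized distribution:

```python
import numpy as np

rng = np.random.default_rng(2)
C = 4
a = rng.normal(size=C)  # initial probability distribution: samples from N(0, 1)

# Single group of learnable distribution parameters (m, d).
m, d = 0.0, 1.0          # illustrative initial values
p_s_single = m + d * a   # p(s) = m + d * a

# Several groups (m_i, d_i), combined by plain summation as in the text.
params = [(0.0, 1.0), (0.5, 0.2), (-0.1, 0.3)]  # illustrative groups
p_s_multi = sum(m_i + d_i * a for m_i, d_i in params)
```

Because m and d multiply and shift the fixed random draw a, gradients with respect to m and d are well defined, which is what makes the prior learnable during training.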
In this embodiment, the prior probability distribution of the student model is determined from the learnable distribution parameters and the initial probability distribution. Because the distribution parameters are learnable, they can be adjusted during training, so that the student model better approaches the optimal solution and its accuracy improves. Moreover, since the learnable distribution parameters act on the prior, they increase the flexibility of model training and further improve the accuracy of the student model.
For example, in a scenario where age is identified from a face image, the identification result may classify the user as a child, an adult, or an elderly person; in general, identification accuracy is better for adults and poorer for children and the elderly.
Steps 401-407 and steps 408-409 are not restricted to any particular order of execution.
410. And constructing a mutual information loss function based on Gaussian distribution based on the joint probability distribution, the prior probability distribution and the posterior probability distribution of the student model.
The posterior probability distribution of the student model is the second probability distribution, i.e., s_logic in fig. 3 and p(s|t) in the above formula.
Based on the above equation (2), the loss function can be calculated using the above joint probability distribution, prior probability distribution, and posterior probability distribution.
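Equation (2) itself is not reproduced in this excerpt. Assuming it follows the standard discrete definition of mutual information, I(s; t) = Σ p(s,t) · log(p(s|t) / p(s)), a minimal sketch of computing a mutual information loss from the joint, prior, and posterior distributions might look like the following (function and variable names are illustrative):

```python
import numpy as np

def mutual_information_loss(joint, prior, posterior, eps=1e-12):
    """Negative mutual information computed from discrete distributions:
    I(s; t) = sum_{s,t} p(s,t) * log(p(s|t) / p(s)).
    Minimizing the returned loss maximizes the mutual information."""
    joint = np.asarray(joint, dtype=float)          # p(s, t), shape (S, T)
    prior = np.asarray(prior, dtype=float)          # p(s),    shape (S,)
    posterior = np.asarray(posterior, dtype=float)  # p(s|t),  shape (S, T)
    mi = np.sum(joint * (np.log(posterior + eps) - np.log(prior + eps)[:, None]))
    return -mi
```

As a sanity check, when s and t are independent the posterior equals the prior and the mutual information is zero.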
411. Based on the loss function, adjusting the model parameters of the student model and the learnable distribution parameters.
The training process may be divided into multiple iterations. In each iteration, a common parameter adjustment algorithm, such as the back-propagation (BP) algorithm, may be used to adjust the model parameters and the distribution parameters until a preset number of iterations is reached. The model parameters at that point are taken as the model parameters of the finally generated student model.
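The iterative adjustment described above can be sketched as follows, with gradient computation abstracted behind a caller-supplied function. All names here are illustrative, and in practice the BP updates would come from an autodiff framework rather than a hand-written gradient function.

```python
def train(model_params, dist_params, grad_fn, lr=0.01, num_iters=100):
    """Run a fixed number of iterations; in each one, gradient-style updates
    adjust both the student model parameters and the learnable distribution
    parameters, stopping when the preset iteration count is reached."""
    for _ in range(num_iters):
        g_model, g_dist = grad_fn(model_params, dist_params)
        model_params = model_params - lr * g_model  # BP-style update of model weights
        dist_params = dist_params - lr * g_dist     # the distribution parameters are learnable too
    return model_params, dist_params
```

For instance, with a toy quadratic loss whose gradients are (2m, 2d), both parameters converge toward zero.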
In this embodiment, a joint probability distribution is constructed from the first probability distribution and the second probability distribution using the Bayesian formula, and the mutual information loss function is constructed from the joint, prior, and posterior probability distributions. The mutual information loss function of this embodiment therefore differs from a general mutual information loss function in that it is determined based on the prior probability distribution.
In this embodiment, the learnable distribution parameters are adjusted based on the loss function, so that the distribution parameters are adjustable in the training process of the student model, thereby further improving the flexibility and the precision of the student model.
The above describes a model training process, and after training, a trained student model can be obtained. In the model application stage, the student model may be used for image processing.
Fig. 5 is a flowchart of an image processing method according to an embodiment of the present disclosure, and as shown in fig. 5, the image processing method includes:
501. Acquiring an image to be processed.
502. Extracting the image features of the image to be processed by using an image feature extraction model.
503. Based on the image features, an image processing result is obtained.
The image feature extraction model can be a student model trained by any one of the methods.
In the model application stage, the image feature extraction model can be used to extract image features, and an image processing result is then obtained based on the image features.
Taking face recognition as an example, the image to be processed may be a face image. Accordingly, the image feature may be an image feature of a face image.
Based on different application scenes, the image features can be input into a model of a related downstream task for processing so as to output an image processing result.
Still taking face recognition as an example, face recognition can be regarded as a classification task. Therefore, the image features can be input into a classification model, and the output of the classification model is the face recognition result, such as determining which of a plurality of candidates a face image belongs to, or the age group of the user corresponding to the face image. The specific structure of the classification model can be implemented by various related technologies, such as a fully connected network.
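The application-stage pipeline (steps 501-503) can be sketched as a feature extractor followed by a classification head. The linear-plus-ReLU extractor and all layer shapes below are placeholders for illustration, not the trained student model.

```python
import numpy as np

def extract_features(image, weights):
    """Hypothetical stand-in for the trained student (image feature
    extraction) model: one linear projection of the flattened image
    followed by ReLU."""
    return np.maximum(image.reshape(-1) @ weights, 0.0)

def classify(features, head_weights):
    """Downstream classification head: fully connected layer + softmax,
    as in the face recognition example."""
    logits = features @ head_weights
    exp = np.exp(logits - logits.max())   # numerically stable softmax
    probs = exp / exp.sum()
    return int(np.argmax(probs)), probs

rng = np.random.default_rng(0)
image = rng.random((8, 8))                             # image to be processed
feats = extract_features(image, rng.random((64, 16)))  # step 502
label, probs = classify(feats, rng.random((16, 3)))    # step 503: e.g. child/adult/elderly
```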
In this embodiment, the image feature extraction model is a student model obtained by using the training method, and since the precision of the trained student model is high, the student model can obtain image features with high precision, and the accuracy of an image processing result can be further improved.
Fig. 6 is a block diagram of an image processing model training apparatus according to an embodiment of the present disclosure. As shown in fig. 6, the apparatus 600 includes: a first conversion module 601, a second conversion module 602, a construction module 603, and an adjusting module 604.
The first conversion module 601 is configured to convert the first image features output by the teacher model into a first probability distribution; the second conversion module 602 is configured to convert the second image features output by the student model into a second probability distribution; the construction module 603 is configured to construct a loss function based on the prior probability distribution of the student model and the first and second probability distributions; and the adjusting module 604 is configured to adjust the model parameters of the student model based on the loss function.
In this embodiment, the image features are converted into probability distributions, and the loss function is constructed based on the probability distributions rather than directly on the image features. Because probability distributions are used, the student model can learn more knowledge, improving its accuracy. In addition, the prior probability distribution of the student model is considered when constructing the loss function, which improves flexibility and further improves the accuracy of the student model.
In some embodiments, the loss function is a mutual information loss function, and the constructing module 603 is further configured to: constructing a joint probability distribution based on the first probability distribution and the second probability distribution; determining a prior probability distribution of the student model; taking the second probability distribution as a posterior probability distribution of the student model; and constructing the mutual information loss function based on the joint probability distribution, the prior probability distribution and the posterior probability distribution.
In this embodiment, a joint probability distribution is constructed from the first probability distribution and the second probability distribution using the Bayesian formula, and the mutual information loss function is constructed from the joint, prior, and posterior probability distributions. The mutual information loss function of this embodiment therefore differs from common mutual information loss functions in that it is determined based on the prior probability distribution.
In some embodiments, the constructing module 603 is further configured to: determine an initial probability distribution of the student model; and determine the prior probability distribution of the student model based on the learnable distribution parameter and the initial probability distribution.
In this embodiment, the prior probability distribution of the student model is determined from the learnable distribution parameters and the initial probability distribution. Because the distribution parameters are learnable, they can be adjusted during training, so the student model approaches the optimal solution more closely and its accuracy improves. In addition, because the learnable distribution parameters act on the prior, using them to determine the prior probability distribution improves the flexibility of model training and further improves the accuracy of the student model.
In some embodiments, the apparatus 600 further comprises: a learning module to adjust the learnable distribution parameter based on the loss function.
The prior probability distribution is determined based on the learnable distribution parameters, and the learnable distribution parameters are adjusted based on the loss function, so that the distribution parameters are adjustable in the training process of the student model, the flexibility can be further improved, and the precision of the student model is improved.
In some embodiments, the building module 603 is further configured to: and randomly initializing the student model by adopting Gaussian distribution to obtain initial probability distribution of the student model.
Since the Gaussian distribution is a common distribution function that can cover most variable distributions, determining the initial probability distribution of the student model with a Gaussian distribution matches the true prior distribution of the student model and improves the accuracy of the student model.
In some embodiments, the apparatus 600 further comprises: the first feature extraction module is used for performing feature extraction processing on the input first image by adopting the teacher model so as to output the first image feature; and/or the second feature extraction module is used for performing feature extraction processing on the input second image by adopting the student model so as to output the second image feature; wherein the first image and the second image are from the same image.
In this embodiment, the first image and the second image come from the same image. Combined with the subsequent mutual information loss function, the property that homologous data carries maximum mutual information can be exploited by minimizing that loss function, thereby improving the performance of the student model.
In some embodiments, the apparatus 600 further comprises: the first data enhancement module is used for carrying out first data enhancement processing on an original image to obtain a first image; the second data enhancement module is used for performing second data enhancement processing on the original image to obtain a second image; wherein the first data enhancement processing is different from the second data enhancement processing.
In this embodiment, the first image and the second image are obtained by applying different data enhancement processing to the same original image. The two images can thus be obtained from a small sample size, which improves the accuracy of the student model and also improves its robustness.
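A toy illustration of producing the two inputs by applying two different data enhancement operations to one original image. The specific augmentations shown (horizontal flip and additive noise) are examples chosen here, not operations named by the patent.

```python
import numpy as np

def first_augment(image):
    """Illustrative first data enhancement: horizontal flip."""
    return image[:, ::-1]

def second_augment(image, rng):
    """Illustrative second, different data enhancement: small additive noise."""
    return image + 0.05 * rng.standard_normal(image.shape)

rng = np.random.default_rng(0)
original = rng.random((4, 4))                   # the original image
first_image = first_augment(original)           # e.g. fed to the teacher model
second_image = second_augment(original, rng)    # e.g. fed to the student model
```

Both inputs derive from the same original image, so their features should carry high mutual information.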
Fig. 7 is a structural diagram of an image processing apparatus according to an embodiment of the disclosure, and as shown in fig. 7, the apparatus 700 includes: an acquisition module 701, an extraction module 702, and a determination module 703.
The obtaining module 701 is used for obtaining an image to be processed; the extraction module 702 is configured to extract an image feature of the image to be processed by using an image feature extraction model; the determining module 703 is configured to obtain an image processing result of the image to be processed based on the image feature.
The image feature extraction model is a student model trained by adopting any one of the training methods.
In this embodiment, the image feature extraction model is a student model obtained by using the training method, and since the precision of the trained student model is high, the student model can obtain image features with high precision, and the accuracy of an image processing result can be further improved.
It is to be understood that "first", "second", and the like in the embodiments of the present disclosure are used for distinction only, and do not indicate the degree of importance, the order of timing, and the like.
It is to be understood that in the disclosed embodiments, the same or similar elements in different embodiments may be referenced.
In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other processing of the personal information of the related user are all in accordance with the regulations of related laws and regulations and do not violate the good customs of the public order.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 8 illustrates a schematic block diagram of an example electronic device 800 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 8, the electronic device 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the electronic device 800 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to bus 804.
A number of components in the electronic device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard, a mouse, or the like; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, or the like; and a communication unit 809 such as a network card, modem, wireless communication transceiver, etc. The communication unit 809 allows the electronic device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. The computing unit 801 performs the respective methods and processes described above, such as the model training method or the image processing method. For example, in some embodiments, the model training method or the image processing method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program can be loaded and/or installed onto the electronic device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the model training method or the image processing method described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the model training method or the image processing method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system that overcomes the drawbacks of high management difficulty and weak service scalability in traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system, or a server combined with a blockchain.
It should be understood that the various forms of flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved; no limitation is imposed herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (19)

1. An image processing model training method, comprising:
converting first image features output by the teacher model into a first probability distribution;
converting second image features output by the student model into second probability distribution;
constructing a loss function based on the prior probability distribution of the student model, and the first and second probability distributions;
based on the loss function, model parameters of the student model are adjusted.
2. The method of claim 1, wherein the loss function is a mutual information loss function, the constructing a loss function based on the first probability distribution and the second probability distribution comprising:
constructing a joint probability distribution based on the first probability distribution and the second probability distribution;
determining a prior probability distribution of the student model;
taking the second probability distribution as a posterior probability distribution of the student model;
and constructing the mutual information loss function based on the joint probability distribution, the prior probability distribution and the posterior probability distribution.
3. The method of claim 2, wherein said determining a prior probability distribution of said student model comprises:
determining an initial probability distribution of the student model;
determining a prior probability distribution of the student model based on the learnable distribution parameter and the initial probability distribution.
4. The method of claim 3, further comprising:
adjusting the learnable distribution parameter based on the loss function.
5. The method of claim 3, wherein the determining an initial probability distribution of the student model comprises:
and randomly initializing the student model by adopting Gaussian distribution to obtain initial probability distribution of the student model.
6. The method of any of claims 1-5, further comprising:
performing feature extraction processing on the input first image by using the teacher model to output the first image feature; and/or,
performing feature extraction processing on the input second image by adopting the student model to output the second image feature;
wherein the first image and the second image are from the same image.
7. The method of claim 6, further comprising:
performing first data enhancement processing on an original image to obtain a first image;
performing second data enhancement processing on the original image to obtain a second image;
wherein the first data enhancement processing is different from the second data enhancement processing.
8. An image processing method comprising:
acquiring an image to be processed;
extracting the image characteristics of the image to be processed by adopting an image characteristic extraction model;
acquiring an image processing result of the image to be processed based on the image characteristics;
wherein the image feature extraction model is a student model trained using the method of any one of claims 1-7.
9. An image processing model training apparatus comprising:
the first conversion module is used for converting the first image characteristics output by the teacher model into first probability distribution;
the second conversion module is used for converting the second image characteristics output by the student model into second probability distribution;
a construction module for constructing a loss function based on the prior probability distribution of the student model, and the first and second probability distributions;
and the adjusting module is used for adjusting the model parameters of the student model based on the loss function.
10. The apparatus of claim 9, wherein the loss function is a mutual information loss function, the construction module further configured to:
constructing a joint probability distribution based on the first probability distribution and the second probability distribution;
determining a prior probability distribution of the student model;
taking the second probability distribution as a posterior probability distribution of the student model;
and constructing the mutual information loss function based on the joint probability distribution, the prior probability distribution and the posterior probability distribution.
11. The apparatus of claim 10, wherein the build module is further to:
determining an initial probability distribution of the student model;
determining a prior probability distribution of the student model based on the learnable distribution parameter and the initial probability distribution.
12. The apparatus of claim 11, further comprising:
a learning module to adjust the learnable distribution parameter based on the loss function.
13. The apparatus of claim 11, wherein the build module is further to:
and randomly initializing the student model by adopting Gaussian distribution to obtain initial probability distribution of the student model.
14. The apparatus of any of claims 9-13, further comprising:
the first feature extraction module is used for performing feature extraction processing on the input first image by adopting the teacher model so as to output the first image feature; and/or,
the second feature extraction module is used for performing feature extraction processing on the input second image by adopting the student model so as to output the second image features;
wherein the first image and the second image are from the same image.
15. The apparatus of claim 14, further comprising:
the first data enhancement module is used for carrying out first data enhancement processing on an original image to obtain a first image;
the second data enhancement module is used for carrying out second data enhancement processing on the original image so as to obtain a second image;
wherein the first data enhancement processing is different from the second data enhancement processing.
16. An image processing apparatus comprising:
the acquisition module is used for acquiring an image to be processed;
the extraction module is used for extracting the image characteristics of the image to be processed by adopting an image characteristic extraction model;
the determining module is used for acquiring an image processing result of the image to be processed based on the image characteristics;
wherein the image feature extraction model is a student model trained using the method of any one of claims 1-7.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
18. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-8.
19. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-8.
CN202210759709.4A 2022-06-29 2022-06-29 Image processing model training and image processing method, device, equipment and storage medium Active CN115170919B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210759709.4A CN115170919B (en) 2022-06-29 2022-06-29 Image processing model training and image processing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210759709.4A CN115170919B (en) 2022-06-29 2022-06-29 Image processing model training and image processing method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115170919A true CN115170919A (en) 2022-10-11
CN115170919B CN115170919B (en) 2023-09-12

Family

ID=83489528

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210759709.4A Active CN115170919B (en) 2022-06-29 2022-06-29 Image processing model training and image processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115170919B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115578614A (en) * 2022-10-21 2023-01-06 北京百度网讯科技有限公司 Training method of image processing model, image processing method and device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200134506A1 (en) * 2018-10-29 2020-04-30 Fujitsu Limited Model training method, data identification method and data identification device
CN111639710A (en) * 2020-05-29 2020-09-08 北京百度网讯科技有限公司 Image recognition model training method, device, equipment and storage medium
CN112115469A (en) * 2020-09-15 2020-12-22 浙江科技学院 Edge intelligent moving target defense method based on Bayes-Stackelberg game
CN112967088A (en) * 2021-03-03 2021-06-15 上海数鸣人工智能科技有限公司 Marketing activity prediction model structure and prediction method based on knowledge distillation
CN114049541A (en) * 2021-08-27 2022-02-15 之江实验室 Visual scene recognition method based on structural information characteristic decoupling and knowledge migration
CN114373090A (en) * 2020-10-14 2022-04-19 中国移动通信有限公司研究院 Model lightweight method, device, electronic equipment and computer readable storage medium
CN114627331A (en) * 2022-03-07 2022-06-14 北京沃东天骏信息技术有限公司 Model training method and device
CN114647760A (en) * 2022-01-13 2022-06-21 中国矿业大学 Intelligent video image retrieval method based on neural network self-temperature cause and knowledge conduction mechanism

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200134506A1 (en) * 2018-10-29 2020-04-30 Fujitsu Limited Model training method, data identification method and data identification device
CN111105008A (en) * 2018-10-29 2020-05-05 富士通株式会社 Model training method, data recognition method and data recognition device
CN111639710A (en) * 2020-05-29 2020-09-08 北京百度网讯科技有限公司 Image recognition model training method, device, equipment and storage medium
CN112115469A (en) * 2020-09-15 2020-12-22 浙江科技学院 Edge intelligent moving target defense method based on Bayes-Stackelberg game
CN114373090A (en) * 2020-10-14 2022-04-19 中国移动通信有限公司研究院 Model lightweight method, device, electronic equipment and computer readable storage medium
CN112967088A (en) * 2021-03-03 2021-06-15 上海数鸣人工智能科技有限公司 Marketing activity prediction model structure and prediction method based on knowledge distillation
CN114049541A (en) * 2021-08-27 2022-02-15 之江实验室 Visual scene recognition method based on structural information characteristic decoupling and knowledge migration
CN114647760A (en) * 2022-01-13 2022-06-21 中国矿业大学 Intelligent video image retrieval method based on neural network self-temperature cause and knowledge conduction mechanism
CN114627331A (en) * 2022-03-07 2022-06-14 北京沃东天骏信息技术有限公司 Model training method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
李国法; 陈耀昱; 吕辰; 陶达; 曹东璞; 成波: "Key technologies for semantic parsing of driving behavior in intelligent vehicle decision-making", Journal of Automotive Safety and Energy (汽车安全与节能学报), no. 04, pages 5 - 26 *
马文龙; 瞿有甜; 张金伟: "Research on the application of Bayesian networks in adaptive teaching systems", Computer Systems & Applications (计算机系统应用), no. 01, pages 68 - 71 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115578614A (en) * 2022-10-21 2023-01-06 北京百度网讯科技有限公司 Training method of image processing model, image processing method and device
CN115578614B (en) * 2022-10-21 2024-03-12 北京百度网讯科技有限公司 Training method of image processing model, image processing method and device

Also Published As

Publication number Publication date
CN115170919B (en) 2023-09-12

Similar Documents

Publication Publication Date Title
CN110472675B (en) Image classification method, image classification device, storage medium and electronic equipment
CN112801298B (en) Abnormal sample detection method, device, equipment and storage medium
CN114677565B (en) Training method and image processing method and device for feature extraction network
CN113627361B (en) Training method and device for face recognition model and computer program product
CN114724007A (en) Training classification model, data classification method, device, equipment, medium and product
CN113361363A (en) Training method, device and equipment for face image recognition model and storage medium
CN113191261B (en) Image category identification method and device and electronic equipment
CN112949433B (en) Method, device and equipment for generating video classification model and storage medium
CN114020950A (en) Training method, device and equipment of image retrieval model and storage medium
CN115496776A (en) Matting method, matting model training method and device, equipment and medium
CN115170919B (en) Image processing model training and image processing method, device, equipment and storage medium
CN115359308A (en) Model training method, apparatus, device, storage medium, and program for identifying difficult cases
CN114120454A (en) Training method and device of living body detection model, electronic equipment and storage medium
CN114549904A (en) Visual processing and model training method, apparatus, storage medium, and program product
CN113657248A (en) Training method and device for face recognition model and computer program product
CN115565186A (en) Method and device for training character recognition model, electronic equipment and storage medium
CN116052288A (en) Living body detection model training method, living body detection device and electronic equipment
CN112560848B (en) Training method and device for POI (Point of interest) pre-training model and electronic equipment
CN114842541A (en) Model training and face recognition method, device, equipment and storage medium
CN114758130A (en) Image processing and model training method, device, equipment and storage medium
CN115273148A (en) Pedestrian re-recognition model training method and device, electronic equipment and storage medium
CN115019057A (en) Image feature extraction model determining method and device and image identification method and device
CN113947195A (en) Model determination method and device, electronic equipment and memory
CN114119972A (en) Model acquisition and object processing method and device, electronic equipment and storage medium
CN114612971A (en) Face detection method, model training method, electronic device, and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant