CN114492731A - Training method and device of image processing model and electronic equipment

Training method and device of image processing model and electronic equipment

Info

Publication number
CN114492731A
CN114492731A (application CN202111595617.9A)
Authority
CN
China
Prior art keywords: image, model, prediction, target, frequency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111595617.9A
Other languages
Chinese (zh)
Inventor
张林峰
陈昕
涂小兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202111595617.9A
Publication of CN114492731A
Legal status: Pending

Classifications

    • G06N 3/04 Neural networks; Architecture, e.g. interconnection topology
    • G06F 18/214 Pattern recognition; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/217 Pattern recognition; Validation; Performance evaluation; Active pattern learning techniques
    • G06N 3/082 Neural networks; Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections


Abstract

The present disclosure relates to a method for training an image processing model, including: inputting a sample image into a target model for prediction to obtain a first predicted image, where the target model is obtained by training a first preset machine learning model; acquiring, from the first predicted image, a first target image whose image frequency is higher than a preset frequency, where the image frequency characterizes the spatial variation of the image's pixel gray values; and training a second preset machine learning model based on the sample image and the first target image to obtain an image processing model, where the image processing model has fewer parameters and less computation than the target model and is used for prediction processing of images.

Description

Training method and device of image processing model and electronic equipment
Technical Field
The present disclosure relates to the field of machine learning, and in particular, to a training method and apparatus for an image processing model, and an electronic device.
Background
In recent years, generative adversarial networks (GANs) have achieved major breakthroughs in many fields such as image generation, and have been widely applied with excellent results to tasks such as image stylization and image editing. However, existing GAN models often have large numbers of parameters and heavy computation, so they are difficult to run on devices with limited computing and storage capabilities, which severely limits the scope of GAN applications.
Image stylization is one of the most common applications of GAN models: it applies a stylistic conversion to an input image, for example converting an ordinary natural photograph into a hand-drawn-style picture. However, a standard image stylization model (e.g., the cycle-consistent adversarial network CycleGAN) requires about 56.8 billion multiply-add operations and millions of parameters to process a single picture. The computing and storage capabilities of common mobile phones are limited, so if a standard image stylization model were deployed directly on a phone, it would run extremely slowly.
At present, because generative adversarial network models have large parameter counts and computation costs, they are difficult to run on devices with limited computing and storage capabilities, which limits their application.
Disclosure of Invention
The present disclosure provides a training method and apparatus for an image processing model, and an electronic device, at least solving the problem in the related art that generative adversarial network models, owing to their large parameter counts and computation costs, are difficult to run on devices with limited computing and storage capabilities, which limits their application. The technical scheme of the disclosure is as follows:
According to a first aspect of the embodiments of the present disclosure, there is provided a training method for an image processing model, including: inputting a sample image into a target model for prediction to obtain a first predicted image, where the target model is obtained by training a first preset machine learning model; acquiring, from the first predicted image, a first target image whose image frequency is higher than a preset frequency, where the image frequency characterizes the spatial variation of the image's pixel gray values; and training a second preset machine learning model based on the sample image and the first target image to obtain an image processing model, where the image processing model has fewer parameters and less computation than the target model and is used for prediction processing of images.
Optionally, the first preset machine learning model and the second preset machine learning model each comprise a generative adversarial network model.
Optionally, training the second preset machine learning model based on the sample image and the first target image to obtain the image processing model includes: inputting the sample image and the first target image into the second preset machine learning model for prediction to obtain a second predicted image; acquiring, from the second predicted image, a second target image whose image frequency is higher than the preset frequency; determining a loss function over the first target image and the second target image; and optimizing the loss function, the image processing model being obtained when the value of the loss function falls below a preset threshold.
Optionally, acquiring the first target image whose image frequency is higher than the preset frequency from the first predicted image, and acquiring the second target image whose image frequency is higher than the preset frequency from the second predicted image, includes: performing frequency decomposition on the first predicted image and the second predicted image respectively to obtain first predicted sub-images of different frequencies and second predicted sub-images of different frequencies; and screening the first target image and the second target image from those sub-images respectively.
Optionally, the frequency decomposition of the first predicted image and the second predicted image is implemented by one of the following methods: discrete wavelet transform, discrete Fourier transform, or discrete cosine transform.
Optionally, when the first predicted image and the second predicted image are frequency-decomposed by discrete wavelet transform, the method further includes: performing frequency decomposition on the first predicted image and the second predicted image through any of a plurality of wavelet basis functions; and performing a single frequency decomposition or multiple frequency decompositions on the first predicted image and the second predicted image through any one of the wavelet basis functions.
Optionally, after the image processing model is obtained, the method further includes: replacing the target model with the image processing model; and performing prediction processing on images using the image processing model.
According to a second aspect of the embodiments of the present disclosure, there is also provided an apparatus for training an image processing model, including: a prediction module configured to input a sample image into a target model for image prediction to obtain a first predicted image, where the target model is obtained by training a first preset machine learning model; an acquisition module configured to acquire, from the first predicted image, a first target image whose image frequency is higher than a preset frequency, where the image frequency characterizes the spatial variation of the image's pixel gray values; and a training module configured to train a second preset machine learning model based on the sample image and the first target image to obtain an image processing model, where the image processing model has fewer parameters and less computation than the target model and is used for prediction processing of images.
Optionally, the first preset machine learning model and the second preset machine learning model each comprise a generative adversarial network model.
Optionally, the training module includes: a prediction unit configured to input the sample image and the first target image into the second preset machine learning model for prediction to obtain a second predicted image; an acquisition unit configured to acquire, from the second predicted image, a second target image whose image frequency is higher than the preset frequency; a determining unit configured to determine a loss function over the first target image and the second target image; and a processing unit configured to optimize the loss function and obtain the image processing model when the value of the loss function falls below a preset threshold.
Optionally, the apparatus further includes: a first frequency decomposition module configured to perform frequency decomposition on the first predicted image and the second predicted image respectively to obtain first predicted sub-images of different frequencies and second predicted sub-images of different frequencies; and a screening module configured to screen the first target image and the second target image from those sub-images respectively.
Optionally, the frequency decomposition module is configured to frequency-decompose the first predicted image and the second predicted image by one of the following methods: discrete wavelet transform, discrete Fourier transform, or discrete cosine transform.
Optionally, when the first predicted image and the second predicted image are frequency-decomposed by discrete wavelet transform, the apparatus further includes: a second frequency decomposition module configured to frequency-decompose the first predicted image and the second predicted image through any of a plurality of wavelet basis functions; and a third frequency decomposition module configured to perform a single frequency decomposition or multiple frequency decompositions on the first predicted image and the second predicted image through any one of the wavelet basis functions.
Optionally, the apparatus further includes: a replacement module configured to replace the target model with the image processing model; and an image processing module configured to perform prediction processing on images using the image processing model.
According to a third aspect of the embodiments of the present disclosure, there is also provided an electronic device, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions to implement the above training method for an image processing model.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium, wherein instructions of the computer-readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the above training method for an image processing model.
According to a fifth aspect of the embodiments of the present disclosure, there is provided a computer program product comprising computer programs/instructions which, when executed by a processor, implement the above training method for an image processing model.
The technical scheme provided by the embodiments of the present disclosure brings at least the following beneficial effects: compression of the generative adversarial network model is achieved, and the quality of the images generated by the generative adversarial network model is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a flow diagram illustrating a method of training an image processing model in accordance with an exemplary embodiment;
FIG. 2 is a block diagram illustrating an arrangement of an image processing model training apparatus according to an exemplary embodiment;
FIG. 3 is a block diagram illustrating a structure of an electronic device according to an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
First, some terms appearing in the description of the embodiments of the present application are explained as follows:
GAN (Generative Adversarial Network): a deep neural network architecture that generally includes a generator network and a discriminator network, where the generator network is used to generate the required images or other types of data, and the discriminator is used to determine whether data was generated by the generator or sampled from the data set.
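Purely as an illustration (hypothetical toy networks, not the models of this patent), the two networks might be sketched as:

```python
# Illustrative sketch of the two networks in a GAN (hypothetical toy
# sizes; not the models described in this patent).
import torch.nn as nn

generator = nn.Sequential(            # produces data, e.g. an image
    nn.Linear(100, 256), nn.ReLU(),
    nn.Linear(256, 28 * 28), nn.Tanh())

discriminator = nn.Sequential(        # scores real vs. generated data
    nn.Linear(28 * 28, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid())
```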
DWT (Discrete Wavelet Transform): the result of discretizing the scale and translation of the basic wavelet transform; a common tool for frequency analysis.
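For example, a single-level 2D DWT splits an image into one low-frequency sub-band and three high-frequency detail sub-bands; a minimal sketch using the PyWavelets package (an assumed tooling choice, not named in the patent):

```python
# Minimal sketch: single-level 2D discrete wavelet transform with
# PyWavelets, yielding a low-frequency band (LL) and three
# high-frequency detail bands (LH, HL, HH).
import numpy as np
import pywt

image = np.random.rand(256, 256)      # stand-in for a grayscale image
LL, (LH, HL, HH) = pywt.dwt2(image, 'haar')
print(LL.shape, LH.shape)             # (128, 128) (128, 128)
```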
FID (Fréchet Inception Distance): a common metric for measuring the performance of GAN models. Generally, the larger the FID between real pictures and pictures generated by a GAN model, the worse the model's performance, and vice versa.
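For reference, FID is commonly computed from the means and covariances of Inception-network features of real (r) and generated (g) images:

$$\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \mathrm{Tr}\left(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\right)$$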
CycleGAN: a cycle-consistent adversarial network, a GAN model based on a cycle-consistency loss; it can be trained without paired training data to achieve image-to-image translation.
Knowledge distillation is one of the most effective algorithms for neural network model compression. First, a teacher model with large parameter and computation counts is trained; then a student model with small parameter and computation counts is trained. During the student model's training, knowledge in the teacher network is transferred to the student model, improving the student's performance. At deployment, the teacher model is replaced by the student model, compressing the network. However, most existing knowledge distillation algorithms are designed for high-level vision tasks, whereas GAN models are typically used for low-level vision tasks. Although some recent work has attempted to apply knowledge distillation to GAN model compression, the results have not been ideal.
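As a sketch of the general teacher-student pattern only (toy PyTorch models and random data assumed; this is not the frequency-aware method of this disclosure, which is described below):

```python
# Generic output-level knowledge distillation, minimal sketch:
# the small student is trained to mimic the large, frozen teacher.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(64, 3, 3, padding=1))  # large model
student = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(8, 3, 3, padding=1))   # small model
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)

x = torch.rand(4, 3, 64, 64)          # a batch of sample images
with torch.no_grad():
    t_pred = teacher(x)               # teacher prediction (frozen)
s_pred = student(x)

loss = F.l1_loss(s_pred, t_pred)      # student mimics the teacher
optimizer.zero_grad()
loss.backward()
optimizer.step()
```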
In 2020, Muyang Li et al. proposed the GAN Compression algorithm, which compresses GAN models through neural architecture search and knowledge distillation. The algorithm mainly includes the following parts: first, an L2-norm distillation loss between intermediate-layer features of the student and teacher networks is introduced; by optimizing this distillation loss, knowledge is transferred from the teacher network to the student network. Second, a neural architecture search is performed over the structure of the GAN model, with the search accelerated through a weight-sharing technique.
Although the GAN Compression algorithm achieves a certain degree of GAN model compression, its use of knowledge distillation is extremely simple and inefficient: the knowledge distillation component yields only a very small FID improvement. Moreover, distillation based on intermediate-layer features requires the teacher and student network architectures to be similar, which greatly limits the applicability of the technique.
The present disclosure recognizes that prior knowledge distillation algorithms perform poorly on GAN models because they do not account for the differences between the different frequency components of an image. Generally, the quality of a generated image is reflected in its high-frequency information: the high-frequency components carry the image's detail, so a picture with high-quality high-frequency information looks better to human observers. By contrast, the low-frequency information in a generated picture has little effect. Existing knowledge distillation algorithms simply make the student model learn the teacher model's entire prediction, without any differentiated treatment of the different frequencies.
To solve this problem, the present disclosure extracts information of different frequencies from the teacher and student models through discrete wavelet transform, then strengthens the student model's learning of the image's high-frequency information while suppressing its learning of the low-frequency information, thereby improving the effect of the knowledge distillation algorithm.
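A minimal sketch of this idea, assuming a NumPy/PyWavelets workflow and a single-level decomposition (an illustration of the principle, not the patented implementation):

```python
# Sketch: keep only the high-frequency DWT sub-bands of the teacher's
# and student's predicted images and penalize their L1 difference;
# the low-frequency LL band is discarded (its learning is suppressed).
import numpy as np
import pywt

def high_frequency_bands(image, wavelet='haar'):
    _, (lh, hl, hh) = pywt.dwt2(image, wavelet)
    return lh, hl, hh

def wavelet_distillation_loss(teacher_pred, student_pred, wavelet='haar'):
    t_bands = high_frequency_bands(teacher_pred, wavelet)
    s_bands = high_frequency_bands(student_pred, wavelet)
    return sum(np.abs(t - s).mean() for t, s in zip(t_bands, s_bands))

teacher_pred = np.random.rand(256, 256)   # stand-ins for predicted images
student_pred = np.random.rand(256, 256)
print(wavelet_distillation_loss(teacher_pred, student_pred))
```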
The above process is illustrated below in a specific example:
FIG. 1 is a flow diagram illustrating a method of training an image processing model according to an exemplary embodiment. As shown in FIG. 1, the method includes the following steps:
in step S11, inputting the sample image into a target model for prediction to obtain a first predicted image, wherein the target model is obtained by training a first preset machine learning model;
it should be noted that the GAN teacher model with a large number of parameters and calculations is trained by the target model using a standard training method. In this step, the sample image is input to the teacher model, and a predicted image is output.
In step S12, a first target image with an image frequency higher than a preset frequency is obtained from the first prediction image, wherein the image frequency is used for representing the variation of the pixel gray-scale value of the image in space;
the frequency of the image refers to the spatial frequency, and the image can be seen as a signal defined on a two-dimensional plane, the amplitude of which corresponds to the gray scale of the pixel (color image corresponds to the three components of RGB). The frequency of the image is called spatial frequency, and the spatial frequency refers to the number of times the brightness changes periodically in a unit length, and reflects the change of the pixel gray scale of the image in space.
The first target image is a high-frequency image whose image frequency extracted from the first prediction image is higher than a preset frequency.
In step S13, a second preset machine learning model is trained based on the sample image and the first target image to obtain an image processing model, where the image processing model has fewer parameters and less computation than the target model and is used for prediction processing of images.
In this step, a GAN student model with fewer parameters and less computation (i.e., the image processing model) is trained by knowledge distillation using the teacher network trained in the previous step.
It should be noted that the trained GAN student model is used to perform the image prediction task.
In this way, the student model focuses on learning the more important high-frequency information in the image and avoids interference from unimportant low-frequency information, thereby improving both the compression of the generative adversarial network model and the quality of the images it generates.
According to an optional embodiment of the present application, the first preset machine learning model and the second preset machine learning model are both generative adversarial network models.
According to another optional embodiment of the present application, step S13, training the second preset machine learning model based on the sample image and the first target image to obtain the image processing model, is implemented as follows: inputting the sample image and the first target image into the second preset machine learning model for prediction to obtain a second predicted image; acquiring, from the second predicted image, a second target image whose image frequency is higher than the preset frequency; determining a loss function over the first target image and the second target image; and optimizing the loss function, the image processing model being obtained when the value of the loss function falls below a preset threshold.
For each image in the sample set, the student network and the teacher network each predict a generated result. Discrete wavelet transform is applied to both predictions, and an L1-norm loss is computed over the high-frequency parts of the transform results; this serves as the loss function of the wavelet distillation algorithm. This loss is optimized jointly with the other loss functions, so that knowledge in the teacher network is transferred to the student model during the student model's training.
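The following sketch shows how such a wavelet distillation term could be optimized jointly with other losses; it assumes a PyTorch setup with stand-in single-layer models, an inline Haar transform for self-containment, and a hypothetical weighting coefficient lambda_hf that the patent does not specify:

```python
# Sketch of one joint optimization step: an L1 loss over high-frequency
# Haar sub-bands of teacher/student predictions, added to a stand-in
# for the model's other losses (adversarial terms omitted for brevity).
import torch
import torch.nn as nn
import torch.nn.functional as F

def haar_dwt2(x):
    """Single-level 2D Haar DWT of an (N, C, H, W) tensor; returns
    the LL, LH, HL, HH sub-bands, each of size (N, C, H/2, W/2)."""
    a = x[..., 0::2, 0::2]
    b = x[..., 0::2, 1::2]
    c = x[..., 1::2, 0::2]
    d = x[..., 1::2, 1::2]
    return ((a + b + c + d) / 2, (a + b - c - d) / 2,
            (a - b + c - d) / 2, (a - b - c + d) / 2)

def wavelet_distillation_loss(t_pred, s_pred):
    _, *t_high = haar_dwt2(t_pred)          # drop LL, keep high bands
    _, *s_high = haar_dwt2(s_pred)
    return sum(F.l1_loss(s, t) for s, t in zip(s_high, t_high))

teacher = nn.Conv2d(3, 3, 3, padding=1)     # stand-ins for GAN generators
student = nn.Conv2d(3, 3, 3, padding=1)
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)
lambda_hf = 10.0                            # hypothetical weighting

x = torch.rand(2, 3, 64, 64)                # sample image batch
with torch.no_grad():
    t_pred = teacher(x)
s_pred = student(x)

# F.l1_loss(s_pred, x) stands in for the model's other losses here
loss = F.l1_loss(s_pred, x) + lambda_hf * wavelet_distillation_loss(t_pred, s_pred)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```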
In some optional embodiments of the present application, acquiring the first target image whose image frequency is higher than the preset frequency from the first predicted image, and acquiring the second target image whose image frequency is higher than the preset frequency from the second predicted image, includes: performing frequency decomposition on the first predicted image and the second predicted image respectively to obtain first predicted sub-images of different frequencies and second predicted sub-images of different frequencies; and screening the first target image and the second target image from those sub-images respectively.
In further optional embodiments of the present application, the first predicted image and the second predicted image are each frequency-decomposed by one of the following methods: discrete wavelet transform, discrete Fourier transform, or discrete cosine transform.
In the embodiments provided in the present application, besides frequency-decomposing the image through discrete wavelet transform as described above, the image may also be frequency-decomposed using a discrete Fourier transform or a discrete cosine transform; the subsequent steps are the same as with the discrete wavelet transform.
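As an illustration, the high-frequency part of an image could equally be obtained with a discrete Fourier transform by masking out low frequencies; the square mask and cutoff value below are hypothetical choices:

```python
# Sketch: extract high-frequency content via FFT by zeroing a square
# low-frequency window around the centered spectrum.
import numpy as np

def high_pass_fft(image, cutoff=16):
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    h, w = image.shape
    cy, cx = h // 2, w // 2
    spectrum[cy - cutoff:cy + cutoff, cx - cutoff:cx + cutoff] = 0
    return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum)))

high = high_pass_fft(np.random.rand(256, 256))
print(high.shape)  # (256, 256)
```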
According to an optional embodiment of the present application, when the first predicted image and the second predicted image are frequency-decomposed by discrete wavelet transform, the method further includes: performing frequency decomposition on the first predicted image and the second predicted image through any of a plurality of wavelet basis functions; and performing a single frequency decomposition or multiple frequency decompositions on the first predicted image and the second predicted image through any one of the wavelet basis functions.
It should be noted that if discrete wavelet transform is used for the frequency decomposition, different wavelet basis functions may be chosen, and the decomposition may optionally be applied once or several times.
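A short sketch of both options with PyWavelets; the particular bases shown ('haar', 'db2', 'sym4') are examples, not mandated by the patent:

```python
# Sketch: different wavelet bases and a two-level decomposition.
import numpy as np
import pywt

image = np.random.rand(256, 256)
for basis in ('haar', 'db2', 'sym4'):              # example bases
    coeffs = pywt.wavedec2(image, basis, level=2)  # multiple decomposition
    ll2, details_lv2, details_lv1 = coeffs         # coarsest level first
    # details_lv1 / details_lv2 hold the (LH, HL, HH) high-frequency bands
    print(basis, ll2.shape, details_lv1[0].shape, details_lv2[0].shape)
```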
The method provided by this application does not modify the model's network architecture and is not constrained by the architectures of the teacher and student models; it can be applied to teacher and student GAN models of any structure, and to any image or video generation task.
According to an optional embodiment of the present application, after the image processing model is obtained in step S13, the target model is replaced with the image processing model, and the image processing model is used to perform prediction processing on images.
At deployment, the teacher model is replaced by the trained student model, realizing the compression of the GAN model; image prediction is then performed with the student model.
The method provided by this application can significantly improve the quality of images generated by GAN models with few parameters and little computation. For example, on the horse-to-zebra image translation task of the Horse2Zebra dataset with the CycleGAN model, the standard-size model has 11.38 MB of network parameters and 47.22 G floating-point operations, with an FID of 61.53; a 4x-compressed small model has 2.85 MB of parameters and 12.14 G floating-point operations, with an FID of 84.89; and the 4x-compressed small model trained with the distillation technique of this disclosure likewise has 2.85 MB of parameters and 12.14 G floating-point operations, but an FID of 75.45, a reduction of 9.44 compared with the model trained without this technique.
FIG. 2 is a block diagram illustrating the structure of an image processing model training apparatus according to an exemplary embodiment. As shown in FIG. 2, the apparatus includes:
The prediction module 20 is configured to input a sample image into a target model for image prediction to obtain a first predicted image, where the target model is obtained by training a first preset machine learning model.
The target model is a GAN teacher model with large parameter and computation counts, trained using a standard training method.
The acquisition module 22 is configured to acquire, from the first predicted image, a first target image whose image frequency is higher than a preset frequency, where the image frequency characterizes the spatial variation of the image's pixel gray values.
The first target image is a high-frequency image, extracted from the first predicted image, whose image frequency is higher than the preset frequency.
The training module 24 is configured to train a second preset machine learning model based on the sample image and the first target image to obtain an image processing model, where the image processing model has fewer parameters and less computation than the target model and is used for prediction processing of images.
Here, a GAN student model with fewer parameters and less computation (i.e., the image processing model) is trained by knowledge distillation using the teacher network trained in the previous step.
Through this apparatus, the student model focuses on learning the more important high-frequency information in the image and avoids interference from unimportant low-frequency information, thereby improving both the compression of the generative adversarial network model and the quality of the images it generates.
According to an optional embodiment of the present application, the first preset machine learning model and the second preset machine learning model each comprise a generative adversarial network model.
According to another optional embodiment of the present application, the training module 24 includes: a prediction unit configured to input the sample image and the first target image into the second preset machine learning model for prediction to obtain a second predicted image; an acquisition unit configured to acquire, from the second predicted image, a second target image whose image frequency is higher than the preset frequency; a determining unit configured to determine a loss function over the first target image and the second target image; and a processing unit configured to optimize the loss function and obtain the image processing model when the value of the loss function falls below a preset threshold.
For each image in the sample set, the student network and the teacher network each predict a generated result. Discrete wavelet transform is applied to both predictions, and an L1-norm loss is computed over the high-frequency parts of the transform results as the loss function of the wavelet distillation algorithm; this loss is optimized jointly with the other loss functions, transferring the knowledge in the teacher network to the student model during the student model's training.
In some optional embodiments of the present application, the apparatus further includes: a first frequency decomposition module configured to perform frequency decomposition on the first predicted image and the second predicted image respectively to obtain first predicted sub-images of different frequencies and second predicted sub-images of different frequencies; and a screening module configured to screen the first target image and the second target image from those sub-images respectively.
In an optional embodiment, the frequency decomposition module is configured to frequency-decompose the first predicted image and the second predicted image by one of the following methods: discrete wavelet transform, discrete Fourier transform, or discrete cosine transform.
In the embodiments provided in the present application, besides frequency-decomposing the image through discrete wavelet transform, the image may also be frequency-decomposed using a discrete Fourier transform or a discrete cosine transform; the subsequent steps are the same as with the discrete wavelet transform.
In further optional embodiments of the present application, when the first predicted image and the second predicted image are frequency-decomposed by discrete wavelet transform, the apparatus further includes: a second frequency decomposition module configured to frequency-decompose the first predicted image and the second predicted image through any of a plurality of wavelet basis functions; and a third frequency decomposition module configured to perform a single frequency decomposition or multiple frequency decompositions on the first predicted image and the second predicted image through any one of the wavelet basis functions.
It should be noted that if discrete wavelet transform is used for the frequency decomposition, different wavelet basis functions may be chosen, and the decomposition may optionally be applied once or several times.
According to an optional embodiment of the present application, the apparatus further includes: a replacement module configured to replace the target model with the image processing model; and an image processing module configured to perform prediction processing on images using the image processing model.
At deployment, the teacher model is replaced by the trained student model, realizing the compression of the GAN model; image prediction is then performed with the student model.
With regard to the apparatus in the above embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be described in detail here.
Fig. 3 is a block diagram illustrating a structure of an electronic device according to an example embodiment. Referring to fig. 3, the electronic device may be a terminal 300, for example, the terminal 300 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.
Terminal 300 may include one or more of the following components: a processing component 302, a memory 304, a power component 306, a multimedia component 308, an audio component 310, an input/output (I/O) interface 312, a sensor component 314, and a communication component 316.
The processing component 302 generally controls overall operation of the terminal 300, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 302 may include one or more processors 320 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 302 can include one or more modules that facilitate interaction between the processing component 302 and other components. For example, the processing component 302 may include a multimedia module to facilitate interaction between the multimedia component 308 and the processing component 302.
The memory 304 is configured to store various types of data to support operations at the terminal 300. Examples of such data include instructions for any application or method operating on the terminal 300, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 304 may be implemented by any type of volatile or non-volatile memory device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
The power supply component 306 provides power to the various components of the terminal 300. The power components 306 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the terminal 300.
The multimedia component 308 comprises a screen providing an output interface between the terminal 300 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 308 includes a front-facing camera and/or a rear-facing camera. The front camera and/or the rear camera may receive external multimedia data when the terminal 300 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capability.
The audio component 310 is configured to output and/or input audio signals. For example, the audio component 310 includes a microphone (MIC) configured to receive external audio signals when the terminal 300 is in an operating mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signals may further be stored in the memory 304 or transmitted via the communication component 316. In some embodiments, the audio component 310 also includes a speaker for outputting audio signals.
The I/O interface 312 provides an interface between the processing component 302 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 314 includes one or more sensors for providing status assessments of various aspects of the terminal 300. For example, the sensor assembly 314 may detect the open/closed state of the terminal 300 and the relative positioning of components, such as the display and keypad of the terminal 300. The sensor assembly 314 may also detect a change in the position of the terminal 300 or a component of the terminal 300, the presence or absence of user contact with the terminal 300, the orientation or acceleration/deceleration of the terminal 300, and a change in the temperature of the terminal 300. The sensor assembly 314 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 314 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 314 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 316 is configured to facilitate communications between the terminal 300 and other devices in a wired or wireless manner. The terminal 300 may access a wireless network based on a communication standard, such as WiFi, an operator network (such as 2G, 3G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 316 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 316 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the terminal 300 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a computer-readable storage medium comprising instructions, such as the memory 304 comprising instructions, executable by the processor 320 of the electronic device to perform the above-described method is also provided. The computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, there is also provided a computer program product comprising a computer program/instructions which, when executed by a processor, implement the above training method of an image processing model.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A method for training an image processing model, comprising:
inputting a sample image into a target model for image prediction to obtain a first predicted image, wherein the target model is obtained by training a first preset machine learning model;
acquiring, from the first predicted image, a first target image whose image frequency is higher than a preset frequency, wherein the image frequency characterizes the spatial variation of pixel gray values of the image; and
training a second preset machine learning model based on the sample image and the first target image to obtain an image processing model, wherein the image processing model has fewer parameters and less computation than the target model, and the image processing model is used for image prediction processing.
2. The method of claim 1, wherein the first preset machine learning model and the second preset machine learning model each comprise a generative adversarial network model.
3. The method of claim 1, wherein training the second preset machine learning model based on the sample image and the first target image to obtain the image processing model comprises:
inputting the sample image and the first target image into the second preset machine learning model for image prediction to obtain a second predicted image;
acquiring, from the second predicted image, a second target image whose image frequency is higher than the preset frequency;
determining a loss function over the first target image and the second target image; and
optimizing the loss function, the image processing model being obtained when the value of the loss function falls below a preset threshold.
4. The method according to claim 3, wherein acquiring, from the first predicted image, the first target image whose image frequency is higher than the preset frequency, and acquiring, from the second predicted image, the second target image whose image frequency is higher than the preset frequency, comprise:
performing frequency decomposition on the first predicted image and the second predicted image respectively to obtain first predicted sub-images of different frequencies and second predicted sub-images of different frequencies; and
screening the first target image and the second target image from the first predicted sub-images of different frequencies and the second predicted sub-images of different frequencies, respectively.
5. The method according to claim 4, wherein the first predicted image and the second predicted image are each frequency-decomposed by one of the following methods: discrete wavelet transform, discrete Fourier transform, and discrete cosine transform.
6. The method according to claim 5, wherein, when the first predicted image and the second predicted image are frequency-decomposed by the discrete wavelet transform, the method further comprises:
performing frequency decomposition on the first predicted image and the second predicted image through any of a plurality of wavelet basis functions; and
performing a single frequency decomposition or multiple frequency decompositions on the first predicted image and the second predicted image through any one of the wavelet basis functions.
7. The method of claim 1, wherein after obtaining the image processing model, the method further comprises:
replacing the target model with the image processing model;
and performing prediction processing on the image by using the image processing model.
8. An apparatus for training an image processing model, comprising:
a prediction module configured to input a sample image into a target model for image prediction to obtain a first predicted image, wherein the target model is obtained by training a first preset machine learning model;
an acquisition module configured to acquire, from the first predicted image, a first target image whose image frequency is higher than a preset frequency, wherein the image frequency characterizes the spatial variation of pixel gray values of the image; and
a training module configured to train a second preset machine learning model based on the sample image and the first target image to obtain an image processing model, wherein the image processing model has fewer parameters and less computation than the target model, and the image processing model is used for prediction processing of images.
9. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the method of training an image processing model according to any one of claims 1 to 7.
10. A computer-readable storage medium, whose instructions, when executed by a processor of an electronic device, enable the electronic device to perform the method of training an image processing model of any of claims 1 to 7.
CN202111595617.9A 2021-12-23 2021-12-23 Training method and device of image processing model and electronic equipment Pending CN114492731A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111595617.9A CN114492731A (en) 2021-12-23 2021-12-23 Training method and device of image processing model and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111595617.9A CN114492731A (en) 2021-12-23 2021-12-23 Training method and device of image processing model and electronic equipment

Publications (1)

Publication Number Publication Date
CN114492731A (en) 2022-05-13

Family

ID=81493546

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111595617.9A Pending CN114492731A (en) 2021-12-23 2021-12-23 Training method and device of image processing model and electronic equipment

Country Status (1)

Country Link
CN (1) CN114492731A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115063673A (en) * 2022-07-29 2022-09-16 阿里巴巴(中国)有限公司 Model compression method, image processing method and device and cloud equipment


Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112272830A (en) * 2018-04-20 2021-01-26 希侬人工智能公司 Image classification by label delivery
CN109886422A (en) * 2019-02-01 2019-06-14 深圳绿米联创科技有限公司 Model configuration method, device, electronic equipment and read/write memory medium
US20200364542A1 (en) * 2019-05-16 2020-11-19 Salesforce.Com, Inc. Private deep learning
CN110796619A (en) * 2019-10-28 2020-02-14 腾讯科技(深圳)有限公司 Image processing model training method and device, electronic equipment and storage medium
US20210265018A1 (en) * 2020-02-20 2021-08-26 Illumina, Inc. Knowledge Distillation and Gradient Pruning-Based Compression of Artificial Intelligence-Based Base Caller
CN113240580A (en) * 2021-04-09 2021-08-10 暨南大学 Lightweight image super-resolution reconstruction method based on multi-dimensional knowledge distillation
CN113205149A (en) * 2021-05-21 2021-08-03 珠海金山网络游戏科技有限公司 Picture processing method and device
CN113570493A (en) * 2021-07-26 2021-10-29 京东数科海益信息科技有限公司 Image generation method and device
CN113537151A (en) * 2021-08-12 2021-10-22 北京达佳互联信息技术有限公司 Training method and device of image processing model, and image processing method and device
CN113807399A (en) * 2021-08-16 2021-12-17 华为技术有限公司 Neural network training method, neural network detection method and neural network detection device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Hanting Chen et al., "Distilling Portable Generative Adversarial Networks for Image Translation", AAAI, 7 March 2020 (2020-03-07), pages 1-9 *
何涛 et al., "Single image dehazing method based on knowledge distillation" (基于知识蒸馏的单幅图像去雾方法), Computer Engineering (计算机工程), vol. 48, no. 04, 3 August 2021 (2021-08-03), pages 165-172 *
郑哲 et al., "Generative adversarial network with quantized weights and activations" (量化权值激活的生成对抗网络), Computer Science (计算机科学), vol. 47, no. 05, 31 May 2020 (2020-05-31), pages 144-148 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination