CN115731101A - Super-resolution image processing method, device, equipment and storage medium - Google Patents

Super-resolution image processing method, device, equipment and storage medium

Info

Publication number
CN115731101A
CN115731101A (application CN202111009776.6A)
Authority
CN
China
Prior art keywords
image
resolution image
sample
network
sample individual
Prior art date
Legal status
Pending
Application number
CN202111009776.6A
Other languages
Chinese (zh)
Inventor
陈子恒
涂娟辉
周易
余晓铭
易阳
李峰
左小祥
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202111009776.6A
Publication of CN115731101A

Landscapes

  • Image Analysis (AREA)

Abstract

The application discloses a super-resolution image processing method, device, equipment and storage medium, and relates to the technical field of artificial intelligence. The method comprises the following steps: acquiring a first resolution image of a first sample individual; extracting, through a teacher model, a compact feature image of the first sample individual based on the first resolution image of the first sample individual, and constructing a second resolution image of the first sample individual based on the compact feature image; adjusting parameters of the teacher model based on the feature extraction effectiveness of the compact feature image of the first sample individual and the image construction reduction degree of the second resolution image of the first sample individual, to obtain a trained teacher model; and training a student model based on the trained teacher model. The technical scheme provided by the embodiments of the application places no excessive requirements on the capability of the deployment device and enables rapid deployment of a deep learning model that performs super-resolution image processing.

Description

Super-resolution image processing method, device, equipment and storage medium
Technical Field
The embodiment of the application relates to the technical field of artificial intelligence, in particular to a super-resolution image processing method, device, equipment and storage medium.
Background
Compared with a low resolution image, a high resolution image contains a greater pixel density, richer texture details and higher reliability. In practical applications, however, limited by device capability, network quality, environmental complexity and the like, an ideal high resolution image often cannot be obtained directly through shooting, observation and similar means.
In order to improve the resolution of an image, Super Resolution (SR) image processing has emerged. Super-resolution image processing constructs a higher resolution image from a low resolution image obtained by observation or shooting, which can overcome or compensate for problems such as image blur and low quality caused by insufficient device capability. At present, super-resolution image processing has been applied in more and more fields, such as image compression, medical imaging, remote sensing imaging, public security and video sensing. In the related art, techniques for realizing super-resolution image processing can be roughly classified into the following categories: parameter-based linear filtering techniques, techniques based on image edge structure, techniques based on image reconstruction constraints, and learning-based techniques; common learning-based techniques include manifold learning, sparse coding and deep learning. By virtue of its strong fitting ability, deep learning has received wide attention in super-resolution image processing: a deep convolutional neural network can learn the mapping relation between low resolution images and high resolution images.
Prior research shows that the effect of super-resolution image processing can be significantly improved by using a deeper convolutional neural network. However, an overly complex and deep model places high requirements on the capability of the deployment device, and therefore cannot be directly deployed in embedded devices or mobile terminal devices.
Disclosure of Invention
The embodiments of the application provide a super-resolution image processing method, device, equipment and storage medium, which apply the knowledge distillation technique to super-resolution image processing so that rapid deployment can be realized in embedded devices or mobile terminal devices. The technical scheme is as follows:
in one aspect, an embodiment of the present application provides a super-resolution image processing method, where the method includes:
acquiring a first resolution image of a first sample individual;
extracting, by a teacher model, a compact feature image of the first sample individual based on a first resolution image of the first sample individual, and constructing a second resolution image of the first sample individual based on the compact feature image of the first sample individual; wherein the compact feature image of the first sample individual includes representative edge information of the first sample individual;
based on the feature extraction effectiveness of the compact feature image of the first sample individual and the image construction reduction degree of the second resolution image of the first sample individual, adjusting the parameters of the teacher model to obtain a trained teacher model;
and training the student model based on the trained teacher model, and executing super-resolution image processing by using the trained student model.
In another aspect, an embodiment of the present application provides a super-resolution image processing apparatus, including:
a first sample acquisition module for acquiring a first resolution image of a first sample individual;
the first image processing module is used for extracting a compact characteristic image of the first sample individual based on a first resolution image of the first sample individual through a teacher model and constructing a second resolution image of the first sample individual based on the compact characteristic image of the first sample individual; wherein the compact feature image of the first sample individual includes representative edge information of the first sample individual;
the first parameter adjustment module is used for adjusting parameters of the teacher model based on the feature extraction effectiveness of the compact feature image of the first sample individual and the image construction restoration degree of the second resolution image of the first sample individual, to obtain a trained teacher model;
and the student model training module is used for training the student model based on the teacher model which completes the training and executing super-resolution image processing by using the student model obtained by training.
In yet another aspect, embodiments of the present application provide a computer device, which includes a processor and a memory, where at least one instruction, at least one program, a set of codes, or a set of instructions is stored in the memory, and the at least one instruction, the at least one program, the set of codes, or the set of instructions is loaded and executed by the processor to implement the above-mentioned super-resolution image processing method.
In yet another aspect, the present application provides a computer-readable storage medium, in which at least one instruction, at least one program, a code set, or a set of instructions is stored, and the at least one instruction, the at least one program, the code set, or the set of instructions is loaded and executed by a processor to implement the super-resolution image processing method described above.
In yet another aspect, embodiments of the present application provide a computer program product or a computer program, where the computer program product or the computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to execute the super-resolution image processing method described above.
The technical scheme provided by the embodiment of the application at least comprises the following beneficial effects:
By applying the knowledge distillation technique to super-resolution image processing, a lightweight student model is ultimately deployed to perform the super-resolution image processing. This places no excessive requirements on the capability of the deployment device, enables rapid deployment in embedded devices or mobile terminal devices, and allows super-resolution image processing to be widely applied in multiple fields. In addition, in the embodiments of the application, the lightweight student model is guided during training by a teacher model of greater depth and higher precision, so an ideal image processing effect can be achieved even though the model structure of the student model is relatively simple. Furthermore, when the teacher model is trained, the high resolution image of the sample individual is taken as input and the compact feature image of the sample individual is extracted, so that representative and salient features of the sample individual are obtained as far as possible. This improves the teacher model's ability to construct the super-resolution image of the sample individual; and since the teacher model guides the training of the student model, the student model's ability to construct the super-resolution image is improved as well.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a schematic diagram of a super-resolution image processing system provided by an embodiment of the present application;
FIG. 2 is a flowchart of a super-resolution image processing method according to an embodiment of the present application;
FIG. 3 is a diagram of a model structure of a teacher model provided by one embodiment of the present application;
FIG. 4 is a schematic diagram of a model structure of a student model provided by one embodiment of the present application;
fig. 5 is a block diagram of an image processing apparatus for super-resolution provided in an embodiment of the present application;
FIG. 6 is a block diagram of a super-resolution image processing apparatus according to another embodiment of the present application;
fig. 7 is a block diagram of a computer device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, the following detailed description of the embodiments of the present application will be made with reference to the accompanying drawings.
Artificial intelligence is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the implementation method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
The artificial intelligence technology is a comprehensive subject and relates to the field of extensive technology, namely the technology of a hardware level and the technology of a software level. Artificial intelligence infrastructures generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operating/interactive systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science that integrates linguistics, computer science and mathematics; research in this field involves natural language, i.e., the language people use every day, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, knowledge graphs, and the like.
Machine learning is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specializes in studying how computers simulate or implement human learning behaviors to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve their performance. Machine learning is the core of artificial intelligence and the fundamental way to endow computers with intelligence, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and learning from demonstration.
With the research and progress of artificial intelligence technology, the artificial intelligence technology is researched and applied in a plurality of fields, such as common image processing, image reconstruction, image restoration, smart home, smart wearable equipment, virtual assistant, smart speaker, smart marketing, unmanned driving, automatic driving, unmanned aerial vehicle, robot, smart medical treatment, smart customer service, and the like.
The technical scheme provided by the embodiment of the application relates to technologies such as artificial intelligence deep learning, and is specifically explained by the following embodiment.
Before the technical solutions provided by the present application are introduced, some technical terms and the like referred to in the embodiments of the present application are briefly introduced.
1. Knowledge Distillation (KD).
Knowledge distillation is a generalized model compression method, or "pseudo" compression. In knowledge distillation, a trained teacher model guides the training of a student model: the implicit, task-specific knowledge learned by the Teacher Network, which has large depth and high precision, can be migrated into the lightweight Student Network, thereby improving the performance and effect of the student model. Ultimately, the student model is deployed online as the application model, while the teacher model is not deployed online.
In the related art, the output of the teacher model is taken as a Soft Label; during the training of the student model, the soft label interacts with the Ground Truth, so that the student model learns real information while also acquiring the semantic information with strong generalization ability that the soft label provides. In the embodiments of the application, however, knowledge distillation is applied to super-resolution image processing: although the teacher model outputs results with higher confidence, soft labels are not required to guide the student model. In the technical scheme provided by the application, the distilled knowledge includes the parameter weights of the teacher model, and these parameter weights are migrated to the student model.
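As a minimal sketch of this form of knowledge migration, the parameter weights of a teacher can be copied into a student wherever layer names and shapes match. The function name `migrate_weights`, the matching-by-name-and-shape rule, and the toy layer names below are assumptions introduced for illustration; the patent does not prescribe this exact transfer procedure.

```python
def migrate_weights(teacher_params, student_params):
    """Copy teacher parameter weights into the student wherever the
    layer name exists in both models and the shapes agree; the student
    keeps its own initialization for all other layers."""
    migrated = dict(student_params)
    for name, weights in teacher_params.items():
        if name in migrated and len(migrated[name]) == len(weights):
            migrated[name] = list(weights)  # transfer the teacher's weights
    return migrated

# Toy models: the teacher has an extra deep block the student lacks.
teacher = {"features.0": [0.5, -0.2, 0.1], "deep_block.7": [0.9]}
student = {"features.0": [0.0, 0.0, 0.0], "head": [1.0]}
initialized = migrate_weights(teacher, student)
```

In practice the student would then be fine-tuned from this initialization rather than used as-is.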
2. Low Resolution Image (LR).
The low resolution image, i.e., the third resolution image described in the embodiments of the application, may be understood as an image of relatively low resolution. It generally includes images acquired by a capture device of limited capability, images acquired under poor network quality, images acquired in a complex environment, and the like. The technical scheme provided by the embodiments of the application can improve the resolution of a low resolution image and construct an image of higher quality.
3. High Resolution Image (HR).
The high resolution image, i.e., the first resolution image described in the embodiments of the application, may be understood as an image of relatively high resolution. It generally includes images acquired by a capture device of high capability, images acquired under good network quality, and higher resolution images obtained by processing a low resolution image through hardware or software methods. The embodiments of the application do not limit the source of the high resolution image: it may be acquired directly (by shooting, observation, etc.), or obtained by applying a series of processing steps to a low resolution image. In the technical scheme provided by the application, the high resolution image is used not only for computing losses on the teacher model output and the student model output, but also as input to the teacher model for feature extraction.
4. Super Resolution Image (SR).
The super-resolution image, i.e., the second resolution image described in the embodiment of the present application, refers to an image finally output by the model. In the technical scheme provided by the application, the input of the teacher model comprises a high-resolution image and a low-resolution image, the input of the student model only comprises a low-resolution image, and the output of the teacher model and the output of the student model both comprise super-resolution images.
It should be understood that, in the embodiments of the application, the image quality (e.g., resolution) of the first resolution image (high resolution image) is generally higher than that of the third resolution image (low resolution image), and the image quality of the second resolution image (super-resolution image) is also generally higher than that of the third resolution image; however, the image quality of the first resolution image may be higher than, lower than, or equal to that of the second resolution image.
In the following embodiments, for convenience of description, low-resolution images are collectively referred to as third-resolution images, high-resolution images are collectively referred to as first-resolution images, and super-resolution images are collectively referred to as second-resolution images. It should be understood that, in the following embodiments, the expressions "low resolution image, high resolution image, super resolution image" may be replaced with "third resolution image, first resolution image, second resolution image", respectively, and the expressions "third resolution image, first resolution image, second resolution image" may also be replaced with "low resolution image, high resolution image, super resolution image", respectively, which have the same meaning.
Referring to fig. 1, an embodiment of a super-resolution image processing system provided by an embodiment of the present application is shown. The super-resolution image processing system may include: a first device 110 and a second device 120.
The first device 110 refers to a computer device with model training capabilities. Since a large amount of data is generally used as training samples in the model training process, the first device 110 generally needs to have high data processing capability, data storage capability, and the like. The implementation type of the first device 110 is not limited in the embodiment of the present application, and optionally, the first device 110 may be implemented as a server, or the first device 110 may be implemented as a terminal with high processing capability, such as a PC (Personal Computer), a mobile phone, and a tablet Computer. When the first device 110 is implemented as a server, the first device 110 may be a single server, may also be a server cluster formed by a plurality of servers, and may also be a cloud computing center, which is not limited in this embodiment of the present application.
The second device 120 refers to a computer device with model application capabilities. In the embodiment of the present application, a lightweight student model is deployed online instead of a teacher model with a large depth and high precision, so that the second device 120 does not need to have high data processing capability, data storage capability, and the like, that is, the requirement on the device capability of the second device 120 is low. The implementation type of the second device 120 is not limited in the embodiment of the present application, and optionally, the second device 120 may be implemented as a terminal such as a PC, a mobile phone, a tablet computer, a multimedia player, a wearable device, a vehicle-mounted device, a self-service terminal, an intelligent terminal, or may also be implemented as a server.
As shown in fig. 1, the first device 110 is used to train a teacher model and a student model, and to guide training of the student model with the teacher model having completed training; after the first device 110 completes the training of the student model, sending the student model after the training to the second device 120; the second device 120 can perform super-resolution image processing using the trained student model, enable construction of a super-resolution image based on the input low-resolution image, and output the super-resolution image. Optionally, the first device 110 and the second device 120 communicate with each other through a network, where the network may be a wired network or a wireless network, and the embodiment of the present application is not limited thereto.
It should be noted that, after the training of the student model is completed, the first device 110 may also perform super-resolution image processing using the trained student model, and for convenience of description, fig. 1 only illustrates the second device 120 performing super-resolution image processing, which should be understood as not to limit the technical solution provided in the present application.
Referring to fig. 2, a flowchart of a super-resolution image processing method according to an embodiment of the present application is shown. The super-resolution image processing method can be applied to a computer device, such as the first device 110 shown in fig. 1. As shown in fig. 2, the super-resolution image processing method includes at least some of the following steps (step 210 to step 240).
Step 210, a first resolution image of a first sample individual is acquired.
In the embodiments of the application, knowledge distillation is used for super-resolution image processing: the computer device needs to train the teacher model first and transfer the knowledge learned during that training to the training of the student model. The student model is usually a lightweight deep learning model and is ultimately deployed online to perform super-resolution image processing; the teacher model provides guidance for the training of the student model and, because of its large depth, high precision and complex structure, is usually not deployed online. Since a lightweight student model is ultimately deployed to perform the super-resolution image processing, the technical scheme places no excessive requirements on the capability of the deployment device and can be rapidly deployed in embedded devices or mobile terminal devices.
When training the teacher model, the computer device needs to obtain a training sample of the teacher model. In an embodiment of the application, the training samples of the teacher model include first resolution images of the first sample individuals, and the input of the teacher model during the training process includes the first resolution images of the first sample individuals. Optionally, the training sample of the teacher model further includes a third resolution image of the first sample individual, such that during the training process, the input to the teacher model includes the third resolution image of the first sample individual and the first resolution image of the first sample individual. The number of training samples of the teacher model is not limited in the embodiment of the present application, that is, the first sample individual may be one sample individual or may be multiple sample individuals, and for the sake of training accuracy, the first sample individual generally includes multiple sample individuals.
In the embodiment of the present application, in a case that a training sample of a teacher model includes a first resolution image of a first sample individual and a third resolution image of the first sample individual, image contents represented by the first resolution image and the third resolution image of the same sample individual should be identical, for example, the same image contents of the same scene, the same person, the same action, the same building, the same plant, and the like are reflected, and a difference between the image contents is reflected in a difference in image quality caused by a difference in resolution, that is, the image quality of the first resolution image is higher than that of the third resolution image. Optionally, the image contents represented by the first resolution image and the third resolution image of different sample individuals may be consistent or inconsistent, which is not limited in this embodiment of the present application. Alternatively, the number of images of the first resolution image of one sample individual may be one or more; the number of the images of the third resolution image of one sample individual may be one or more, which is not limited in the embodiment of the present application.
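As a concrete illustration of such a sample pair, a third resolution image can be simulated from a first resolution image of the same content by downsampling. The average-pooling degradation below is an assumption made for illustration; real degradations may also involve blur, noise or compression, and the patent does not prescribe any particular pairing procedure.

```python
import numpy as np

def make_third_resolution(first_res, scale=2):
    """Simulate a third resolution (low resolution) image from a first
    resolution (high resolution) image of the same content by averaging
    non-overlapping scale x scale blocks."""
    h, w = first_res.shape
    h, w = h - h % scale, w - w % scale          # crop to a multiple of scale
    cropped = first_res[:h, :w]
    return cropped.reshape(h // scale, scale, w // scale, scale).mean(axis=(1, 3))

# A 4x4 toy "first resolution image" and its simulated 2x2 counterpart.
first_res = np.arange(16, dtype=np.float64).reshape(4, 4)
third_res = make_third_resolution(first_res, scale=2)
```

Both images depict the same content; only the pixel density differs, which is exactly the relationship the training pair requires.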
And step 220, extracting the compact characteristic image of the first sample individual based on the first resolution image of the first sample individual through a teacher model, and constructing a second resolution image of the first sample individual based on the compact characteristic image of the first sample individual.
After the training samples of the teacher model are obtained, the computer device may invoke the teacher model to process the training samples of the teacher model. In the embodiment of the application, the computer equipment extracts a compact characteristic image of the first sample individual based on the first resolution image of the first sample individual through a teacher model; and the computer device constructs a second resolution image of the first sample unit based on the compact feature image of the first sample unit through the teacher model. The Compact Feature image of the first sample individual may also be referred to as a Compact Feature of the first sample individual, which is used to indicate a representative, salient Feature in the first resolution image of the first sample individual. Since the image edge is the most significant portion of the image change, the compact feature image of the first sample individual includes representative edge information, otherwise referred to as significant edge information, of the first sample individual.
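Since the image edge is the most significant portion of image change, one illustrative way to expose such representative edge information is a gradient-magnitude edge map. The Sobel operator below is a stand-in chosen for illustration only; the patent does not specify how the compact feature image is computed.

```python
import numpy as np

def edge_map(img):
    """Gradient-magnitude edge map computed with 3x3 Sobel filters --
    an illustrative stand-in for 'representative edge information'."""
    kx = np.array([[-1.0, 0.0, 1.0],
                   [-2.0, 0.0, 2.0],
                   [-1.0, 0.0, 1.0]])
    ky = kx.T
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = img[i:i + 3, j:j + 3]
            out[i, j] = np.hypot((patch * kx).sum(), (patch * ky).sum())
    return out

step = np.zeros((5, 5))
step[:, 3:] = 1.0          # a vertical intensity step, i.e. a strong edge
edges = edge_map(step)     # responds at the step, zero in flat regions
```

Flat regions produce no response, while the intensity step produces a strong one, which is the sense in which edges are the "most significant portion of the image change".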
Based on the steps executed by the teacher model, it can be seen that, in the embodiment of the present application, the teacher model mainly performs two aspects of processing: on the one hand, the compact feature image is extracted and on the other hand, the second resolution image is constructed. Based on this, in one example, the teacher model includes a two-part learning network, one part of the learning network for extracting the compact feature images and the other part of the learning network for constructing the second resolution images. Of course, the teacher model may also include only a portion of the learning network that is used not only to extract the compact feature images, but also to construct the second resolution images; alternatively, the teacher model may include three or more portions of the learning network, wherein one or more portions of the learning network are used to extract the compact feature images and one or more other portions of the learning network are used to construct the second resolution images. For further description of the model structure of the teacher model, please refer to the following embodiments, which are not repeated herein.
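The two-part structure described above can be sketched as a model that composes a feature-extraction sub-network with an image-construction sub-network. The class name and the toy callables below are assumptions for illustration; they show the structure, not any particular architecture.

```python
class TwoPartTeacher:
    """A teacher composed of two sub-networks: `extract` produces the
    compact feature image, `construct` builds the second resolution
    image from it. The sub-networks are injected as callables so only
    the two-part structure itself is demonstrated."""
    def __init__(self, extract, construct):
        self.extract = extract
        self.construct = construct

    def forward(self, first_res_image):
        compact = self.extract(first_res_image)      # part 1: compact feature
        second_res = self.construct(compact)         # part 2: image construction
        return compact, second_res

# Toy stand-ins for the two learning networks:
teacher = TwoPartTeacher(
    extract=lambda img: [2.0 * v for v in img],
    construct=lambda feat: [v / 2.0 for v in feat],
)
compact, second_res = teacher.forward([1.0, 3.0])
```

Variants with one shared sub-network, or with three or more sub-networks, would change only how `extract` and `construct` are wired internally.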
And step 230, based on the feature extraction effectiveness of the compact feature image of the first sample individual and the image construction reduction degree of the second resolution image of the first sample individual, adjusting parameters of the teacher model to obtain the teacher model completing training.
In the course of training the teacher model, the computer device needs to adjust the parameters of the teacher model toward a certain target, so that the trained teacher model achieves the expected training effect. In the embodiments of the application, the computer device takes the feature extraction effectiveness of the compact feature image of the first sample individual and the image construction reduction degree of the second resolution image of the first sample individual as the reference targets for parameter adjustment; that is, by adjusting the parameters of the teacher model, the computer device drives both measures toward a relatively optimal effect, thereby obtaining the trained teacher model.
The feature extraction effectiveness of the compact feature image of the first sample individual is used to indicate how effectively representative, salient features were extracted from the first resolution image of the first sample individual, reflecting whether the features extracted from the first resolution image of the first sample individual can serve as salient information of the first sample individual. The image construction reduction degree of the second resolution image of the first sample individual is used to indicate the degree to which the constructed second resolution image of the first sample individual restores the input first resolution image of the first sample individual, reflecting the restoration effect of the teacher model on the first sample individual. The embodiment of the present application limits neither the manner of determining the feature extraction effectiveness of the compact feature image of the first sample individual nor the manner of determining the image construction reduction degree of the second resolution image of the first sample individual; please refer to the following embodiments, which are not repeated herein.
Step 240, training the student model based on the trained teacher model.
After the computer device completes the training of the teacher model, the student model is trained based on the trained teacher model; in other words, the student model is trained with the trained teacher model as guidance, so as to obtain a trained student model. That is, the computer device migrates the knowledge learned during the training of the teacher model to the student model. Optionally, the migrated knowledge comprises at least one of: the parameter weights of the teacher model, the semantic information learned by different parts of the network in the teacher model, the second resolution images output by the teacher model, the output distributions learned by the teacher model, and so on. For further description of training a student model based on a trained teacher model, please refer to the following embodiments, which are not repeated herein.
The trained student model is used for performing super-resolution image processing. That is, after the computer device completes training of the student model, the student model can be deployed to a device that requires super-resolution image processing, such as an embedded device or a mobile terminal device, so that the device can perform super-resolution image processing using the trained student model. Of course, after the computer device completes training of the student model, it may also perform super-resolution image processing itself using the trained student model. Since the student model is a lightweight deep learning model, performing super-resolution image processing with it reduces the capability requirements on the deployment device. In addition, although the model structure of the student model is relatively simple and shallow, the lightweight student model can still achieve an ideal image processing effect, because the student model is guided during training by a teacher model of greater depth and higher precision.
It should be noted that the embodiment of the present application does not limit the application scenarios of the super-resolution image processing method; optionally, the super-resolution image processing method described in the embodiment of the present application may be applied to scenarios such as medical imaging, remote sensing imaging, image compression, image processing, video processing, and public security. In the following, the super-resolution image processing method described in the embodiment of the present application is exemplified as applied to video processing (such as video conferences, video calls, etc.).
In one example, after the training of the student model is completed by the method described in the embodiment of the present application, the method further includes (that is, performing super-resolution image processing using the trained student model includes): acquiring a first video stream, wherein the first video stream comprises a third resolution image of at least one individual; obtaining, through the trained student model, a second resolution image of the at least one individual based on the third resolution image of the at least one individual; wherein the second resolution image of the at least one individual is used to construct a second video stream. That is to say, the student model trained through the technical scheme provided by the embodiment of the present application can process video streams in scenarios such as video conferences and video calls. After the third resolution image of at least one individual (e.g., at least one video frame) in the original video stream is processed by the trained student model, a second resolution image of the at least one individual (e.g., at least one video frame) is obtained, and these second resolution images are used to construct a new video stream whose image quality (e.g., resolution) is improved compared with that of the original video stream.
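As a sketch of the frame-by-frame flow described above: `student_model_stub` below is a hypothetical stand-in (nearest-neighbor 2× upscaling), not the trained student model itself, and all names are illustrative.

```python
import numpy as np

def student_model_stub(frame: np.ndarray, scale: int = 2) -> np.ndarray:
    """Hypothetical stand-in for the trained student model: nearest-neighbor upscale."""
    return frame.repeat(scale, axis=0).repeat(scale, axis=1)

def process_video_stream(frames, model=student_model_stub):
    """Apply the super-resolution model frame by frame to build the second video stream."""
    return [model(f) for f in frames]

# A toy 4-frame "first video stream" of 90x160 grayscale frames.
first_stream = [np.random.rand(90, 160) for _ in range(4)]
second_stream = process_video_stream(first_stream)
print(second_stream[0].shape)  # (180, 320)
```

In practice each frame would be passed through the deployed student model; the spatial dimensions of every output frame grow by the super-resolution scale factor.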
In summary, according to the technical scheme provided by the embodiment of the application, the knowledge distillation technology is applied to the super-resolution image processing, and finally the lightweight student model is deployed for executing the super-resolution image processing, so that the device capability of the deployment device is not excessively required, the rapid deployment can be realized in the embedded device or the mobile terminal device, and the super-resolution image processing can be widely applied in multiple fields. In addition, in the embodiment of the application, because the lightweight student model takes the teacher model with larger depth and higher precision as guidance in the training process, even if the model structure of the student model is simpler, an ideal image processing effect can be achieved. In addition, in the embodiment of the application, when the teacher model is trained, the high-resolution image of the sample individual is taken as input, the compact feature image of the sample individual is extracted, so that representative and significant features of the sample individual can be obtained as much as possible, the capability of the teacher model for constructing the super-resolution image of the sample individual is improved, and the capability of the student model for constructing the super-resolution image of the sample individual is also improved as the teacher model is taken as guidance for the training of the student model.
An exemplary model structure of the teacher model and an exemplary model training process of the teacher model are described below.
In one example, the teacher model includes a first encoding network and a first decoding network; the step 220 includes the following steps (step 222 to step 224).
Step 222, extracting a compact feature image of the first sample individual based on the first resolution image of the first sample individual through the first coding network.
The first coding network in the teacher model is used for extracting the compact feature image of the first sample individual; that is, the computer device extracts the compact feature image of the first sample individual based on the first resolution image of the first sample individual through the first coding network in the teacher model. Optionally, the first coding network is implemented as an Encoder comprising at least one layer of Convolutional Neural Network (CNN). In the case that the first coding network includes multiple layers of convolutional neural networks, the parameter settings (such as dimensions, convolution kernel sizes, numbers of convolution kernels, and the like) of convolutional neural networks at different levels may be the same or different, which is not limited in this embodiment of the present application. Optionally, the first coding network further comprises a parametric activation function (Parametric Rectified Linear Unit, PReLU), and the like.
In one example, the step 222 includes: extracting feature information of a first resolution image of a first sample individual through a first coding network; performing down-sampling processing on the characteristic information to obtain representative information of a first resolution image of a first sample individual; and performing dimension reduction processing on the representative information through a first coding network to obtain a compact characteristic image of the first sample individual. Wherein the representative information is used for indicating representative, salient features in the first resolution image of the first sample individual, and the representative information comprises representative edge information.
Illustratively, as shown in FIG. 3, the first encoding network of the teacher model includes four layers of convolutional neural networks (denoted simply as convolutions in FIG. 3). The computer device extracts feature information of the first resolution image of the first sample individual through the first two convolutional layers; then performs 2× down-sampling on the extracted feature information to obtain the representative information of the first resolution image of the first sample individual; and then performs dimension reduction on the representative information through the last two convolutional layers to obtain the compact feature image of the first sample individual.
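The shape flow of this encoder (feature extraction, 2× down-sampling, dimension reduction) can be sketched with NumPy stand-ins; the channel counts and the averaging operations below are illustrative assumptions, not the patent's actual layer parameters:

```python
import numpy as np

def downsample_2x(feat: np.ndarray) -> np.ndarray:
    """2x spatial down-sampling via 2x2 average pooling on (C, H, W) features."""
    c, h, w = feat.shape
    return feat.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def reduce_dim(feat: np.ndarray, out_c: int) -> np.ndarray:
    """Toy channel-dimension reduction (stand-in for the last two conv layers)."""
    c, h, w = feat.shape
    return feat.reshape(out_c, c // out_c, h, w).mean(axis=1)

def encode(hr_image: np.ndarray) -> np.ndarray:
    """Feature extraction -> 2x down-sampling -> dimension reduction."""
    feats = np.tile(hr_image[None], (64, 1, 1))  # stand-in for the first two conv layers
    rep = downsample_2x(feats)                   # representative information
    return reduce_dim(rep, out_c=4)              # compact feature image

compact = encode(np.random.rand(64, 64))
print(compact.shape)  # (4, 32, 32)
```

The point of the sketch is the tensor-shape contract: a high-resolution input is mapped to a spatially halved, channel-reduced compact representation.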
Step 224, constructing a second resolution image of the first sample individual based on the compact feature image of the first sample individual through the first decoding network.
The first decoding network in the teacher model is used to construct the second resolution image of the first sample individual; that is, the computer device constructs the second resolution image of the first sample individual based on the compact feature image of the first sample individual through the first decoding network in the teacher model. Optionally, the first decoding network is implemented as a Decoder comprising at least one layer of convolutional neural network. In the case that the first decoding network includes multiple layers of convolutional neural networks, the parameter settings of convolutional neural networks at different levels may be the same or different, which is not limited in this embodiment of the present application. Optionally, the first decoding network further comprises PReLU, MeanShift (mean shift), PixelShuffle (pixel rearrangement), and the like.
In one example, the step 224 includes: performing dimension-increasing processing on the compact characteristic image of the first sample individual through a first decoding network, and extracting high-dimensional information of the compact characteristic image of the first sample individual; carrying out up-sampling processing on the high-dimensional information to obtain construction information of a second resolution image of the first sample individual; constructing, by the first decoding network, a second resolution image of the first sample individual based on the construction information. Wherein the construction information of the second resolution image of the first individual samples includes high-dimensional information of the compact feature image of the first individual samples and fill-in information of the second resolution image of the first individual samples.
Illustratively, as shown in fig. 3, the first decoding network of the teacher model includes six layers of convolutional neural networks. After passing the compact feature image of the first sample individual through the initial MeanShift layer, the computer device performs dimension-increasing processing on it through the first four convolutional layers and extracts the high-dimensional information of the compact feature image of the first sample individual; next, 2× up-sampling is performed on the extracted high-dimensional information to fill in the features, obtaining the construction information of the second resolution image of the first sample individual; then, the second resolution image of the first sample individual is constructed through the last two convolutional layers, PixelShuffle, and MeanShift in sequence.
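The pixel rearrangement step used by the decoder can be sketched in NumPy; this mirrors the standard sub-pixel (PixelShuffle) rearrangement, with illustrative tensor sizes rather than the patent's actual ones:

```python
import numpy as np

def pixel_shuffle(feat: np.ndarray, r: int = 2) -> np.ndarray:
    """Rearrange a (C*r^2, H, W) tensor into (C, H*r, W*r), as in PixelShuffle."""
    cr2, h, w = feat.shape
    c = cr2 // (r * r)
    out = feat.reshape(c, r, r, h, w)   # split channels into an r x r sub-pixel grid
    out = out.transpose(0, 3, 1, 4, 2)  # interleave: (c, h, r, w, r)
    return out.reshape(c, h * r, w * r)

up = pixel_shuffle(np.random.rand(12, 32, 32), r=2)
print(up.shape)  # (3, 64, 64)
```

Pixel rearrangement trades channel depth for spatial resolution, which is why the decoder can emit a larger image than its convolutional feature maps.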
In one example, the teacher model includes a first encoding network and a first decoding network; the step 230 includes: acquiring a third resolution image of the first sample individual; determining a first image fitting loss based on the compact feature image of the first sample individual and the third resolution image of the first sample individual; adjusting parameters of the first coding network based on the first image fitting loss to obtain a trained first coding network; determining a first image reconstruction loss based on the second resolution image of the first sample individual and the first resolution image of the first sample individual; and adjusting parameters of the first decoding network based on the reconstruction loss of the first image to obtain the trained first decoding network.
In this example, the teacher model includes a two part learning network: the first encoding network is used for extracting the compact characteristic image of the first sample individual based on the first resolution image of the first sample individual, and the first decoding network is used for constructing the second resolution image of the first sample individual based on the compact characteristic image of the first sample individual. In the training process of the teacher model, both the first encoding network and the first decoding network in the teacher model need to be trained. Optionally, the computer device may train the first encoding network and the first decoding network separately, or may train the first encoding network and the first decoding network jointly, which is not limited in this embodiment of the present application. An exemplary training procedure for the first encoding network and the first decoding network is described below.
The training samples of the teacher model may further include a third resolution image of the first sample individual. The computer device calculates a first image fitting loss from the compact feature image of the first sample individual and the third resolution image of the first sample individual; this loss indicates the feature extraction effectiveness of the compact feature image of the first sample individual. Then, the computer device adjusts the parameters of the first encoding network based on the first image fitting loss so that the compact feature image of the first sample individual fits the third resolution image of the first sample individual as closely as possible; when the first image fitting loss converges, the first encoding network of the teacher model completes training.
The computer device also calculates a first image reconstruction loss from the second resolution image of the first sample individual and the first resolution image of the first sample individual; this loss indicates the image construction reduction degree of the second resolution image of the first sample individual. Then, the computer device adjusts the parameters of the first decoding network based on the first image reconstruction loss so that the second resolution image of the first sample individual restores the first resolution image of the first sample individual as closely as possible; when the first image reconstruction loss converges, the first decoding network of the teacher model completes training.
Illustratively, the compact feature image of the first sample individual and the third resolution image of the first sample individual are the same size, with height H' and width W'. Suppose the value at position (i, j) of the compact feature image of the first sample individual is denoted $X_{ij}$, and that of the third resolution image of the first sample individual is denoted $\hat{X}_{ij}$. Then the first image fitting loss $L_{imitate}$ can be expressed as:

$$L_{imitate} = \frac{1}{H'W'} \sum_{i=1}^{H'} \sum_{j=1}^{W'} \left| X_{ij} - \hat{X}_{ij} \right|$$
Illustratively, the second resolution image of the first sample individual and the first resolution image of the first sample individual are the same size, with height H and width W. Suppose the value at position (i, j) of the second resolution image of the first sample individual is denoted $Y_{ij}$, and that of the first resolution image of the first sample individual is denoted $\hat{Y}_{ij}$. Then the first image reconstruction loss $L_{recon}$ can be expressed as:

$$L_{recon} = \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} \left| Y_{ij} - \hat{Y}_{ij} \right|$$
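Assuming both losses take a mean-absolute-error form consistent with the definitions above (an assumption; the exact expressions appear only as rendered images in the filing), they could be computed as:

```python
import numpy as np

def mae_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean absolute error between two same-sized images (L_imitate / L_recon form)."""
    return float(np.abs(pred - target).mean())

compact = np.array([[0.2, 0.4], [0.6, 0.8]])  # toy compact feature image X
low_res = np.array([[0.2, 0.5], [0.6, 1.0]])  # toy third resolution image X-hat
print(round(mae_loss(compact, low_res), 3))   # 0.075
```

The same function applies unchanged to the reconstruction case by passing the constructed second resolution image and the first resolution image.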
in order to expand the characterization capabilities of the first coding network and the first decoding network, optionally, an Attention Transfer (Attention Transfer) mechanism may be used between the first coding network and the first decoding network, and the intermediate layer output characteristics of the first coding network and the intermediate layer output characteristics of the first decoding network use Attention constraints with each other, so as to achieve the purpose of joint training of the first coding network and the first decoding network.
Based on this, in one example, the method further comprises: obtaining first intermediate information based on the first resolution image of the first sample individual through the first encoding network; obtaining second intermediate information based on the compact feature image of the first sample individual through the first decoding network; determining a network constraint loss based on the first intermediate information and the second intermediate information, the network constraint loss being used to indicate a degree of similarity of the intermediate information of the first encoding network and the first decoding network; adjusting parameters of the first encoding network based on the first image fitting loss and the network constraint loss to obtain the trained first encoding network; and adjusting parameters of the first decoding network based on the first image reconstruction loss and the network constraint loss to obtain the trained first decoding network.
In this example, the computer device targets a first image fitting loss and a network constraint loss when adjusting parameters of the first encoding network; the first image reconstruction loss and the network constraint loss are targeted when adjusting parameters of the first decoding network. Optionally, the network constraint loss and the first image fitting loss, the first image reconstruction loss are weighted the same, that is, the computer device adjusts the parameters of the first encoding network based on the sum of the first image fitting loss and the network constraint loss, and adjusts the parameters of the first decoding network based on the sum of the first image reconstruction loss and the network constraint loss; alternatively, the network constraint loss and the first image fitting loss, the first image reconstruction loss are not weighted the same, that is, the computer device adjusts the parameters of the first encoding network based on the weighted sum of the first image fitting loss and the network constraint loss, and adjusts the parameters of the first decoding network based on the weighted sum of the first image reconstruction loss and the network constraint loss.
Illustratively, as shown in fig. 3, the teacher model includes a first encoding network and a first decoding network. When training the teacher model, the computer device applies attention constraints between the corresponding intermediate layer output features of the first encoding network and the first decoding network, and calculates the network constraint loss L_attention between those intermediate layer output features.
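One common formulation of such an attention constraint (an assumption; the patent does not fix the exact form) sums squared activations over channels, normalizes the resulting spatial map, and penalizes the distance between the encoder's and decoder's maps:

```python
import numpy as np

def attention_map(feat: np.ndarray) -> np.ndarray:
    """Spatial attention map: channel-wise sum of squared activations, L2-normalized."""
    amap = (feat ** 2).sum(axis=0)
    return amap / np.linalg.norm(amap)

def attention_loss(enc_feat: np.ndarray, dec_feat: np.ndarray) -> float:
    """L2 distance between normalized attention maps of paired intermediate layers."""
    return float(np.linalg.norm(attention_map(enc_feat) - attention_map(dec_feat)))

enc = np.random.rand(16, 8, 8)  # encoder intermediate-layer features
dec = np.random.rand(16, 8, 8)  # decoder intermediate-layer features
loss = attention_loss(enc, dec)
print(loss >= 0.0)  # True
```

Because the maps are normalized, the constraint compares where the two networks attend rather than the magnitudes of their activations.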
In summary, according to the technical scheme provided by the embodiment of the application, the teacher model is constructed with an encoding-network-and-decoding-network design, abandoning the single mode of constructing the super-resolution image of a sample individual only from its low-resolution image: the compact feature image of the sample individual is extracted from its high-resolution image and fitted to the low-resolution image as closely as possible, which fully improves the effectiveness of the compact feature image. In addition, in the embodiment of the application, when the encoding network and the decoding network are trained, an attention transfer mechanism is introduced so that the intermediate layer output features of the two networks impose attention constraints on each other; the decoding network can thus use the compact feature image provided by the encoding network to improve the super-resolution image processing effect, and the hierarchical attention features of the encoding network can be transferred into the corresponding levels of the decoding network.
An exemplary model structure of the student model and an exemplary model training process of the student model are described below.
In one example, the step 240 includes the following steps (step 241 to step 249).
Step 241, acquiring the parameter weights of the trained teacher model.
Since the embodiment of the application applies the knowledge distillation technology to the super-resolution image processing, the teacher model outputs a result with higher confidence, but does not need to use soft labels to guide the student model. Thus, in this example, the knowledge migrated from the teacher model to the student model includes: and completing the parameter weight of the trained teacher model.
Optionally, the computer device may migrate all the parameter weights in the teacher model that has completed training to the student model, so that the computer device needs to acquire all the parameter weights in the teacher model that has completed training; or, the computer device may also migrate part of the parameter weights in the teacher model after training to the student model, so that the computer device needs to obtain part of the parameter weights in the teacher model after training. For example, the computer device may obtain all the parameter weights of the model structure part of the teacher model that is completely trained, which is consistent with the model structure of the student model, and migrate the obtained parameter weights to the student network.
Step 243, building the student model based on the parameter weights of the trained teacher model.
After the parameter weights of the teacher models which are trained are obtained, the computer equipment can construct the student models based on the parameter weights of the teacher models which are trained, so that the migration process of the parameter weights is completed. Alternatively, since the model structure of the student model is identical to the model structure of the teacher model, the computer device directly uses the parameter weight of the teacher model having completed training as the parameter weight of the student model.
Illustratively, as shown in FIG. 3, the teacher model includes a first encoding network and a first decoding network, and as shown in FIG. 4, the student model includes a second decoding network. Based on this, the parameter weights of the teacher model include parameter weights of the first decoding network; the above step 243 includes: taking the parameter weight of the first decoding network as the parameter weight of the second decoding network; and taking the network structure of the first decoding network as the network structure of the second decoding network. That is, the structure of the second decoding network included in the student model is the same as the structure of the first decoding network included in the teacher model; the parameter weight of the second decoding network included in the student model is the same as the parameter weight of the first decoding network included in the teacher model.
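The weight migration described above can be sketched as a state-dictionary copy: keep only the first decoding network's parameters and rename them for the student's second decoding network. All key names and values here are hypothetical:

```python
# Hypothetical teacher parameter weights, keyed by (network, layer) names.
teacher_weights = {
    "encoder.conv1.w": [0.1, 0.2],
    "decoder.conv1.w": [0.3, 0.4],
    "decoder.conv2.w": [0.5, 0.6],
}

def migrate_decoder_weights(teacher_state: dict) -> dict:
    """Copy only the first-decoding-network weights into the student model's
    second decoding network, which is assumed to have an identical structure."""
    return {
        key.replace("decoder.", "student_decoder."): value
        for key, value in teacher_state.items()
        if key.startswith("decoder.")
    }

student_weights = migrate_decoder_weights(teacher_weights)
print(sorted(student_weights))  # ['student_decoder.conv1.w', 'student_decoder.conv2.w']
```

Because the two decoding networks share one structure, the copy is a direct key-for-key transfer; encoder weights are simply left behind.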
Step 245, a third resolution image of the second sample individual is acquired.
When the computer device trains the student model, it needs to obtain training samples for the student model. In the embodiment of the present application, the training samples of the student model include the third resolution image of the second sample individual, so that, during training, the input of the student model includes the third resolution image of the second sample individual. Optionally, the training samples of the student model further include the first resolution image of the second sample individual, so that, during training, the input of the student model includes the third resolution image of the second sample individual and the first resolution image of the second sample individual. The number of training samples of the student model is not limited in the embodiment of the application; that is, the second sample individual may be one sample individual or multiple sample individuals, and in consideration of training accuracy the second sample individual usually includes multiple sample individuals.
In the embodiment of the present application, in a case that a training sample of a student model includes a third resolution image of a second sample individual and a first resolution image of the second sample individual, image contents represented by the third resolution image and the first resolution image of the same sample individual should be identical, for example, the same image contents of the same scene, the same person, the same action, the same building, the same plant, and the like are reflected, and a difference between the image contents is reflected in a difference in image quality caused by a difference in resolution, that is, the image quality of the third resolution image is lower than the image quality of the first resolution image. Optionally, the image contents represented by the third resolution image and the first resolution image of different sample individuals may be consistent or may not be consistent, which is not limited in this embodiment of the application. Alternatively, the number of images of the third resolution image of one sample individual may be one or more; the number of images of the first-resolution image of one sample individual may be one or multiple, which is not limited in this embodiment of the present application.
Optionally, the second sample individual corresponding to the training sample of the student model and the first sample individual corresponding to the training sample of the teacher model may be the same sample individual or different sample individuals, which is not limited in this application. In consideration of equipment storage cost and the like, the sample individuals corresponding to the training samples of the student model and the sample individuals corresponding to the training samples of the teacher model can be the same, so that the number of training samples required to be stored in the computer equipment can be reduced, and the storage cost of the computer equipment is reduced.
Step 247, constructing a second resolution image of the second sample individual based on the third resolution image of the second sample individual through the student model.
After the training samples of the student model are obtained, the computer device may call the student model to process the training samples of the student model. In the embodiment of the application, the computer device constructs the second resolution image of the second sample individual based on the third resolution image of the second sample individual through the student model. The model structure included in the student model is not limited in the embodiment of the present application, and optionally, the student model includes only one part of the learning network, for example, the student model includes the second decoding network; alternatively, the student model includes a plurality of portions of the learning network that are each used to construct the second resolution image of the second sample individual.
Step 249, adjusting the parameters of the student model based on the image construction reduction degree of the second resolution image of the second sample individual.
In the process of training the student model, the computer device needs to adjust the parameters of the student model with a certain target, so that the trained student model can achieve the expected model training effect. In the embodiment of the application, the computer device uses the image construction reduction degree of the second resolution ratio image of the second sample individual as a reference target for adjusting parameters of the student model, that is, the computer device enables the image construction reduction degree of the second resolution ratio image of the second sample individual to achieve a relatively optimal effect by adjusting the parameters of the student model, so as to obtain the student model completing training.
In an embodiment of the present application, the training samples of the student model further include a first resolution image of the second sample individual. And the image construction restoration degree of the second resolution image of the second sample individual is used for indicating the restoration degree of the constructed second resolution image of the second sample individual relative to the first resolution image of the second sample individual, and the restoration degree is used for reflecting the restoration effect of the student model on the second sample individual.
Optionally, the step 249 includes: acquiring a first resolution image of a second sample individual; determining a second image reconstruction loss based on the second resolution image of the second sample individual and the first resolution image of the second sample individual; and adjusting parameters of the student model based on the second image reconstruction loss. Wherein the second image reconstruction penalty is indicative of a degree of image construction restitution of the second resolution image of the second sample individual. And the computer equipment adjusts the parameters of the student model based on the reconstruction loss of the second image so that the second resolution image of the second sample individual can restore the first resolution image of the second sample individual as much as possible, and when the reconstruction loss of the second image is converged, the student model completes training.
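A minimal sketch of training until the second image reconstruction loss converges; the convergence criterion and the geometrically decaying toy loss are illustrative assumptions, standing in for actual parameter updates:

```python
def has_converged(history, tol=1e-4, window=3):
    """Consider the loss converged when the last `window` changes all fall below tol."""
    if len(history) < window + 1:
        return False
    recent = history[-(window + 1):]
    return all(abs(a - b) < tol for a, b in zip(recent, recent[1:]))

# Toy training loop: a decaying value stands in for the second image
# reconstruction loss of the student model at each parameter-update step.
loss_history = []
loss = 1.0
while not has_converged(loss_history):
    loss_history.append(loss)
    loss *= 0.5  # stand-in for one parameter adjustment reducing the loss
print(has_converged(loss_history))  # True
```

In a real setup the loop body would run a forward pass, compute the reconstruction loss against the first resolution image, and backpropagate; the stopping rule is the same.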
In summary, according to the technical scheme provided by the embodiment of the application, the trained teacher model guides the training of the student model, so that the lightweight student model can also achieve an ideal super-resolution image processing effect, allowing super-resolution image processing to be widely applied in multiple fields. In addition, in the embodiment of the application, the student model takes low-resolution images of sample individuals as input and super-resolution images of sample individuals as output during training, matching the input and output of the student model in actual super-resolution image processing, so that the student model can be rapidly deployed in practical applications.
Taking the model structure and parameters of the teacher model as shown in fig. 3 and the model structure and parameters of the student model as shown in fig. 4 as examples, table 1 below compares the Peak Signal to Noise Ratio (PSNR) obtained when processing low-resolution images of sample individuals with the trained teacher model and student model provided in the embodiment of the present application and with the trained teacher model and student model in the related art.
Table 1: Comparison of peak signal-to-noise ratio (PSNR)
As the table shows, the trained student model provided by the embodiment of the application improves the peak signal-to-noise ratio of super-resolution image processing by 0.21 dB; compared with the technical scheme provided by the related art, the technical scheme provided by the embodiment of the application therefore effectively reduces distortion in the image processing process.
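The metric in the table can be reproduced with a few lines of code. The sketch below shows the standard PSNR definition for 8-bit images and what a 0.21 dB gain means in terms of mean squared error; the images used are synthetic stand-ins, not the patent's test data:

```python
import numpy as np

def psnr(reference, reconstructed, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two same-shaped images."""
    mse = np.mean((np.asarray(reference, dtype=np.float64)
                   - np.asarray(reconstructed, dtype=np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)

ref = np.full((8, 8), 128.0)  # flat synthetic 8-bit image
noisy = ref + 4.0             # uniform error of 4 gray levels -> MSE = 16
print(f"PSNR: {psnr(ref, noisy):.2f} dB")

# A 0.21 dB PSNR improvement corresponds to an MSE ratio of
# 10 ** (0.21 / 10), i.e. roughly 5% lower mean squared error.
mse_ratio = 10 ** (0.21 / 10)
print(f"MSE ratio for a 0.21 dB gain: {mse_ratio:.4f}")
```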
Referring to fig. 5, a block diagram of a super-resolution image processing apparatus provided by an embodiment of the present application is shown. The apparatus has the function of implementing the super-resolution image processing method example described in the embodiment of fig. 2; the function may be implemented by hardware, or by hardware executing corresponding software. The apparatus may be a computer device, or may be disposed in a computer device. The apparatus 500 may include: a first sample acquisition module 510, a first image processing module 520, a first parameter adjustment module 530, and a student model training module 540.
A first sample acquisition module 510, configured to acquire a first resolution image of a first sample individual.
A first image processing module 520, configured to extract, through a teacher model, a compact feature image of the first sample individual based on the first resolution image of the first sample individual, and construct a second resolution image of the first sample individual based on the compact feature image of the first sample individual; wherein the compact feature image of the first sample individual includes representative edge information of the first sample individual.
A first parameter adjusting module 530, configured to adjust parameters of the teacher model based on the feature extraction effectiveness of the compact feature image of the first sample individual and the image construction restoration degree of the second resolution image of the first sample individual, so as to obtain a trained teacher model.
And a student model training module 540, configured to train a student model based on the trained teacher model, and perform super-resolution image processing using the trained student model.
In one example, as shown in FIG. 6, the teacher model includes a first encoding network and a first decoding network; the first image processing module 520 includes: an image extraction unit 522, configured to extract, through the first encoding network, a compact feature image of the first sample individual based on a first resolution image of the first sample individual; and an image construction unit 524, configured to construct, through the first decoding network, a second resolution image of the first sample individual based on the compact feature image of the first sample individual.
In one example, as shown in fig. 6, the image extraction unit 522 is configured to: extract feature information of the first resolution image of the first sample individual through the first encoding network; perform down-sampling processing on the feature information to obtain representative information of the first resolution image of the first sample individual, wherein the representative information includes the representative edge information; and perform dimension reduction processing on the representative information through the first encoding network to obtain the compact feature image of the first sample individual.
In one example, as shown in fig. 6, the image construction unit 524 is configured to: performing dimension-increasing processing on the compact characteristic image of the first sample individual through the first decoding network, and extracting high-dimensional information of the compact characteristic image of the first sample individual; performing upsampling processing on the high-dimensional information to obtain construction information of a second resolution image of the first sample individual, wherein the construction information comprises the high-dimensional information and filling information of the second resolution image of the first sample individual; constructing, by the first decoding network, a second resolution image of the first sample individual based on the construction information.
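The encode/decode pipeline that units 522 and 524 describe (down-sample, reduce dimension, then raise dimension and up-sample) can be illustrated with plain array operations. The sizes, the average-pooling down-sampler, the linear projections, and the nearest-neighbour up-sampler below are all illustrative assumptions; the patent's networks would be learned convolutional layers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions; none of these come from the patent.
H = W = 8                 # first-resolution image (teacher input)
D_REP, D_COMPACT = 16, 4  # representative info size, compact feature size

P_down = rng.normal(size=(D_COMPACT, D_REP))  # stand-in for dimension reduction
P_up = rng.normal(size=(D_REP, D_COMPACT))    # stand-in for dimension increase

def encode(image):
    """First encoding network (sketch): features -> down-sample -> reduce."""
    # "down-sampling processing": 2x2 average pooling halves each spatial axis
    pooled = image.reshape(H // 2, 2, W // 2, 2).mean(axis=(1, 3))
    representative = pooled.reshape(-1)  # representative information
    # "dimension reduction processing": project to the compact feature image
    return P_down @ representative

def decode(compact):
    """First decoding network (sketch): raise dimension -> up-sample."""
    # "dimension-increasing processing": recover high-dimensional information
    high_dim = P_up @ compact
    grid = high_dim.reshape(H // 2, W // 2)
    # "up-sampling processing": nearest-neighbour fill to the second resolution
    return np.repeat(np.repeat(grid, 2, axis=0), 2, axis=1)

image = rng.normal(size=(H, W))
compact = encode(image)
second = decode(compact)
print(compact.shape, second.shape)
```

The point of the sketch is the shape flow: an 8x8 input collapses to a 4-dimensional compact feature, then expands back to an 8x8 constructed image; the "filling information" in the text corresponds to the pixels the up-sampler fills in.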
In one example, the teacher model includes a first encoding network and a first decoding network; the first parameter adjusting module 530 is configured to: acquiring a third resolution image of the first sample individual; determining a first image fitting loss based on the compact feature image of the first sample individual and the third resolution image of the first sample individual, the first image fitting loss being indicative of a feature extraction effectiveness of the compact feature image of the first sample individual; adjusting parameters of the first coding network based on the first image fitting loss to obtain a trained first coding network; determining a first image reconstruction loss based on a second resolution image of the first sample individual and a first resolution image of the first sample individual, the first image reconstruction loss being indicative of an image construction reduction degree of the second resolution image of the first sample individual; and adjusting parameters of the first decoding network based on the first image reconstruction loss to obtain the trained first decoding network.
In one example, as shown in fig. 6, the apparatus 500 further comprises: an intermediate information constraint module 550, configured to obtain first intermediate information based on the first resolution image of the first sample individual through the first coding network; obtain second intermediate information based on the compact feature image of the first sample individual through the first decoding network; and determine a network constraint loss based on the first intermediate information and the second intermediate information, the network constraint loss indicating a degree of similarity between the intermediate information of the first encoding network and that of the first decoding network. The first parameter adjusting module 530 is further configured to: adjust parameters of the first coding network based on the first image fitting loss and the network constraint loss to obtain the trained first coding network; and adjust parameters of the first decoding network based on the first image reconstruction loss and the network constraint loss to obtain the trained first decoding network.
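The way the three losses combine in this example can be sketched numerically. All tensors below are random stand-ins for the quantities the patent names, and the weighting factor `lam` is an assumption; the patent does not state how the network constraint loss is weighted against the fitting and reconstruction losses:

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two same-shaped arrays."""
    return float(np.mean((a - b) ** 2))

rng = np.random.default_rng(1)
compact = rng.normal(size=16)    # compact feature image (encoder output)
third_res = rng.normal(size=16)  # third-resolution (low) image, fitting target
second_res = rng.normal(size=64) # constructed second-resolution image
first_res = rng.normal(size=64)  # ground-truth first-resolution image
enc_mid = rng.normal(size=32)    # first intermediate information (encoder)
dec_mid = rng.normal(size=32)    # second intermediate information (decoder)

fit_loss = mse(compact, third_res)       # first image fitting loss
recon_loss = mse(second_res, first_res)  # first image reconstruction loss
constraint_loss = mse(enc_mid, dec_mid)  # network constraint loss

lam = 0.1  # assumed weighting factor, not specified by the patent
encoder_objective = fit_loss + lam * constraint_loss     # drives module 530's encoder update
decoder_objective = recon_loss + lam * constraint_loss   # drives module 530's decoder update
print(f"{encoder_objective:.3f} {decoder_objective:.3f}")
```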
In one example, the student model training module 540 is configured to: acquire the parameter weight of the trained teacher model; construct the student model based on the parameter weight of the trained teacher model; acquire a third resolution image of a second sample individual; construct, through the student model, a second resolution image of the second sample individual based on the third resolution image of the second sample individual; and adjust parameters of the student model based on the image construction restoration degree of the second resolution image of the second sample individual.
In one example, the teacher model includes a first encoding network and a first decoding network, the parameter weights of the teacher model include parameter weights of the first decoding network; the student model comprises a second decoding network; the student model training module 540 is configured to: taking the parameter weight of the first decoding network as the parameter weight of the second decoding network; and taking the network structure of the first decoding network as the network structure of the second decoding network.
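The weight transfer described here amounts to copying the first decoding network's structure and parameters into the student while leaving the encoding side behind. A minimal dictionary-based sketch, with entirely hypothetical parameter names (real models would hold tensors, e.g. via a state dict):

```python
import copy

# Hypothetical parameter dictionaries for the trained teacher model.
teacher = {
    "first_encoding_network": {"proj.weight": [[0.1, 0.2]], "proj.bias": [0.0]},
    "first_decoding_network": {"up.weight": [[0.3], [0.4]], "up.bias": [0.0, 0.0]},
}

# The student reuses only the decoding side: same network structure
# (same parameter names and shapes) and the same parameter weights.
student = {
    "second_decoding_network": copy.deepcopy(teacher["first_decoding_network"]),
}

print(sorted(student["second_decoding_network"]))
```

A deep copy is used so that subsequently fine-tuning the student (as in step 249) does not mutate the teacher's stored weights.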
In one example, the student model training module is to: acquiring a first resolution image of the second sample individual; determining a second image reconstruction loss based on a second resolution image of the second sample individual and a first resolution image of the second sample individual, the second image reconstruction loss being indicative of a degree of image construction restitution of the second resolution image of the second sample individual; adjusting parameters of the student model based on the second image reconstruction loss.
In summary, in the technical scheme provided by the embodiment of the application, the knowledge distillation technique is applied to super-resolution image processing, and a lightweight student model is ultimately deployed to perform the super-resolution image processing. This places no excessive requirements on the capability of the deployment device, enables rapid deployment on embedded devices or mobile terminal devices, and allows super-resolution image processing to be widely applied in multiple fields. In addition, in the embodiment of the application, the lightweight student model is guided during training by a teacher model of greater depth and higher precision, so an ideal image processing effect can be achieved even though the model structure of the student model is simpler. Furthermore, when the teacher model is trained, the high-resolution image of the sample individual is taken as input and the compact feature image of the sample individual is extracted, so that representative and significant features of the sample individual are captured as far as possible. This improves the ability of the teacher model to construct the super-resolution image of the sample individual and, because the teacher model guides the training of the student model, also improves the ability of the student model to construct the super-resolution image of the sample individual.
It should be noted that, when the apparatus provided in the foregoing embodiment implements the functions thereof, the division of each functional module is merely used as an example, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules, so as to complete all or part of the functions described above. In addition, the apparatus and method embodiments provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments for details, which are not described herein again.
Referring to fig. 7, a block diagram of a computer device provided in an embodiment of the present application is shown. The computer device can be used to implement the functions of the above super-resolution image processing method example. Specifically:
the computer device 700 includes a processing unit 701 (e.g., a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), an FPGA (Field Programmable Gate Array), etc.), a system memory 704 including a RAM (Random-Access Memory) 702 and a ROM (Read-Only Memory) 703, and a system bus 705 connecting the system memory 704 and the processing unit 701. The computer device 700 also includes a basic input/output system (I/O system) 706 for facilitating information transfer between devices within the computer device, and a mass storage device 707 for storing an operating system 713, application programs 714, and other program modules 715.
The basic input/output system 706 includes a display 708 for displaying information and an input device 709, such as a mouse, keyboard, etc., for a user to input information. Wherein the display 708 and input device 709 are connected to the central processing unit 701 through an input output controller 710 coupled to the system bus 705. The basic input/output system 706 may also include an input/output controller 710 for receiving and processing input from a number of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, input-output controller 710 may also provide output to a display screen, a printer, or other type of output device.
The mass storage device 707 is connected to the central processing unit 701 through a mass storage controller (not shown) connected to the system bus 705. The mass storage device 707 and its associated computer-readable media provide non-volatile storage for the computer device 700. That is, the mass storage device 707 may include a computer-readable medium (not shown) such as a hard disk or CD-ROM (Compact disk Read-Only Memory) drive.
Without loss of generality, the computer-readable media may comprise computer storage media and communication media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include RAM, ROM, EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory or other solid-state memory technology, CD-ROM, DVD (Digital Video Disc) or other optical storage, magnetic cassette, magnetic tape, magnetic disk storage, or other magnetic storage devices. Of course, those skilled in the art will appreciate that computer storage media are not limited to the foregoing. The system memory 704 and the mass storage device 707 described above may be collectively referred to as memory.
According to embodiments of the present application, the computer device 700 may also be connected to a remote computer on a network and run via the network, such as the Internet. That is, the computer device 700 may be connected to the network 712 through the network interface unit 711 connected to the system bus 705, or the network interface unit 711 may be used to connect to other types of networks or remote computer systems (not shown).
The memory also stores at least one instruction, at least one program, a code set, or an instruction set, which is configured to be executed by one or more processors to implement the above super-resolution image processing method.
In an exemplary embodiment, there is also provided a computer-readable storage medium having stored therein at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the above super-resolution image processing method.
Optionally, the computer-readable storage medium may include: a Read Only Memory (ROM), a Random Access Memory (RAM), a Solid State Drive (SSD), or an optical disc. The Random Access Memory may include a resistive Random Access Memory (ReRAM) and a Dynamic Random Access Memory (DRAM).
In an exemplary embodiment, a computer program product or computer program is also provided, the computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to execute the super-resolution image processing method described above.
It should be understood that reference to "a plurality" herein means two or more. "And/or" describes the association relationship of the associated objects, meaning that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. In addition, the step numbers described herein only show an exemplary possible execution sequence among the steps; in some other embodiments, the steps may also be executed out of the numbered sequence, for example, two steps with different numbers may be executed simultaneously, or in a reverse order to the illustrated sequence, which is not limited in this application.
The above description is only exemplary of the present application and should not be taken as limiting the present application, and any modifications, equivalents, improvements and the like that are made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (16)

1. An image processing method for super-resolution, the method comprising:
acquiring a first resolution image of a first sample individual;
extracting, by a teacher model, a compact feature image of the first sample individual based on a first resolution image of the first sample individual, and constructing a second resolution image of the first sample individual based on the compact feature image of the first sample individual; wherein the compact feature image of the first sample individual includes representative edge information of the first sample individual;
based on the feature extraction effectiveness of the compact feature image of the first sample individual and the image construction reduction degree of the second resolution image of the first sample individual, adjusting the parameters of the teacher model to obtain a trained teacher model;
and training the student model based on the trained teacher model, and executing super-resolution image processing by using the trained student model.
2. The method of claim 1, wherein the teacher model comprises a first encoding network and a first decoding network;
the extracting, by a teacher model, a compact feature image of the first sample individual based on a first resolution image of the first sample individual, and constructing a second resolution image of the first sample individual based on the compact feature image of the first sample individual, includes:
extracting, by the first encoding network, a compact feature image of the first sample individual based on a first resolution image of the first sample individual;
constructing, by the first decoding network, a second resolution image of the first sample individual based on the compact feature image of the first sample individual.
3. The method according to claim 2, wherein the extracting, through the first encoding network, the compact feature image of the first sample individual based on the first resolution image of the first sample individual comprises:
extracting feature information of a first resolution image of the first sample individual through the first coding network;
performing down-sampling processing on the feature information to obtain representative information of the first resolution image of the first sample individual, wherein the representative information comprises the representative edge information;
and performing dimension reduction processing on the representative information through the first coding network to obtain a compact characteristic image of the first sample individual.
4. The method of claim 2, wherein the constructing, by the first decoding network, the second resolution image of the first sample individual based on the compact feature image of the first sample individual comprises:
performing, by the first decoding network, dimension-up processing on the compact feature image of the first sample individual, and extracting high-dimensional information of the compact feature image of the first sample individual;
performing upsampling processing on the high-dimensional information to obtain construction information of a second resolution image of the first sample individual, wherein the construction information comprises the high-dimensional information and filling information of the second resolution image of the first sample individual;
constructing, by the first decoding network, a second resolution image of the first sample individual based on the construction information.
5. The method of claim 1, wherein the teacher model comprises a first encoding network and a first decoding network; and the adjusting parameters of the teacher model based on the feature extraction effectiveness of the compact feature image of the first sample individual and the image construction restoration degree of the second resolution image of the first sample individual, to obtain a trained teacher model, comprises:
acquiring a third resolution image of the first sample individual;
determining a first image fitting loss based on the compact feature image of the first sample individual and the third resolution image of the first sample individual, the first image fitting loss being indicative of a feature extraction effectiveness of the compact feature image of the first sample individual;
adjusting parameters of the first coding network based on the first image fitting loss to obtain a trained first coding network;
determining a first image reconstruction loss based on the second resolution image of the first sample individual and the first resolution image of the first sample individual, the first image reconstruction loss being indicative of an image construction restoration degree of the second resolution image of the first sample individual;
and adjusting parameters of the first decoding network based on the first image reconstruction loss to obtain the trained first decoding network.
6. The method of claim 5, further comprising:
obtaining first intermediate information based on a first resolution image of the first sample individual through the first coding network;
obtaining second intermediate information based on the compact feature image of the first sample individual through the first decoding network;
determining a network constraint loss based on the first intermediate information and the second intermediate information, the network constraint loss indicating a degree of similarity of the intermediate information of the first encoding network and the first decoding network;
adjusting parameters of the first coding network based on the first image fitting loss to obtain a trained first coding network, including:
adjusting parameters of the first coding network based on the first image fitting loss and the network constraint loss to obtain the trained first coding network;
adjusting parameters of the first decoding network based on the first image reconstruction loss to obtain a trained first decoding network, including:
and adjusting parameters of the first decoding network based on the first image reconstruction loss and the network constraint loss to obtain the trained first decoding network.
7. The method of claim 1, wherein training a student model based on the trained teacher model comprises:
acquiring the parameter weight of the trained teacher model;
constructing the student model based on the parameter weight of the trained teacher model;
acquiring a third resolution image of a second sample individual;
constructing a second resolution image of a second sample individual based on a third resolution image of the second sample individual through a student model;
and adjusting parameters of the student model based on an image construction restoration degree of the second resolution image of the second sample individual.
8. The method of claim 7, wherein the teacher model includes a first encoding network and a first decoding network, and wherein the parameter weights of the teacher model include parameter weights of the first decoding network; the student model comprises a second decoding network;
the building of the student model based on the parameter weight of the teacher model after training comprises the following steps:
taking the parameter weight of the first decoding network as the parameter weight of the second decoding network;
and taking the network structure of the first decoding network as the network structure of the second decoding network.
9. The method of claim 7, wherein the adjusting parameters of the student model based on the image construction restoration degree of the second resolution image of the second sample individual comprises:
acquiring a first resolution image of the second sample individual;
determining a second image reconstruction loss based on a second resolution image of the second sample individual and a first resolution image of the second sample individual, the second image reconstruction loss being indicative of a degree of image construction restitution of the second resolution image of the second sample individual;
adjusting parameters of the student model based on the second image reconstruction loss.
10. An image processing apparatus for super-resolution, the apparatus comprising:
the first sample acquisition module is used for acquiring a first resolution image of a first sample individual;
the first image processing module is used for extracting a compact characteristic image of the first sample individual based on a first resolution image of the first sample individual through a teacher model and constructing a second resolution image of the first sample individual based on the compact characteristic image of the first sample individual; wherein the compact feature image of the first sample individual includes representative edge information of the first sample individual;
the first parameter adjusting module is used for establishing a restoration degree based on the feature extraction effectiveness degree of the compact feature image of the first sample individual and the image construction restoration degree of the second resolution image of the first sample individual, and adjusting parameters of the teacher model to obtain a teacher model which completes training;
and the student model training module is used for training the student model based on the teacher model which completes the training and executing super-resolution image processing by using the student model obtained by training.
11. The apparatus of claim 10, wherein the teacher model comprises a first encoding network and a first decoding network; the first parameter adjusting module is configured to:
acquiring a third resolution image of the first sample individual;
determining a first image fitting loss based on the compact feature image of the first sample individual and the third resolution image of the first sample individual, the first image fitting loss being indicative of a feature extraction effectiveness of the compact feature image of the first sample individual;
adjusting parameters of the first coding network based on the first image fitting loss to obtain a trained first coding network;
determining a first image reconstruction loss based on a second resolution image of the first sample individual and a first resolution image of the first sample individual, the first image reconstruction loss being indicative of an image construction reduction degree of the second resolution image of the first sample individual;
and adjusting parameters of the first decoding network based on the first image reconstruction loss to obtain the trained first decoding network.
12. The apparatus of claim 11, further comprising:
an intermediate information constraint module, configured to obtain first intermediate information based on the first resolution image of the first sample through the first coding network; obtaining second intermediate information based on the compact feature image of the first sample individual through the first decoding network; determining a network constraint loss based on the first intermediate information and the second intermediate information, the network constraint loss indicating a degree of similarity of the intermediate information of the first encoding network and the first decoding network;
the first parameter adjustment module is further configured to: adjusting parameters of the first coding network based on the first image fitting loss and the network constraint loss to obtain the trained first coding network; and adjusting parameters of the first decoding network based on the first image reconstruction loss and the network constraint loss to obtain the trained first decoding network.
13. The apparatus of claim 10, wherein the student model training module is configured to:
acquiring the parameter weight of the trained teacher model;
constructing the student model based on the parameter weight of the trained teacher model;
acquiring a third resolution image of a second sample individual;
constructing a second resolution image of a second sample individual based on a third resolution image of the second sample individual through a student model;
and adjusting parameters of the student model based on an image construction restoration degree of the second resolution image of the second sample individual.
14. A computer device comprising a processor and a memory, said memory having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by said processor to implement the super resolution image processing method according to any of claims 1 to 9.
15. A computer-readable storage medium, characterized in that at least one instruction, at least one program, a set of codes, or a set of instructions is stored in the storage medium, which is loaded and executed by a processor to implement the super resolution image processing method according to any of claims 1 to 9.
16. A computer program product or computer program, characterized in that the computer program product or computer program comprises computer instructions, the computer instructions being stored in a computer readable storage medium; a processor of a computer device reads the computer instructions from the computer-readable storage medium, the processor executing the computer instructions causing the computer device to perform the super-resolution image processing method of any one of claims 1 to 9.
CN202111009776.6A 2021-08-31 2021-08-31 Super-resolution image processing method, device, equipment and storage medium Pending CN115731101A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111009776.6A CN115731101A (en) 2021-08-31 2021-08-31 Super-resolution image processing method, device, equipment and storage medium


Publications (1)

Publication Number Publication Date
CN115731101A true CN115731101A (en) 2023-03-03

Family

ID=85291325

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111009776.6A Pending CN115731101A (en) 2021-08-31 2021-08-31 Super-resolution image processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115731101A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination