WO2023061116A1 - 图像处理网络的训练方法、装置、计算机设备和存储介质 (Training method and apparatus for an image processing network, computer device, and storage medium) - Google Patents

图像处理网络的训练方法、装置、计算机设备和存储介质 (Training method and apparatus for an image processing network, computer device, and storage medium)

Info

Publication number
WO2023061116A1 (application PCT/CN2022/117789)
Authority
WO
WIPO (PCT)
Prior art keywords: image, image data, network, sample, face
Application number
PCT/CN2022/117789
Other languages
English (en)
French (fr)
Inventor
石世昌
黄飞
华超
熊唯
杨梁
Original Assignee
腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Application filed by 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Priority to JP2023570432A (published as JP2024517359A)
Priority to EP22880058.7A (published as EP4300411A1)
Priority to US18/207,572 (published as US20230334833A1)
Publication of WO2023061116A1

Classifications

    • G06T 3/4053: Super resolution, i.e. output image resolution higher than sensor resolution
    • G06T 3/4046: Scaling the whole image or part thereof using neural networks
    • G06V 10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N 3/045: Combinations of networks
    • G06N 3/0455: Auto-encoder networks; Encoder-decoder networks
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G06N 3/048: Activation functions
    • G06N 3/08: Learning methods
    • G06N 3/09: Supervised learning
    • G06T 5/50: Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06T 5/73
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • G06V 10/95: Hardware or software architectures specially adapted for image or video understanding, structured as a network, e.g. client-server architectures
    • G06T 2207/10004: Still image; Photographic image
    • G06T 2207/10016: Video; Image sequence
    • G06T 2207/10024: Color image
    • G06T 2207/20021: Dividing image into blocks, subimages or windows
    • G06T 2207/20081: Training; Learning
    • G06T 2207/20084: Artificial neural networks [ANN]
    • G06T 2207/20221: Image fusion; Image merging
    • G06T 2207/30201: Face
    • G06V 40/161: Detection; Localisation; Normalisation

Definitions

  • the present application relates to the technical field of image processing, and in particular to a training method and apparatus for an image processing network, a computer device, and a storage medium.
  • Images can be optimized by means of trained image models.
  • In the related art, multiple image models with different optimization tasks are trained separately, and the trained models then optimize the image by being applied one after another (superimposed optimization).
  • However, one image model may have a reverse optimization effect on another, so the optimization effects of the various image models weaken each other, and the trained image models end up optimizing the image poorly.
  • the present application provides a training method of an image processing network on the one hand, the method comprising:
  • the sample image pair includes low-definition image data and high-definition image data; the low-definition image data and the high-definition image data have the same image content;
  • the network parameters of the image processing network are updated according to the super-resolution loss function, image quality loss function, face loss function and sharpening loss function to obtain a trained image processing network.
  • An image processing method is provided on one aspect, the method comprising:
  • calling the trained image processing network to obtain the super-resolution image data corresponding to the initial image data; the resolution of the super-resolution image data is greater than or equal to the target resolution;
  • calling the trained image processing network to obtain the second enhanced image data corresponding to the first enhanced image data; if the first enhanced image data contains a face image, the second enhanced image data is the image data obtained after face enhancement is performed on the face image in the first enhanced image data;
  • a training device for an image processing network comprising:
  • a sample acquisition module configured to acquire a sample image pair; the sample image pair includes low-definition image data and high-definition image data, and the low-definition image data and the high-definition image data have the same image content;
  • the calling module is used to call the image processing network to adjust the resolution of the low-definition image data to the target resolution, obtain sample super-resolution image data, and generate a super-resolution loss function according to the sample super-resolution image data and high-definition image data;
  • the calling module is used to call the image processing network to perform image quality enhancement processing on the sample super-resolution image data, obtain the first sample enhanced image data, and generate an image quality loss function according to the first sample enhanced image data and the high-definition image data;
  • the calling module is used to call the image processing network to perform face enhancement processing on the face image in the first sample enhanced image data to obtain the sample enhanced face image, fuse the sample enhanced face image with the first sample enhanced image data to obtain the second sample enhanced image data, and generate a face loss function according to the sample enhanced face image and the face image in the high-definition image data;
  • the calling module is used to call the image processing network to perform image sharpening processing on the second sample enhanced image data, obtain the sample sharpened image data, and generate a sharpening loss function according to the sample sharpened image data and the high-definition image data;
  • the update module is used to update the network parameters of the image processing network according to the super-resolution loss function, image quality loss function, face loss function and sharpening loss function, so as to obtain a trained image processing network.
  • an image processing device comprising:
  • the super-resolution calling module is used to call the trained image processing network to obtain the super-resolution image data corresponding to the initial image data; the resolution of the super-resolution image data is greater than or equal to the target resolution;
  • the image quality enhancement module is used to call the trained image processing network to perform image quality enhancement processing on the super-resolution image data to obtain the first enhanced image data;
  • the face enhancement module is used to call the trained image processing network to obtain the second enhanced image data corresponding to the first enhanced image data; if the first enhanced image data contains a face image, the second enhanced image data is the image data obtained after face enhancement is performed on the face image in the first enhanced image data;
  • the sharpening module is used to call the trained image processing network to perform image sharpening processing on the second enhanced image data, obtain sharpened image data, and output the sharpened image data.
  • One aspect provides a computer device including a memory and a processor, where the memory stores computer-readable instructions that, when executed by the processor, cause the processor to perform the method in the above aspect of the present application.
  • a non-volatile computer-readable storage medium stores computer-readable instructions, and when the computer-readable instructions are executed by a processor, the processor performs the above-mentioned method.
  • a computer program product or computer program comprising computer readable instructions stored in a computer readable storage medium.
  • the processor of the computer device reads the computer-readable instructions from the computer-readable storage medium and executes them, so that the computer device performs the method provided in the various optional implementations of the above aspects.
  • FIG. 1 is a schematic structural diagram of a network architecture provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a network training scenario provided by the present application.
  • FIG. 3 is a schematic flowchart of a training method for an image processing network provided by the present application.
  • FIG. 4 is a schematic structural diagram of an encoding and decoding network provided by the present application.
  • FIG. 5 is a schematic structural diagram of a basic unit provided by the present application.
  • FIG. 6 is a schematic diagram of a scenario for acquiring a loss function provided by the present application.
  • FIG. 7 is a schematic flowchart of an image processing method provided by the present application.
  • FIG. 8 is a schematic diagram of a scene for optimizing a human face provided by the present application.
  • FIG. 9 is a schematic diagram of an image optimization scene provided by the present application.
  • FIG. 10 is a schematic diagram of a data push scenario provided by the present application.
  • FIG. 11 is a schematic structural diagram of a training device for an image processing network provided by the present application.
  • FIG. 12 is a schematic structural diagram of an image processing device provided by the present application.
  • FIG. 13 is a schematic structural diagram of a computer device provided by the present application.
  • FIG. 1 is a schematic structural diagram of a network architecture provided by an embodiment of the present application.
  • the network architecture may include a server 200 and a terminal device cluster, and the terminal device cluster may include one or more terminal devices, and the number of terminal devices is not limited here.
  • The plurality of terminal devices may specifically include a terminal device 100a, a terminal device 101a, a terminal device 102a, ..., and a terminal device 103a. As shown in FIG. 1, each of the terminal devices 100a, 101a, 102a, ..., 103a can be connected to the server 200 through a network, so that each terminal device can exchange data with the server 200 through the network connection.
  • The server 200 shown in FIG. 1 can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms.
  • The terminal device can be an intelligent terminal such as a smartphone, tablet computer, notebook computer, desktop computer, smart TV, or vehicle-mounted terminal. The following describes this embodiment of the present application in detail by taking communication between the terminal device 100a and the server 200 as an example.
  • FIG. 2 is a schematic diagram of a network training scenario provided by the present application.
  • The above-mentioned terminal device 100a may run an application client, and the server 200 may be the background server of that application client.
  • The server 200 may push video data to the application client; when doing so, it may first optimize the video data and then push it to the application client.
  • Optimizing the video data may refer to optimizing each image frame included in the video data.
  • the server 200 can optimize the image frames in the video data through the trained image processing network.
  • For the training process of the image processing network, please refer to the following description.
  • the image processing network that needs to be trained may include a super-resolution network, an image quality enhancement network, a face enhancement network, and a sharpening network.
  • the server 200 can obtain a sample image pair, which can include low-definition image data and high-definition image data, and the server 200 can input the sample image pair into the image processing network.
  • The super-resolution network in the image processing network performs super-resolution processing on the low-definition image data (that is, increases the resolution of the low-definition image data) to obtain the sample super-resolution image data; the super-resolution loss function can then be generated from the sample super-resolution image data and the high-definition image data.
  • Next, image quality enhancement processing can be performed on the sample super-resolution image data to obtain the first sample enhanced image data, and the image quality loss function can then be generated from the first sample enhanced image data and the high-definition image data.
  • The face image in the first sample enhanced image data can then be enhanced to obtain a sample enhanced face image, and the face loss function can be generated from the sample enhanced face image and the high-definition face image in the high-definition image data.
  • The second sample enhanced image data can also be generated by fusing the sample enhanced face image with the first sample enhanced image data.
  • Finally, the second sample enhanced image data can be sharpened to obtain the sample sharpened image data, and the sharpening loss function can then be generated from the sample sharpened image data and the high-definition image data.
  • The server 200 can propagate the generated super-resolution loss function, image quality loss function, face loss function, and sharpening loss function forward through the image processing network, and use them together to update the network parameters of the networks they reach, so as to obtain a trained image processing network.
  • the trained image processing network can be used to optimize the image, such as optimizing the following initial image data. For the specific process of this optimization, please refer to the relevant description in the corresponding embodiment in FIG. 7 below.
  • For example, the image resolution can be improved by optimizing the image with an image model used to increase resolution, and the image quality can be enhanced by optimizing the image with an image model used to enhance quality. However, when the image-quality model is further superimposed on the image whose resolution has already been increased, the obtained image may instead be distorted, resulting in a poor overall effect; this shows that there is a conflict between the image processing tasks of the two image models.
  • For this reason, a multi-task joint training framework covering the super-resolution network, image quality enhancement network, face enhancement network, and sharpening network is provided.
  • The super-resolution network, image quality enhancement network, face enhancement network, and sharpening network are cascaded in sequence, so that during training the super-resolution network yields a super-resolution loss function from the input sample image pair and its own output, and the networks after the super-resolution network yield the image quality loss function, face loss function, and sharpening loss function in turn.
  • Each loss function involves the parameters of its own network, and because each loss function can be propagated forward through the whole network, the network parameters of these networks constrain and interact with one another. Updating the network parameters of the networks each loss reaches (the super-resolution network, image quality enhancement network, face enhancement network, and sharpening network) therefore trains the four networks in a correlated, integrated, and mutually reinforcing way.
  • As a result, the super-resolution network, image quality enhancement network, face enhancement network, and sharpening network not only train well individually but also do not conflict when they jointly superimpose their optimizations on an image, so the trained network as a whole achieves a better superimposed optimization effect.
  • FIG. 3 is a schematic flowchart of a training method for an image processing network provided in the present application.
  • The execution entity in this embodiment of the present application may be a computer device or a cluster composed of multiple computer devices. The computer device may be a server or a terminal device; therefore, the execution entity may be a server, a terminal device, or a combination of both. The following takes the case where the execution entity is a server as an example.
  • the method may include:
  • Step S101: acquire a sample image pair; the sample image pair includes low-definition image data and high-definition image data, and the low-definition image data and the high-definition image data have the same image content.
  • the server can obtain a sample image pair, which is an image pair used to train the image processing network.
  • A sample image pair can include one piece of low-definition image data and the corresponding high-definition image data. Since the principle of training the image processing network with each sample image pair is the same, the following description takes the process of training the image processing network with one sample image pair (hereinafter simply the sample image pair) as an example.
  • The low-definition image data and high-definition image data contained in the sample image pair are images with the same image content but different image resolutions (hereinafter simply resolution), and the resolution of the low-definition image data is lower than that of the high-definition image data.
  • The resolution of the low-definition image data can be smaller than the target resolution, and the resolution of the high-definition image data can be greater than or equal to the target resolution. The target resolution can be set according to the actual application scenario; for example, the target resolution can be 1920*1080.
  • the server may obtain high-definition sample video data, and the sample video data may be video data for obtaining sample image pairs. Since one piece of video data may contain multiple image frames, the high-definition sample video data may refer to video data in which the resolution of the included image frames is greater than a resolution threshold, and the resolution threshold may be set according to an actual application scenario. In addition, the sample video data may also be video data in which the resolution of the included image frames is greater than the above-mentioned target resolution.
  • the server can divide the sample video data into frames to obtain multiple image frames included in the sample video data, and the image frames included in the sample video data can be called sample image frames.
  • The server can also encode and then decode the sample video data at a target bit rate (a low bit rate); the video data obtained after this encoding and decoding can be called low-quality video data.
  • the image frame quality of the low-quality video data is lower than the image frame quality of the sample video data, that is, the definition of the image frames included in the low-quality video data is lower than the definition of the image frames included in the sample video data.
  • The target bit rate may be a bit rate lower than a bit rate threshold, and the bit rate threshold may be set according to the actual application scenario. Because the target bit rate is relatively low, after the sample video data is encoded and decoded at the target bit rate, the image quality of the obtained low-quality video data deteriorates, so the definition of the image frames contained in the low-quality video data is reduced.
  • The image frames included in the above-mentioned low-quality video data can be called low-quality image frames; the low-quality video data includes a low-quality image frame corresponding to each sample image frame, one sample image frame corresponding to one low-quality image frame. Since the codec does not change the resolution of an image frame, the low-quality image frames obtained at this point are still high-resolution image frames. Therefore, the resolution of each low-quality image frame in the low-quality video data can be lowered (e.g., adjusted to below the target resolution), and the low-quality image frame with reduced resolution can be called a low-resolution image frame.
  • The server can construct sample image pairs from the sample image frames and the low-resolution image frames: one sample image pair can include a sample image frame and the low-resolution image frame corresponding to that sample image frame (that is, the image frame obtained after lowering the resolution of the corresponding low-quality image frame). The sample image frame included in a sample image pair is the high-definition image data, and the low-resolution image frame included in a sample image pair is the low-definition image data. In this way, multiple sample image pairs can be obtained from the sample video data; a sketch of this pipeline is given below.
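  • As an illustration only, the following Python sketch drives ffmpeg to perform the low-bit-rate encode/decode and the downscaling. The file names, the 200k bit rate, the libx264 codec, and the 2x downscale factor are hypothetical choices, not values fixed by the present application.

```python
import os
import subprocess

SRC = "sample_video.mp4"     # hypothetical high-definition sample video data
LOW_Q = "low_quality.mp4"    # low-quality video data after encode/decode

os.makedirs("hd_frames", exist_ok=True)
os.makedirs("lq_frames", exist_ok=True)

# Re-encode the sample video at a low target bit rate.
subprocess.run(["ffmpeg", "-y", "-i", SRC,
                "-c:v", "libx264", "-b:v", "200k", LOW_Q], check=True)

# Split both videos into frames; downscale the low-quality frames below the
# target resolution (here: half width and height) to get low-resolution frames.
subprocess.run(["ffmpeg", "-y", "-i", SRC, "hd_frames/%06d.png"], check=True)
subprocess.run(["ffmpeg", "-y", "-i", LOW_Q, "-vf", "scale=iw/2:ih/2",
                "lq_frames/%06d.png"], check=True)

# Each (hd_frames/N.png, lq_frames/N.png) pair is one sample image pair:
# the former is the high-definition image data, the latter the low-definition.
```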
  • In another implementation, the server can obtain the sample video data and divide it into frames to obtain the multiple sample image frames it contains. The server can then select a target image frame from the sample image frames as the high-definition image data, and perform average fusion processing on the target image frame and its adjacent image frames among the sample image frames; the image frame obtained by the average fusion processing can be called an average image frame. The server can then lower the resolution of the average image frame (e.g., below the target resolution) to obtain the low-definition image data.
  • A target image frame may be any one of the multiple sample image frames included in the sample video data; one target image frame corresponds to one piece of high-definition image data, and there may be multiple target image frames.
  • The adjacent image frames of the target image frame may include one or more image frames to the left of the target image frame and one or more to its right among the plurality of sample image frames; the number of adjacent image frames is determined according to the actual application scenario and is not limited here. A sketch of this average-fusion (simulated motion blur) construction follows.
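  • A minimal sketch of the average-fusion construction, assuming OpenCV and NumPy; the neighborhood radius and the downscale factor are illustrative assumptions.

```python
import cv2
import numpy as np

def make_motion_blur_pair(frames, idx, radius=1, scale=0.5):
    # frames: list of HxWx3 uint8 sample image frames from the sample video
    # idx:    index of the target image frame (the high-definition image data)
    # radius: adjacent frames taken on each side (application-dependent)
    # scale:  hypothetical downscale factor pushing resolution below the target
    lo, hi = max(0, idx - radius), min(len(frames), idx + radius + 1)
    window = [f.astype(np.float32) for f in frames[lo:hi]]
    avg = np.mean(window, axis=0).astype(np.uint8)   # average image frame
    h, w = avg.shape[:2]
    low_def = cv2.resize(avg, (int(w * scale), int(h * scale)),
                         interpolation=cv2.INTER_AREA)
    return frames[idx], low_def   # (high-definition, low-definition) pair
```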
  • In another implementation, the server may directly obtain high-definition image data, which may, for example, be downloaded from a webpage or be local high-definition image data. The server can then perform Gaussian blur processing on the high-definition image data and lower the resolution of the blurred image frame (for example, below the target resolution) to obtain the low-definition image data corresponding to the high-definition image data. Gaussian blur is also called Gaussian smoothing.
  • The server can also directly obtain the high-definition image data, perform distortion format conversion on it, and then lower the resolution of the converted image frame (for example, below the target resolution) to obtain the low-definition image data corresponding to the high-definition image data.
  • Performing distortion format conversion on high-definition image data can be understood as compressing it, so that the quality of the converted image frames is lower than that of the high-definition image data; for example, it may refer to converting the data format of the high-definition image data from png (a lossless compressed image format) to jpg (a lossy compressed image format). A sketch of these blur- and compression-based degradations follows.
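  • A minimal sketch combining Gaussian blur, a lossy png-to-jpg style round trip, and downscaling, assuming Pillow; the blur radius, JPEG quality, and scale factor are illustrative assumptions.

```python
import io
from PIL import Image, ImageFilter

def degrade_hd_image(hd, blur_radius=2.0, jpg_quality=30, scale=0.5):
    # Gaussian blur (Gaussian smoothing) of the high-definition image data:
    img = hd.filter(ImageFilter.GaussianBlur(radius=blur_radius))
    # Distortion format conversion: a lossy png -> jpg style round trip.
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=jpg_quality)
    img = Image.open(io.BytesIO(buf.getvalue())).convert("RGB")
    # Finally lower the resolution, e.g. below the target resolution.
    w, h = img.size
    return img.resize((int(w * scale), int(h * scale)), Image.BICUBIC)
```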
  • In yet another implementation, the server can directly obtain high-definition image data and also obtain sample low-quality video data; the sample low-quality video data can refer to video data whose image frames have definition lower than a definition threshold. The server can learn the noise data of the sample low-quality video data through machine learning, fuse the noise data into the high-definition image data, and then lower the resolution of the noise-fused high-definition image data to obtain the low-definition image data.
  • The manner of fusing the noise data into the high-definition image data may be adding the noise data to the high-definition image data.
  • The process of the server learning the noise data of the sample low-quality video data by means of machine learning may be as follows: the server may obtain a noise learning network, which may be a model capable of learning the noise data in video data; the server can then input the sample low-quality video data into the noise learning network, thereby learning the noise data of the sample low-quality video data through the noise learning network.
  • a sample image pair can be constructed by using the acquired high-definition image data and low-definition image data.
  • the constructed sample image pairs are used to train the image processing network
  • the image processing network may include a super-resolution network, an image quality enhancement network, a face enhancement network and a sharpening network.
  • Each generation network of the image processing network, such as the super-resolution network, image quality enhancement network, and face enhancement network, can adopt a U-Net structure, a network structure based on the encoder-decoder idea.
  • Each generation network can be composed of basic units (blocks). The encoder and decoder of the super-resolution network use 3 blocks each (meaning that one layer of the encoder or decoder can use 3 blocks), while the image quality enhancement network and the face enhancement network use 5 blocks each (meaning that one layer of the encoder or decoder can use 5 blocks), and the number of basic channels of each block can be 16.
  • The first 3x3 convolution inside a block is channel-amplified to increase the feature dimension, and the output 3x3 convolution then compresses the features to keep the number of input channels unchanged, which allows the block to learn more feature information from the image.
  • For upsampling, PixelShuffle (an upsampling method) can replace the interpolation upsampling scheme by a channel-to-space dimension conversion, which can achieve better visual results; see the sketch below.
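  • A minimal sketch of PixelShuffle upsampling in PyTorch; the upscaling factor r=2 is an illustrative assumption.

```python
import torch.nn as nn

def pixelshuffle_up(channels, r=2):
    # A convolution expands the channels by r*r, then PixelShuffle rearranges
    # the channel dimension into space: (B, C*r*r, H, W) -> (B, C, H*r, W*r),
    # replacing interpolation-based upsampling.
    return nn.Sequential(
        nn.Conv2d(channels, channels * r * r, kernel_size=3, padding=1),
        nn.PixelShuffle(r),
    )
```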
  • This application obtains sample image pairs in a variety of ways, which enriches the types of sample image pairs obtained; training the image processing network with this variety of sample image pairs can improve the training effect. For details, please refer to the description below.
  • FIG. 4 is a schematic structural diagram of an encoding and decoding network provided in the present application, and FIG. 5 is a schematic structural diagram of a basic unit provided in the present application.
  • As shown in FIG. 4, the network structure can include an encoder and a decoder, and each of them can have three layers.
  • each layer of the encoder and decoder can be composed of basic units as shown in Figure 5.
  • A basic unit can include a 3*3 convolutional layer, a normalization layer, an activation layer (i.e., LeakyReLU), another 3*3 convolutional layer, and a 1*1 convolutional layer, as sketched below.
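  • A minimal PyTorch sketch of the basic unit of FIG. 5. The channel expansion factor (4) and the use of BatchNorm as the normalization layer are assumptions; the application only fixes the layer sequence.

```python
import torch
import torch.nn as nn

class BasicUnit(nn.Module):
    def __init__(self, channels=16, expand=4):
        super().__init__()
        mid = channels * expand
        self.body = nn.Sequential(
            nn.Conv2d(channels, mid, 3, padding=1),   # 3x3 conv: channel amplification
            nn.BatchNorm2d(mid),                      # normalization layer
            nn.LeakyReLU(0.2, inplace=True),          # activation (LeakyReLU)
            nn.Conv2d(mid, channels, 3, padding=1),   # 3x3 conv: feature compression
            nn.Conv2d(channels, channels, 1),         # 1x1 convolution
        )

    def forward(self, x):
        return self.body(x)
```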
  • Step S102: call the image processing network to adjust the resolution of the low-definition image data to the target resolution to obtain sample super-resolution image data, and generate a super-resolution loss function according to the sample super-resolution image data and the high-definition image data.
  • The server can call the super-resolution network in the image processing network to increase the resolution of the low-definition image data (for example, adjust it to the target resolution), generating the sample super-resolution image data corresponding to the low-definition image data. The sample super-resolution image data is the image data obtained by raising the resolution of the low-definition image data to the target resolution through the super-resolution network.
  • The server can generate the super-resolution loss function from the sample super-resolution image data and the high-definition image data, as follows.
  • The super-resolution loss function can include two parts: a loss function at the pixel level, which may be called the first super-resolution loss function, and a loss function at the feature level, which may be called the second super-resolution loss function.
  • The server can generate the first super-resolution loss function from the pixel value elements contained in the sample super-resolution image data and those contained in the high-definition image data. The first super-resolution loss function $l_{c1}$ is a pixel-wise distance, for example of the form of formula (1):

$$l_{c1} = \frac{1}{N}\sum_{i=0}^{N-1}\left|I(i)-\hat{I}(i)\right| \qquad (1)$$

  • Here the sample super-resolution image data and the high-definition image data contain the same number of pixels, N, and the pixel value at a pixel point is called a pixel value element. $I$ denotes the high-definition image data and $\hat{I}$ the sample super-resolution image data; $I(i)$ and $\hat{I}(i)$ denote the i-th pixel value element of the high-definition image data and of the sample super-resolution image data respectively, with i counted from 0, $i \leq N$, and N the total number of pixel value elements in the image data.
  • The server can generate the second super-resolution loss function from the feature value elements contained in the feature maps of the sample super-resolution image data in the super-resolution network and those contained in the feature maps of the high-definition image data in the super-resolution network. The second super-resolution loss function $l_{c2}$ is a feature-level distance, for example of the form of formula (2):

$$l_{c2} = \sum_{l}\frac{1}{h_l w_l c_l}\sum_{s_1=1}^{h_l}\sum_{j_1=1}^{w_l}\sum_{k_1=1}^{c_l}\left|\phi_l(I)_{s_1,j_1,k_1}-\phi_l(\hat{I})_{s_1,j_1,k_1}\right| \qquad (2)$$

  • The value of l can be determined according to the actual application scenario: l indexes the feature layers, $h_l$ is the height of the feature map of the l-th feature layer of the super-resolution network, $w_l$ is the width of that feature map, and $c_l$ is the number of channels of the l-th feature layer. $s_1$ corresponds to the height of the feature map (its maximum value equals $h_l$), $j_1$ corresponds to the width (maximum $w_l$), and $k_1$ corresponds to the channel (maximum $c_l$).
  • The value at each feature point in the feature map can be called a feature value element, so $s_1$, $j_1$, and $k_1$ can be understood as indexes of feature value elements in the feature map, and $\phi_l(\cdot)$ denotes the operation of extracting the feature value element at the corresponding position from the feature map.
  • The super-resolution loss function may be the sum of the above first super-resolution loss function $l_{c1}$ and second super-resolution loss function $l_{c2}$, as sketched below.
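  • A minimal sketch of the combined super-resolution loss. Using frozen VGG16 layers as the feature extractor and the L1 norm for both terms are assumptions; the application does not name a concrete feature extractor or norm.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16, VGG16_Weights

# Frozen feature extractor standing in for the feature layers of formula (2).
_features = vgg16(weights=VGG16_Weights.DEFAULT).features[:16].eval()
for p in _features.parameters():
    p.requires_grad_(False)

def super_resolution_loss(sr, hd):
    l_c1 = F.l1_loss(sr, hd)                        # formula (1): pixel level
    l_c2 = F.l1_loss(_features(sr), _features(hd))  # formula (2): one feature layer
    return l_c1 + l_c2                              # sum of the two parts
```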
  • Step S103: call the image processing network to perform image quality enhancement processing on the sample super-resolution image data to obtain the first sample enhanced image data, and generate an image quality loss function according to the first sample enhanced image data and the high-definition image data.
  • The sample super-resolution image data obtained through the super-resolution network can be the input of the image quality enhancement network. The server can call the image quality enhancement network in the image processing network to perform image quality enhancement processing on the sample super-resolution image data, generating the first sample enhanced image data corresponding to the sample super-resolution image data; the first sample enhanced image data is the image data obtained by performing image quality enhancement processing on the sample super-resolution image data.
  • The server can use the peak signal-to-noise ratio (PSNR) derived from the mean square error between the first sample enhanced image data and the high-definition image data as the image quality loss function. The image quality loss function $PSNR_h$ takes the standard PSNR form of formula (3):

$$PSNR_h = 10\cdot\log_{10}\!\left(\frac{(2^{bits}-1)^2}{MSE(I,\hat{I}_h)}\right) \qquad (3)$$

  • Here $I$ denotes the high-definition image data, $\hat{I}_h$ denotes the first sample enhanced image data, $MSE(\cdot,\cdot)$ is the mean square error between the two, and bits denotes the precision, which may be a precision of 16 binary digits or of 32 binary digits. A sketch follows.
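  • A minimal sketch of formula (3); as a loss one would minimize the negative PSNR, since higher PSNR means higher quality.

```python
import torch

def psnr_h(enhanced, hd, bits=16):
    # PSNR from the mean square error between the first sample enhanced image
    # data and the high-definition image data; `bits` is the precision.
    mse = torch.mean((enhanced - hd) ** 2)
    max_val = float(2 ** bits - 1)
    return 10.0 * torch.log10(max_val ** 2 / mse)
```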
  • Step S104: call the image processing network to perform face enhancement processing on the face image in the first sample enhanced image data to obtain the sample enhanced face image, fuse the sample enhanced face image with the first sample enhanced image data to obtain the second sample enhanced image data, and generate a face loss function according to the sample enhanced face image and the face image in the high-definition image data.
  • The first sample enhanced image data obtained through the above-mentioned image quality enhancement network can be used as the input of the face enhancement network, and the first sample enhanced image data can include a face image. The server can call the face enhancement network in the image processing network to perform face enhancement processing on the face image in the first sample enhanced image data, generating the second sample enhanced image data; the second sample enhanced image data is the image data obtained after face enhancement processing is performed on the face image in the first sample enhanced image data. For details, please refer to the following description.
  • the face enhancement network can include a face detection network, a face enhancement sub-network, and a face fusion network.
  • the face enhancement sub-network can also include a color discrimination network and a texture discrimination network.
  • The server can call the face detection network to detect the frame in which the face image in the first sample enhanced image data is located, which can be referred to as the face detection frame; the first sample enhanced image data can also be annotated with a frame marking the actual location of the face image, which can be called the face annotation frame.
  • The server can extract the face image contained in the face detection frame from the first sample enhanced image data to obtain the detected face image. The server can then call the face enhancement sub-network to perform face enhancement processing on the detected face image (that is, the face image extracted from the first sample enhanced image data) to obtain the enhanced face image, which can be called the sample enhanced face image; the sample enhanced face image is the face image obtained by performing face enhancement processing on the face image in the first sample enhanced image data.
  • the server may invoke the face fusion network to fuse the sample enhanced face image with the first sample enhanced image data, and the image data obtained by fusion may be called the second sample enhanced image data.
  • The server can generate a detection loss function from the above face detection frame and face annotation frame, reflecting the deviation between the detected location and the actual location of the face. The detection loss function $l_{r1}$ is an intersection-over-union loss of the form of formula (4):

$$l_{r1} = 1-\frac{|J\cap\hat{J}|}{|J|+|\hat{J}|-|J\cap\hat{J}|} \qquad (4)$$

  • Here $J$ can be the face annotation frame and $\hat{J}$ the face detection frame; $|J\cap\hat{J}|$ represents the area of their intersection, $|J|$ represents the area of the face annotation frame, and $|\hat{J}|$ represents the area of the face detection frame. A sketch is given below.
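  • A minimal sketch of formula (4) for axis-aligned boxes given as (x1, y1, x2, y2) tensors.

```python
import torch

def detection_loss(annot, det):
    # 1 - intersection / (area_annotation + area_detection - intersection)
    ix1, iy1 = torch.max(annot[0], det[0]), torch.max(annot[1], det[1])
    ix2, iy2 = torch.min(annot[2], det[2]), torch.min(annot[3], det[3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
    area_a = (annot[2] - annot[0]) * (annot[3] - annot[1])
    area_d = (det[2] - det[0]) * (det[3] - det[1])
    return 1.0 - inter / (area_a + area_d - inter)
```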
  • The server can also extract the face image in the high-definition image data; the extracted face image in the high-definition image data can be called the high-definition face image.
  • The server can also generate a color loss function from the high-definition face image, the sample enhanced face image, and the color discriminator; the color loss function characterizes the difference between the color of the sample enhanced face image and the color of the high-definition face image.
  • The color discriminator is used to judge the probability that the color of the sample enhanced face image is the color of the high-definition face image, and this probability represents the color loss function. The goal is to make the discriminated probability approach 0.5, which indicates that the color discriminator can no longer distinguish the color of the sample enhanced face image from the color of the high-definition face image; the expected effect is then achieved.
  • Specifically, the server can perform Gaussian blur on the high-definition face image and the sample enhanced face image respectively, and then determine the probability that the color of the Gaussian-blurred sample enhanced face image is the color of the Gaussian-blurred high-definition face image, using this probability to characterize the color loss function.
  • Similarly, the server can generate a texture loss function from the high-definition face image, the sample enhanced face image, and the texture discriminator; the texture loss function characterizes the difference between the texture of the sample enhanced face image and the texture of the high-definition face image.
  • The texture discriminator is used to judge the probability that the texture of the sample enhanced face image is the texture of the high-definition face image, and this probability represents the texture loss function. The goal is again a discriminated probability approaching 0.5, indicating that the texture discriminator can no longer distinguish the texture of the sample enhanced face image from the texture of the high-definition face image.
  • Specifically, the server can convert the high-definition face image and the sample enhanced face image to grayscale respectively, and then determine the probability that the texture of the grayscaled sample enhanced face image is the texture of the grayscaled high-definition face image, using this probability to characterize the texture loss function. The preprocessing for the two discriminators is sketched below.
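  • A minimal sketch of the discriminator preprocessing, assuming torchvision; the kernel size and sigma of the blur are illustrative, and `color_disc` / `texture_disc` are hypothetical discriminator networks.

```python
import torchvision.transforms.functional as TF

def discriminator_inputs(face):
    # The color discriminator sees a Gaussian-blurred face (texture removed,
    # color kept); the texture discriminator sees a grayscaled face (color
    # removed, texture kept).
    color_in = TF.gaussian_blur(face, kernel_size=[9, 9], sigma=[3.0, 3.0])
    texture_in = TF.rgb_to_grayscale(face, num_output_channels=1)
    return color_in, texture_in

# Hypothetical usage: color_disc / texture_disc output the probability that
# their input comes from a high-definition face; training drives these
# discriminated probabilities toward 0.5.
```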
  • The server can also generate a content loss function from the feature value elements contained in the feature map of the sample enhanced face image and those contained in the feature map of the high-definition face image; the content loss function characterizes the content difference between the second sample enhanced image data and the high-definition image data. The content loss function $l_{r2}$ is a feature-level distance of the same form as formula (2), shown as formula (5):

$$l_{r2} = \sum_{t}\frac{1}{h_t w_t c_t}\sum_{s_2=1}^{h_t}\sum_{j_2=1}^{w_t}\sum_{k_2=1}^{c_t}\left|\phi_t(R)_{s_2,j_2,k_2}-\phi_t(\hat{R})_{s_2,j_2,k_2}\right| \qquad (5)$$

  • Here $R$ can be the high-definition face image and $\hat{R}$ the sample enhanced face image, and the value of t can be determined according to the actual application scenario: t indexes the feature layers, $h_t$ is the height of the feature map of the t-th feature layer, $w_t$ is its width, and $c_t$ is the number of channels of the t-th feature layer. $s_2$ corresponds to the height of the feature map (maximum $h_t$), $j_2$ to the width (maximum $w_t$), and $k_2$ to the channel (maximum $c_t$).
  • The value at each feature point in the feature map can be called a feature value element, so $s_2$, $j_2$, and $k_2$ can be understood as indexes of feature value elements in the feature map, and $\phi_t(\cdot)$ denotes the operation of extracting the feature value element at the corresponding position from the feature map.
  • The face loss function can be the sum of the above detection loss function $l_{r1}$, color loss function, texture loss function, and content loss function $l_{r2}$. The detection loss function is obtained through the face detection network, while the color loss function, texture loss function, and content loss function are obtained through the face enhancement sub-network.
  • In addition, the face enhancement network can also be trained with face images that are not extracted from the first sample enhanced image data. That is, the face enhancement network can be trained on two kinds of face images together: face images to be optimized taken from a training set (which can be any training set containing face images to be optimized, i.e., images not from the first sample enhanced image data), and face images from the first sample enhanced image data obtained through the image quality enhancement network. The training effect can then be even better.
  • Step S105: call the image processing network to perform image sharpening processing on the second sample enhanced image data to obtain sample sharpened image data, and generate a sharpening loss function according to the sample sharpened image data and the high-definition image data.
  • The above-mentioned second sample enhanced image data can be used as the input of the sharpening network. The server can call the sharpening network in the image processing network to perform image sharpening processing on the second sample enhanced image data; the image data obtained by this sharpening processing may be referred to as the sample sharpened image data.
  • the server may generate a loss function of the sharpening network by using sample sharpened image data and high-definition image data, and the loss function may be referred to as a sharpening loss function.
  • The sharpening loss function can contain two parts: a loss function from an objective perspective, which can be called the quality loss function, and a loss function from a perceptual perspective, which can be called the perceptual loss function.
  • The quality loss function may be the peak signal-to-noise ratio (PSNR) between the sample sharpened image data and the high-definition image data. The perceptual loss function may be obtained from the perceptual similarity between the sample sharpened image data and the high-definition image data, where the perceptual similarity can be measured by the Learned Perceptual Image Patch Similarity (LPIPS) value between the two.
  • The larger the perceptual loss value LPIPS, the less similar the sample sharpened image data and the high-definition image data are at the perceptual level (that is, the greater the difference); therefore, the goal of using the perceptual loss function is to minimize the perceptual loss between the sample sharpened image data and the high-definition image data.
  • The sharpening loss function can be the sum of the above-mentioned quality loss function and perceptual loss function; a sketch follows.
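  • A minimal sketch of the sharpening loss, assuming the third-party lpips library (https://github.com/richzhang/PerceptualSimilarity) and reusing the psnr_h sketch from formula (3); the equal weighting of the two terms is an assumption.

```python
import lpips
import torch

lpips_fn = lpips.LPIPS(net="alex")  # learned perceptual image patch similarity

def sharpening_loss(sharp, hd):
    # Quality term: maximize PSNR, i.e. minimize its negative.
    quality = -psnr_h(sharp, hd, bits=16)
    # Perceptual term: LPIPS, smaller means more perceptually similar
    # (inputs assumed scaled to [-1, 1] as the lpips library expects).
    perceptual = lpips_fn(sharp, hd).mean()
    return quality + perceptual
```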
  • FIG. 6 is a schematic diagram of a scenario of obtaining a loss function provided by the present application.
  • As shown in FIG. 6, the server can input the sample image pair into the super-resolution network, generate the sample super-resolution image data corresponding to the low-definition image data in the super-resolution network, and generate the super-resolution loss function from the sample super-resolution image data and the high-definition image data.
  • the server can also continue to input the sample super-resolution image data into the image quality enhancement network, generate the first sample enhanced image data in the image quality enhancement network, and generate the image quality loss function through the first sample enhanced image data and the high-definition image data.
  • The server can also input the first sample enhanced image data into the face enhancement network, which enhances the face image in the first sample enhanced image data to produce the sample enhanced face image; the face loss function can be generated from the sample enhanced face image and the high-definition face image in the high-definition image data. The sample enhanced face image and the first sample enhanced image data can also be fused to obtain the second sample enhanced image data.
  • The server may also input the second sample enhanced image data into the sharpening network and perform sharpening processing on it there to obtain the sample sharpened image data; the sharpening loss function can then be generated from the sample sharpened image data and the high-definition image data.
  • Step S106: update the network parameters of the image processing network according to the super-resolution loss function, image quality loss function, face loss function, and sharpening loss function to obtain a trained image processing network.
  • the network parameters of the image processing network may include the network parameters of the super-resolution network, the network parameters of the image quality enhancement network, the network parameters of the face enhancement network, and the network parameters of the sharpening network.
  • The server can update the network parameters of the image processing network through the above super-resolution loss function, image quality loss function, face loss function, and sharpening loss function. There are two ways to perform this update, as follows.
  • the order of each network in the image processing network from front to back is super-resolution network, image quality enhancement network, face enhancement network, and sharpening network.
  • The face enhancement network includes three networks that process face images: the face detection network, the face enhancement sub-network, and the face fusion network.
  • Each loss function can be passed forward in the image processing network, and the network parameters of the networks it reaches (such as the super-resolution network, image quality enhancement network, face enhancement network, and sharpening network) are then updated. The first way to update the network parameters of a given network is to add up the multiple loss functions passed to that network and update its network parameters directly with the summed loss function.
  • The second way is to update the network parameters of the network iteratively with the multiple loss functions passed to it, applied one after another. It can be understood that the two ways achieve the same effect on the network parameters.
  • The super-resolution loss function is passed forward only to the super-resolution network itself; the image quality loss function is passed to the image quality enhancement network and the super-resolution network; and the face loss function is passed to the face enhancement network, the image quality enhancement network, and the super-resolution network.
  • Since the face enhancement network consists, from front to back, of the face detection network, the face enhancement sub-network, and the face fusion network, it can be understood that within the face enhancement network the loss passed to the face detection network is the face loss function, while the loss passed to the face enhancement sub-network comprises the color loss function, texture loss function, and content loss function from the face loss function (that is, the losses generated by the face enhancement sub-network itself); the face loss function is not passed back to the face fusion network. The sharpening loss function is passed to the sharpening network, face enhancement network, image quality enhancement network, and super-resolution network.
  • In the first way, the process can be as follows: the super-resolution loss function, image quality loss function, face loss function, and sharpening loss function can be added together, and the summed loss function then updates the network parameters of the super-resolution network, yielding the trained super-resolution network. The image quality loss function, face loss function, and sharpening loss function can be added together, and the summed loss function then updates the network parameters of the image quality enhancement network, yielding the trained image quality enhancement network. The face loss function and sharpening loss function can be added together, and the summed loss function then updates the network parameters of the face detection network.
  • The network parameters of the face enhancement sub-network are likewise updated with the losses passed to it, and the network parameters of the face fusion network can be updated through the sharpening loss function, finally yielding the trained face enhancement network; the network parameters of the sharpening network can be updated through the sharpening loss function, yielding the trained sharpening network. A sketch of this summed-loss update is given below.
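  • A minimal sketch of the first (summed-loss) update scheme. The module names sr_net, quality_net, face_net, and sharpen_net are hypothetical, as are the optimizer choice and face_net's interface (assumed to return the fused second sample enhanced image data together with its already-computed face losses); the loss functions reuse the sketches above.

```python
import itertools
import torch

params = itertools.chain(sr_net.parameters(), quality_net.parameters(),
                         face_net.parameters(), sharpen_net.parameters())
optimizer = torch.optim.Adam(params, lr=1e-4)

def train_step(low_def, hd):
    sr = sr_net(low_def)                            # sample super-resolution data
    enhanced1 = quality_net(sr)                     # first sample enhanced data
    enhanced2, face_loss = face_net(enhanced1, hd)  # second sample enhanced data
    sharpened = sharpen_net(enhanced2)              # sample sharpened data

    # Summing the losses and backpropagating once reproduces the forward
    # transfer: gradients of each loss reach every network upstream of it.
    total = (super_resolution_loss(sr, hd)      # reaches sr_net only
             - psnr_h(enhanced1, hd)            # reaches quality_net and sr_net
             + face_loss                        # reaches face/quality/sr nets
             + sharpening_loss(sharpened, hd))  # reaches all four networks
    optimizer.zero_grad()
    total.backward()
    optimizer.step()
    return total.item()
```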
• if the parameters are updated in the second way, the process can be as follows: the network parameters of the super-resolution network can be iteratively updated through the above-mentioned super-resolution loss function, image quality loss function, face loss function, and sharpening loss function in sequence, and a trained super-resolution network can be obtained after the iterative updates. For example, the network parameters of the super-resolution network can first be updated through the super-resolution loss function; then the network parameters updated by the super-resolution loss function can be updated through the image quality loss function; then the network parameters updated by the image quality loss function can be updated through the face loss function; and then the network parameters updated by the face loss function can be updated through the sharpening loss function.
  • the network parameters of the image quality enhancement network can be iteratively updated sequentially through the above image quality loss function, face loss function and sharpening loss function, and a trained image quality enhancement network can be obtained after iterative updating.
• within the face enhancement network, the network parameters of the face detection network can be iteratively updated through the face loss function and the sharpening loss function in turn; the network parameters of the face enhancement sub-network can be iteratively updated through the sharpening loss function, the color loss function, the texture loss function, and the content loss function in turn, after which the trained face enhancement sub-network can be obtained; and the network parameters of the face fusion network can be adjusted through the sharpening loss function, so that a trained face enhancement network is finally obtained.
  • the network parameters of the sharpening network can be updated through the sharpening loss function to obtain a trained sharpening network.
  • the trained image processing network can be generated (that is, obtained) through the above-mentioned trained super-resolution network, trained image quality enhancement network, trained face enhancement network and trained sharpening network.
  • the trained image processing network can be used to comprehensively optimize the video data or image data. For details, please refer to the specific description in the corresponding embodiment in FIG. 7 below.
• in this way, for each network, including the super-resolution network, image quality enhancement network, face enhancement network, and sharpening network, training effects of mutual promotion and integration between the networks can be realized under the premise of ensuring the training effect of each network itself, which makes the trained image processing network more accurate; the image processing network obtained through training can therefore achieve a more accurate and better optimization effect on image data.
• when training the image processing network, this application provides an end-to-end (for example, taking the super-resolution network as one end and the sharpening network as the other, trained as a whole), multi-task (training one network can be regarded as one training task) joint training framework.
• the joint training framework is a cascading framework, such as a framework in which the super-resolution network is connected to the sharpening network through the image quality enhancement network and the face enhancement network.
• through randomly generated codec data (such as the above-mentioned method of obtaining sample image pairs by encoding and decoding sample video data at the target bit rate), simulated motion blur (such as the above-mentioned method of obtaining sample image pairs by performing average fusion processing on a target image frame and its adjacent image frames), and compression noise (such as the above-mentioned method of obtaining sample image pairs by performing distortion format conversion on high-definition image data), widely distributed training data can be generated, so that the image processing network trained on this data can be applied to a wider range of image optimization scenarios and has stronger robustness.
• This application can obtain a sample image pair; the sample image pair includes low-definition image data and high-definition image data, and the low-definition image data and high-definition image data have the same image content; call the image processing network to adjust the resolution of the low-definition image data to the target resolution, obtain the sample super-resolution image data, and generate a super-resolution loss function based on the sample super-resolution image data and high-definition image data; call the image processing network to perform image quality enhancement processing on the sample super-resolution image data, obtain the first sample enhanced image data, and generate an image quality loss function according to the first sample enhanced image data and high-definition image data; call the image processing network to perform face enhancement processing on the face image in the first sample enhanced image data, obtain the sample enhanced face image, fuse the sample enhanced face image with the first sample enhanced image data to obtain the second sample enhanced image data, and generate a face loss function according to the sample enhanced face image and the face image in the high-definition image data; call the image processing network to perform image sharpening processing on the second sample enhanced image data, obtain the sample sharpened image data, and generate a sharpening loss function according to the sample sharpened image data and the high-definition image data; and update the network parameters of the image processing network according to the super-resolution loss function, image quality loss function, face loss function, and sharpening loss function, to obtain a trained image processing network.
• the method proposed in this application can train the image processing network on multiple tasks (such as the super-resolution task, image quality enhancement task, face enhancement task, and sharpening task) in an interrelated and integrated manner, so that when the trained image processing network performs multi-task optimization on an image at the same time, there is no conflict between the tasks and the optimization effect is better.
  • FIG. 7 is a schematic flowchart of an image processing method provided in the present application.
  • the embodiment of the present application describes the application process of the trained image processing network.
  • the content described in the embodiment of the present application can be combined with the content described in the corresponding application embodiment in Figure 3 above.
• the execution principal in the embodiment of the present application can also be a server. As shown in Figure 7, the method may include:
  • Step S201 calling the trained image processing network to obtain super-resolution image data corresponding to the initial image data; the resolution of the super-resolution image data is greater than or equal to the target resolution;
  • the super-resolution network to be called here is the super-resolution network in the above-mentioned trained image processing network, that is, the super-resolution network called here is a trained super-resolution network.
• the server may obtain initial image data, which may be any image that needs to be optimized. Since optimization performed on high-resolution image data can achieve a better optimization effect, the server can call the super-resolution network to detect the resolution of the initial image data. If the resolution of the initial image data is detected to be smaller than the target resolution, this indicates that the initial image data is low-resolution image data, so the super-resolution network can be called to raise its resolution, such as calling the super-resolution network to adjust the resolution of the initial image data to the target resolution (the target resolution can be a high resolution set according to the actual application scene); the initial image data adjusted to the target resolution can then be used as the super-resolution image data.
• if the resolution of the initial image data is greater than or equal to the target resolution, this indicates that the initial image data itself is high-resolution image data; therefore, there is no need to adjust its resolution, and the initial image data can be used directly as the super-resolution image data.
  • Step S202 calling the trained image processing network to perform image quality enhancement processing on the super-resolution image data to obtain the first enhanced image data;
  • the image quality enhancement network to be called here is the image quality enhancement network in the above-mentioned trained image processing network, that is, the image quality enhancement network called here is a trained image quality enhancement network.
• the server can also call the image quality enhancement network to optimize the image quality of the super-resolution image data as a whole (that is, perform image quality enhancement processing on the super-resolution image data); the image data obtained by optimizing the image quality of the super-resolution image data through the image quality enhancement network may be used as the first enhanced image data.
• Step S203, calling the trained image processing network to obtain the second enhanced image data corresponding to the first enhanced image data; if the first enhanced image data contains a face image, the second enhanced image data is the image data obtained after the face image in the first enhanced image data is enhanced;
  • the face enhancement network to be called here is the face enhancement network in the above-mentioned trained image processing network, that is, the face enhancement network called here is a trained face enhancement network.
• the face enhancement network called here includes a face detection network, a face enhancement sub-network, and a face fusion network. Since the face in image data is usually a relatively important element, the server can also perform face detection on the first enhanced image data through the face detection network, that is, detect whether the first enhanced image data contains a face image.
• if no face image is detected in the first enhanced image data, the first enhanced image data may be directly used as the second enhanced image data.
• if a face image is detected, the face enhancement sub-network can be called to optimize the face image in the first enhanced image data, and the first enhanced image data whose face image has been optimized can then be used as the second enhanced image data.
  • the process of invoking the human face enhancement sub-network to optimize the human face image in the first enhanced image data may be:
• the server can call the face detection network to extract the face image detected in the first enhanced image data, so as to obtain the face image in the first enhanced image data, which can be called the extracted face image.
• the server can call the face enhancement sub-network to perform face enhancement processing on the extracted face image, that is, perform face optimization on it; the extracted face image that has undergone face optimization can be called the enhanced face image.
• the server can also call the face fusion network to generate a face fusion mask (i.e., a fusion Mask), which is used for weighted fusion of the enhanced face image and the first enhanced image data to obtain the second enhanced image data. Seamless fusion between the enhanced face image and the first enhanced image data can be realized through the adaptive face fusion mask.
• if the enhanced face image is represented as a, the face fusion mask as b, and the first enhanced image data as c, then the second enhanced image data can be b*a + (1-b)*c.
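• A minimal sketch of this weighted fusion, assuming aligned tensors a, b, and c as defined above (the function name is illustrative):

```python
import torch

def fuse_face(a: torch.Tensor, b: torch.Tensor, c: torch.Tensor) -> torch.Tensor:
    """b*a + (1-b)*c: blend the enhanced face image (a) into the first
    enhanced image data (c) using the face fusion mask (b) in [0, 1]."""
    return b * a + (1.0 - b) * c
```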
• the process of enhancing the face image in the first enhanced image data to obtain the second enhanced image data is the same as the above-mentioned process of enhancing the face image in the first sample enhanced image data to obtain the second sample enhanced image data.
  • FIG. 8 is a schematic diagram of a scene for optimizing a human face provided by the present application.
  • the server can input the first enhanced image data obtained through the image quality enhancement network into the trained face detection network, and the face image in the first enhanced image data can be extracted through the face detection network.
• the extracted face image can thus be obtained; the extracted face image is then input into the trained face enhancement sub-network, through which face enhancement can be performed on the extracted face image to obtain the enhanced face image.
  • inputting the enhanced face image and the first enhanced image data into the face fusion network can realize the image fusion of the enhanced face image and the first enhanced image data, and finally obtain the second enhanced image data.
  • Step S204 call the trained image processing network to perform image sharpening processing on the second enhanced image data, obtain sharpened image data, and output the sharpened image data;
  • the sharpening network to be called here is the sharpening network in the above-mentioned trained image processing network, that is, the sharpening network called here is the trained sharpening network.
  • the details in the second enhanced image data can be enhanced through the sharpening network to make the details clearer.
• the server can call the sharpening network to extract the high-frequency components in the second enhanced image data; for example, the second enhanced image data can be Gaussian-blurred and the blurred result subtracted from the original second enhanced image data to obtain the high-frequency image information (that is, the high-frequency components) in the second enhanced image data.
• the server can also call the sharpening network to generate a sharpening mask of the second enhanced image data, which is used to indicate the details in the second enhanced image data that need sharpening enhancement; the server can dot-multiply the sharpening mask with the second enhanced image data to obtain the sharpened image information (i.e., the detail components) in the second enhanced image data.
• the server may use the convolutional layer (e.g., a 1*1 convolutional layer) and PReLU (activation layer) included in the sharpening network to generate the sharpening mask of the second enhanced image data.
• a weighted weight for the above-mentioned high-frequency image information (which can be called the first weighted weight), a weighted weight for the above-mentioned sharpened image information (which can be called the second weighted weight), and a weighted weight for the second enhanced image data (which can be called the third weighted weight) can also be generated through the sharpening network; the high-frequency image information, the sharpened image information, and the second enhanced image data are then correspondingly weighted and summed through the first, second, and third weighted weights, to obtain the sharpened image data.
• for example, the product of the first weighted weight and the high-frequency image information may be used as the first weighted result, the product of the second weighted weight and the sharpened image information as the second weighted result, and the product of the third weighted weight and the second enhanced image data as the third weighted result; the first weighted result, the second weighted result, and the third weighted result are then summed to obtain the sharpened image data.
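• A hedged sketch of this sharpening step is shown below; `mask`, `w1`, `w2`, and `w3` stand in for outputs that the sharpening network would produce, and the Gaussian kernel size and sigma are assumptions:

```python
import torch
import torchvision.transforms.functional as TF

def sharpen(second_enhanced, mask, w1, w2, w3, sigma=1.5):
    blurred = TF.gaussian_blur(second_enhanced, kernel_size=5, sigma=sigma)
    high_freq = second_enhanced - blurred   # high-frequency image information
    detail = mask * second_enhanced         # sharpened image information (dot product)
    # weighted summation of the three components
    return w1 * high_freq + w2 * detail + w3 * second_enhanced
```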
  • the sharpened image data is the final image data obtained after optimizing the initial image data, and the server may output the sharpened image data in the image processing network.
  • the process of enhancing the second enhanced image data to obtain the sharpened image data is the same as the above-mentioned process of enhancing the second sample enhanced image data to obtain the sample sharpened image data.
  • FIG. 9 is a schematic diagram of an image optimization scene provided by the present application.
• the server can input the initial image data into the trained image processing network; the super-resolution image data corresponding to the initial image data can be obtained through the super-resolution network in the image processing network. Then, the image quality of the super-resolution image data can be enhanced through the image quality enhancement network to obtain the first enhanced image data. Then, the face image in the first enhanced image data can be obtained through the face detection network in the face enhancement network (that is, the face image is extracted), and face enhancement can be performed on the extracted face image by the face enhancement sub-network to obtain an enhanced face image.
  • the enhanced face image and the first enhanced image data can be fused through the face fusion network to obtain the second enhanced image data.
  • the second enhanced image data can be sharpened through the sharpening network to obtain sharpened image data, which can be output.
  • the above-mentioned initial image data may also be any image frame among a plurality of image frames obtained by dividing the video data into frames
  • the server may be the background server of the application client
• the video data may be data to be pushed to the application client. Therefore, the server can use each image frame included in the video data as initial image data, obtain the sharpened image data corresponding to each image frame through the above process, and then generate optimized video data of the video data from the sharpened image data corresponding to each image frame; the optimized video data is the video data obtained by optimizing each image frame in the video data. A sketch of this per-frame loop is given below.
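• A minimal per-frame loop of this kind might look as follows (OpenCV is an assumption; this application does not name a video library, and the network output is assumed to keep the writer's frame size):

```python
import cv2

def optimize_video(path_in, path_out, image_processing_network):
    cap = cv2.VideoCapture(path_in)
    fps = cap.get(cv2.CAP_PROP_FPS)
    size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
            int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    writer = cv2.VideoWriter(path_out, cv2.VideoWriter_fourcc(*"mp4v"), fps, size)
    while True:
        ok, frame = cap.read()          # each frame is one piece of initial image data
        if not ok:
            break
        writer.write(image_processing_network(frame))  # sharpened image data per frame
    cap.release()
    writer.release()
```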
  • the server can push the optimized video data to the application client, and the application client can output the optimized video data on the client interface for users to browse and view.
  • FIG. 10 is a schematic diagram of a data push scenario provided by the present application.
  • the server can divide the video data into frames to obtain multiple image frames (such as image frame 1 to image frame n) contained in the video data, and then, the server can use the above-mentioned trained image processing network to divide The obtained image frames are optimized to obtain sharpened image data corresponding to each image frame (for example, sharpened image data 1 to sharpened image data n).
  • the optimized video data of the video data can be obtained through the sharpened image data corresponding to each image frame, and the server can push the optimized video data to the application client.
• the trained super-resolution network, image quality enhancement network, face enhancement network, and sharpening network can comprehensively enhance the overall image quality, color, texture, and clarity of the image data without conflicts, improving the image data in many respects; in addition, the trained face enhancement network has a dedicated enhancement effect on the local face in the image data, thus achieving both global enhancement and local enhancement.
  • FIG. 11 is a schematic structural diagram of an image processing network training device provided by the present application.
• the image processing network training device may be computer-readable instructions (including program code) running in a computer device; for example, the image processing network training device is application software, and the image processing network training device can be used to execute the corresponding steps in the method provided in the embodiments of the present application.
  • the training device 1 of the image processing network may include: a sample acquisition module 11 , a call module 12 , and an update module 13 .
  • the sample acquisition module 11 is used to acquire a sample image pair; the sample image pair includes low-definition image data and high-definition image data, and the low-definition image data and the high-definition image data have the same image content;
  • the calling module 12 is used to call the image processing network to adjust the resolution of the low-definition image data to the target resolution, obtain sample super-resolution image data, and generate a super-resolution loss function according to the sample super-resolution image data and high-definition image data;
  • the calling module 12 is used to call the image processing network to perform image quality enhancement processing on the sample super-resolution image data, obtain the first sample enhanced image data, and generate an image quality loss function according to the first sample enhanced image data and the high-definition image data;
• the calling module 12 is used to call the image processing network to perform face enhancement processing on the face image in the first sample enhanced image data to obtain the sample enhanced face image, fuse the sample enhanced face image with the first sample enhanced image data to obtain the second sample enhanced image data, and generate a face loss function according to the sample enhanced face image and the face image in the high-definition image data;
  • the calling module 12 is used to call the image processing network to perform image sharpening processing on the second sample enhanced image data, obtain the sample sharpened image data, and generate a sharpening loss function according to the sample sharpened image data and the high-definition image data;
  • the update module 13 is used to update the network parameters of the image processing network according to the super-resolution loss function, image quality loss function, face loss function and sharpening loss function, so as to obtain a trained image processing network.
  • the image processing network includes a super-resolution network, an image quality enhancement network, a face enhancement network and a sharpening network;
• the sample super-resolution image data is obtained according to the super-resolution network, the first sample enhanced image data is obtained according to the image quality enhancement network, the second sample enhanced image data is obtained according to the face enhancement network, and the sample sharpened image data is obtained according to the sharpening network;
• the manner in which the update module 13 updates the network parameters of the image processing network according to the super-resolution loss function, the image quality loss function, the face loss function, and the sharpening loss function to obtain a trained image processing network includes: updating the network parameters of the super-resolution network according to the super-resolution loss function, the image quality loss function, the face loss function, and the sharpening loss function; updating the network parameters of the image quality enhancement network according to the image quality loss function, the face loss function, and the sharpening loss function; updating the network parameters of the face enhancement network according to the face loss function and the sharpening loss function; and updating the network parameters of the sharpening network according to the sharpening loss function.
• the manner in which the calling module 12 generates a super-resolution loss function according to the sample super-resolution image data and the high-definition image data includes: generating a first super-resolution loss function according to the pixel value elements contained in the sample super-resolution image data and the high-definition image data; generating a second super-resolution loss function according to the feature value elements of the feature maps of the sample super-resolution image data and the high-definition image data in the super-resolution network; and generating the super-resolution loss function according to the first super-resolution loss function and the second super-resolution loss function.
  • the image processing network includes a face enhancement network
  • the second sample enhanced image data is obtained according to the face enhancement network
• the face enhancement network includes a face detection network, a color discrimination network, and a texture discrimination network; the face image in the first sample enhanced image data has a face detection frame generated by the face detection network and a face annotation frame for indicating the actual face position;
• the manner in which the calling module 12 generates the face loss function according to the sample enhanced face image and the face image in the high-definition image data includes: generating a detection loss function according to the face detection frame and the face annotation frame; generating a color loss function and a texture loss function according to the high-definition face image, the sample enhanced face image, the color discrimination network, and the texture discrimination network; generating a content loss function according to the feature value elements of the feature maps of the sample enhanced face image and the high-definition face image; and generating the face loss function according to the detection loss function, the color loss function, the texture loss function, and the content loss function.
• the manner in which the calling module 12 generates a sharpening loss function according to the sample sharpened image data and the high-definition image data includes: generating a quality loss function according to the peak signal-to-noise ratio between the sample sharpened image data and the high-definition image data; generating a perceptual loss function according to the perceptual similarity between the sample sharpened image data and the high-definition image data; and generating the sharpening loss function according to the quality loss function and the perceptual loss function.
  • the manner in which the sample acquisition module 11 acquires the sample image pair includes:
• a target code rate is used to encode and decode the sample video data to obtain low-quality video data corresponding to the sample video data; the image frame quality of the low-quality video data is lower than that of the sample video data, the low-quality video data includes a low-quality image frame corresponding to each sample image frame, and the target code rate is lower than a code rate threshold; a sample image pair is then constructed according to each sample image frame and its corresponding low-quality image frame. A sketch of this low-bit-rate re-encoding step is given below.
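• A sketch of the low-bit-rate degradation (invoking the ffmpeg CLI is an assumption; this application does not name a codec or tool, and "200k" is only an illustrative target code rate):

```python
import subprocess

def make_low_quality(sample_video: str, low_quality_video: str,
                     target_bitrate: str = "200k"):
    """Re-encode the sample video at a low target bit rate; decoding the
    result elsewhere yields the low-quality image frames of the pairs."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", sample_video, "-b:v", target_bitrate,
         low_quality_video],
        check=True,
    )
```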
• the manner in which the sample acquisition module 11 acquires the sample image pair includes: acquiring sample video data, performing framing processing on the sample video data to obtain sample image frames, selecting a target image frame from the sample image frames as the high-definition image data, performing average fusion processing on the target image frame and its adjacent image frames to obtain an average image frame, and lowering the resolution of the average image frame to obtain the low-definition image data.
• the manner in which the sample acquisition module 11 acquires the sample image pair includes: acquiring high-definition image data, performing Gaussian blur processing on the high-definition image data, and lowering the resolution of the blurred image to obtain the low-definition image data.
• the manner in which the sample acquisition module 11 acquires the sample image pair includes: acquiring high-definition image data, performing distortion format conversion on the high-definition image data, and lowering the resolution of the converted image to obtain the low-definition image data.
  • the manner in which the sample acquisition module 11 acquires the sample image pair includes:
• obtain sample low-definition video data and input the sample low-definition video data into a noise learning network, the definition of the sample low-definition video data being lower than a definition threshold; learn the noise data of the sample low-definition video data based on the noise learning network; and fuse the noise data into the high-definition image data to obtain the low-definition image data.
  • the steps involved in the image processing network training method shown in FIG. 3 can be executed by each module in the image processing network training device 1 shown in FIG. 11 .
• step S101 shown in FIG. 3 can be performed by the sample acquisition module 11 in FIG. 11, steps S102 to S105 shown in FIG. 3 can be performed by the calling module 12 in FIG. 11, and step S106 can be performed by the update module 13 in FIG. 11.
• As described above with respect to FIG. 3, this application can obtain a sample image pair and train the image processing network through the super-resolution loss function, image quality loss function, face loss function, and sharpening loss function; the details are not repeated here.
• the device proposed in this application can train the image processing network on multiple tasks (such as the super-resolution task, image quality enhancement task, face enhancement task, and sharpening task) in an interrelated and integrated manner, so that when the trained image processing network performs multi-task optimization on an image at the same time, there is no conflict between the tasks and the optimization effect is better.
• each module in the training device 1 of the image processing network shown in FIG. 11 can be split into multiple functionally smaller subunits, which can achieve the same operation without affecting the realization of the technical effects of the embodiments of the present application.
  • the above-mentioned modules are divided based on logical functions.
  • the functions of one module can also be realized by multiple units, or the functions of multiple modules can be realized by one unit.
  • the image processing network training device 1 may also include other units. In practical applications, these functions may also be implemented with the assistance of other units, and may be implemented cooperatively by multiple units.
• the training device 1 of the image processing network as shown in FIG. 11 may also be constructed by running, on a general-purpose computer device, such as a computer including processing elements and storage elements such as a central processing unit (CPU), a random access storage medium (RAM), and a read-only storage medium (ROM), computer-readable instructions capable of executing the steps involved in the corresponding method shown in FIG. 3, so as to implement the image processing network training method of the embodiments of this application.
• the above-mentioned computer-readable instructions may be recorded in, for example, a computer-readable recording medium, loaded into the above-mentioned computing device via the computer-readable recording medium, and run therein.
  • FIG. 12 is a schematic structural diagram of an image processing device provided in the present application.
  • the image processing device can be a computer-readable instruction (including program code) running in a computer device, for example, the image processing device is an application software, and the image processing device can be used to execute the method in the embodiment of the present application. corresponding steps.
  • the image processing device 2 may include: a super-resolution call module 21, an image quality enhancement module 22, a human face enhancement module 23 and a sharpening module 24;
  • the super-resolution calling module 21 is used to call the trained image processing network to obtain the super-resolution image data corresponding to the initial image data; the resolution of the super-resolution image data is greater than or equal to the target resolution;
  • the image quality enhancement module 22 is used to call the trained image processing network to perform image quality enhancement processing on the super-resolution image data to obtain the first enhanced image data;
• the face enhancement module 23 is used to call the trained image processing network to obtain the second enhanced image data corresponding to the first enhanced image data; if the first enhanced image data contains a face image, the second enhanced image data is the image data obtained after the face image in the first enhanced image data is enhanced;
  • the sharpening module 24 is configured to call the trained image processing network to perform image sharpening processing on the second enhanced image data, obtain sharpened image data, and output the sharpened image data.
  • the trained image processing network includes a super-resolution network
• the manner in which the super-resolution calling module 21 calls the trained image processing network to obtain the super-resolution image data corresponding to the initial image data includes:
• if the resolution of the initial image data is greater than or equal to the target resolution, the initial image data is determined as the super-resolution image data;
• if the resolution of the initial image data is smaller than the target resolution, the super-resolution network is called to adjust the resolution of the initial image data to the target resolution to obtain the super-resolution image data.
  • the trained image processing network includes a face enhancement network
  • the face enhancement module 23 invokes the trained image processing network to obtain the second enhanced image data corresponding to the first enhanced image data, including:
  • the first enhanced image data does not contain a human face image, then determine the first enhanced image data as the second enhanced image data;
• if the first enhanced image data contains a face image, the face enhancement network is invoked to perform face enhancement processing on the face image in the first enhanced image data to obtain the second enhanced image data.
  • the face enhancement network includes a face detection network, a face enhancement sub-network and a face fusion network;
• the manner in which the face enhancement module 23 invokes the face enhancement network to perform face enhancement processing on the face image in the first enhanced image data to obtain the second enhanced image data includes: calling the face detection network to extract the face image detected in the first enhanced image data to obtain the extracted face image; calling the face enhancement sub-network to perform face enhancement processing on the extracted face image to obtain the enhanced face image; and calling the face fusion network to generate a face fusion mask and perform weighted fusion of the enhanced face image and the first enhanced image data to obtain the second enhanced image data.
  • the trained image processing network includes a sharpening network
• the manner in which the sharpening module 24 invokes the trained image processing network to perform image sharpening processing on the second enhanced image data to obtain the sharpened image data includes: extracting the high-frequency image information in the second enhanced image data; and generating a sharpening mask of the second enhanced image data, and dot-multiplying the sharpening mask with the second enhanced image data to obtain the sharpened image information in the second enhanced image data;
  • the high-frequency image information, the sharpened image information, and the second enhanced image data are weighted and summed according to the first weighted weight, the second weighted weight, and the third weighted weight to obtain sharpened image data.
• the initial image data is any image frame in a plurality of image frames obtained by framing the video data; the above-mentioned device 2 is also used for: obtaining the sharpened image data corresponding to each image frame in the video data, generating optimized video data of the video data according to the sharpened image data corresponding to each image frame, and pushing the optimized video data to the application client.
  • step S201 shown in FIG. 7 can be performed by the super-resolution calling module 21 in FIG. 12
  • step S202 shown in FIG. 7 can be performed by the image quality enhancement module 22 in FIG. 12
  • step S203 can be performed by the face enhancement module 23 in FIG. 12
  • step S204 shown in FIG. 7 can be performed by the sharpening module 24 in FIG. 12 .
• As with the training device described above, the device proposed in this application trains the image processing network on multiple tasks (such as the super-resolution task, image quality enhancement task, face enhancement task, and sharpening task) in an interrelated and integrated manner, so that when the trained image processing network performs multi-task optimization on an image at the same time, there is no conflict between the tasks and the optimization effect is better; the details described in the embodiments corresponding to FIG. 3 and FIG. 7 above are not repeated here.
• each module in the image processing device 2 shown in FIG. 12 can be split into multiple functionally smaller subunits, which can achieve the same operation without affecting the realization of the technical effects of the embodiments of the present application.
  • the above-mentioned modules are divided based on logical functions.
  • the functions of one module can also be realized by multiple units, or the functions of multiple modules can be realized by one unit.
  • the image processing device 2 may also include other units. In practical applications, these functions may also be implemented with the assistance of other units, and may be implemented cooperatively by multiple units.
• the image processing device 2 as shown in FIG. 12 may also be constructed by running, on a general-purpose computer device, such as a computer including processing elements and storage elements such as a central processing unit (CPU), a random access storage medium (RAM), and a read-only storage medium (ROM), computer-readable instructions capable of executing the steps involved in the corresponding method shown in FIG. 7, so as to implement the image processing method of the embodiments of this application.
• the above-mentioned computer-readable instructions may be recorded in, for example, a computer-readable recording medium, loaded into the above-mentioned computing device via the computer-readable recording medium, and run therein.
  • the computer device 1000 may include: a processor 1001 , a network interface 1004 and a memory 1005 .
  • the computer device 1000 may further include: a user interface 1003 and at least one communication bus 1002 .
  • the communication bus 1002 is used to realize connection and communication between these components.
  • the user interface 1003 may include a display screen (Display) and a keyboard (Keyboard), and the optional user interface 1003 may also include a standard wired interface and a wireless interface.
  • the network interface 1004 may include a standard wired interface and a wireless interface (such as a WI-FI interface).
  • the memory 1005 can be a high-speed RAM memory, or a non-volatile memory, such as at least one disk memory.
  • the memory 1005 may also be at least one storage device located away from the aforementioned processor 1001 .
• the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and computer-readable instructions; executing the computer-readable instructions can implement at least one of the image processing network training method and the image processing method provided by the embodiments of this application.
• the network interface 1004 can provide a network communication function; the user interface 1003 is mainly used to provide an input interface for the user; and the processor 1001 can invoke the computer-readable instructions stored in the memory 1005 to implement the image processing network training method provided by the embodiments of this application.
  • the processor 1001 may also be configured to invoke computer-readable instructions stored in the memory 1005 to implement the image processing method provided in the embodiment of the present application.
• the computer device 1000 described in the embodiment of the present application can execute the description of the image processing network training method in the embodiment corresponding to FIG. 3 above, and can also execute the description of the image processing method in the embodiment corresponding to FIG. 7 above, which will not be repeated here. In addition, the description of the beneficial effects of adopting the same methods will not be repeated here.
• the present application also provides a computer-readable storage medium, in which the computer-readable instructions executed by the aforementioned image processing network training device 1 and image processing device 2 are stored; when a processor executes these program instructions, it can execute the description of the image processing network training method in the embodiment corresponding to FIG. 3 above and the description of the image processing method in the embodiment corresponding to FIG. 7 above; therefore, no further details will be given here.
  • the description of the beneficial effect of adopting the same method will not be repeated here.
• the above-mentioned program instructions may be deployed to execute on one computer device, or on multiple computer devices located at one location, or on multiple computer devices distributed across multiple locations and interconnected by a communication network; multiple computer devices distributed across multiple locations and interconnected through a communication network can form a blockchain network.
• the above-mentioned computer-readable storage medium may be an internal storage unit of the image processing network training device provided in any of the foregoing embodiments or of the above-mentioned computer device, such as a hard disk or memory of the computer device.
  • the computer-readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk equipped on the computer device, a smart memory card (smart media card, SMC), a secure digital (secure digital, SD) card, Flash card (flash card), etc.
  • the computer-readable storage medium may also include both an internal storage unit of the computer device and an external storage device.
  • the computer-readable storage medium is used to store the computer-readable instructions and other programs and data required by the computer device.
  • the computer-readable storage medium can also be used to temporarily store data that has been output or will be output.
  • the present application provides a computer program product or computer program, the computer program product or computer program includes computer readable instructions, and the computer readable instructions are stored in a computer readable storage medium.
• the processor of the computer device reads the computer-readable instructions from the computer-readable storage medium and executes them, so that the computer device executes the description of the image processing network training method in the embodiment corresponding to FIG. 3 above and the description of the image processing method in the embodiment corresponding to FIG. 7 above; details will not be repeated here.
  • the description of the beneficial effect of adopting the same method will not be repeated here.
• for technical details not disclosed in the embodiments of the computer-readable storage medium involved in the present application, please refer to the description of the method embodiments of the present application.
• computer-readable instructions can be used to implement each process and/or block of the method flowcharts and/or structural diagrams, and combinations of processes and/or blocks in the flowcharts and/or block diagrams.
• These computer-readable instructions may be provided to a general-purpose computer, special-purpose computer, embedded processor, or processor of other programmable data processing equipment to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing equipment produce means for realizing the functions specified in one or more steps of the flowchart and/or one or more blocks of the structural diagram.
• These computer-readable instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means, the instruction means implementing the functions specified in one or more steps of the flowchart and/or one or more blocks of the structural schematic diagram.
• These computer-readable instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed on the computer or other programmable device to produce a computer-implemented process, such that the instructions executed on the computer or other programmable device provide steps for realizing the functions specified in one or more steps of the flowchart and/or one or more blocks of the structural schematic diagram.

Abstract

A training method for an image processing network, including: calling the image processing network to obtain sample super-resolution image data corresponding to low-definition image data, and generating a super-resolution loss function in combination with high-definition image data (S102); obtaining first sample enhanced image data corresponding to the sample super-resolution image data, and generating an image quality loss function in combination with the high-definition image data (S103); obtaining second sample enhanced image data corresponding to the first sample enhanced image data, and generating a face loss function in combination with the high-definition image data (S104); obtaining sample sharpened image data corresponding to the second sample enhanced image data, and generating a sharpening loss function in combination with the high-definition image data (S105); and updating the network parameters of the image processing network according to the super-resolution loss function, the image quality loss function, the face loss function, and the sharpening loss function (S106).

Description

Training method and apparatus for image processing network, computer device, and storage medium
This application claims priority to Chinese Patent Application No. 202111188444.9, filed with the China Patent Office on October 12, 2021 and entitled "Training method and apparatus for image processing network, computer device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the technical field of image processing, and in particular to a training method and apparatus for an image processing network, a computer device, and a storage medium.
Background
With the continuous development of computer networks, there are more and more scenarios in which images need to be optimized, such as optimizing a user's photo or optimizing the image frames in video data; such optimization can be performed by training image models.
In the related art, when training image models, multiple image models with different optimization tasks can be trained separately, and the trained image models are then applied to an image one after another in a stacked manner. However, when an image is optimized through multiple image models, one image model may have a reverse-optimization effect on another, so that the optimization effects of the individual image models weaken one another, and the trained image models ultimately optimize images poorly.
Summary
In one aspect, this application provides a training method for an image processing network, the method including:
obtaining a sample image pair, the sample image pair including low-definition image data and high-definition image data, the low-definition image data and the high-definition image data having the same image content;
calling the image processing network to adjust the resolution of the low-definition image data to a target resolution to obtain sample super-resolution image data, and generating a super-resolution loss function according to the sample super-resolution image data and the high-definition image data;
calling the image processing network to perform image quality enhancement processing on the sample super-resolution image data to obtain first sample enhanced image data, and generating an image quality loss function according to the first sample enhanced image data and the high-definition image data;
calling the image processing network to perform face enhancement processing on the face image in the first sample enhanced image data to obtain a sample enhanced face image, fusing the sample enhanced face image with the first sample enhanced image data to obtain second sample enhanced image data, and generating a face loss function according to the sample enhanced face image and the face image in the high-definition image data;
calling the image processing network to perform image sharpening processing on the second sample enhanced image data to obtain sample sharpened image data, and generating a sharpening loss function according to the sample sharpened image data and the high-definition image data;
updating the network parameters of the image processing network according to the super-resolution loss function, the image quality loss function, the face loss function, and the sharpening loss function, to obtain a trained image processing network.
In one aspect, an image processing method is provided, the method including:
calling a trained image processing network to obtain super-resolution image data corresponding to initial image data, the resolution of the super-resolution image data being greater than or equal to a target resolution;
calling the trained image processing network to perform image quality enhancement processing on the super-resolution image data to obtain first enhanced image data;
calling the trained image processing network to obtain second enhanced image data corresponding to the first enhanced image data; if the first enhanced image data contains a face image, the second enhanced image data is the image data obtained after face enhancement is performed on the face image in the first enhanced image data;
calling the trained image processing network to perform image sharpening processing on the second enhanced image data to obtain sharpened image data, and outputting the sharpened image data.
In one aspect, a training apparatus for an image processing network is provided, the apparatus including:
a sample acquisition module, configured to obtain a sample image pair, the sample image pair including low-definition image data and high-definition image data, the low-definition image data and the high-definition image data having the same image content;
a calling module, configured to call the image processing network to adjust the resolution of the low-definition image data to a target resolution to obtain sample super-resolution image data, and to generate a super-resolution loss function according to the sample super-resolution image data and the high-definition image data;
the calling module, configured to call the image processing network to perform image quality enhancement processing on the sample super-resolution image data to obtain first sample enhanced image data, and to generate an image quality loss function according to the first sample enhanced image data and the high-definition image data;
the calling module, configured to call the image processing network to perform face enhancement processing on the face image in the first sample enhanced image data to obtain a sample enhanced face image, to fuse the sample enhanced face image with the first sample enhanced image data to obtain second sample enhanced image data, and to generate a face loss function according to the sample enhanced face image and the face image in the high-definition image data;
the calling module, configured to call the image processing network to perform image sharpening processing on the second sample enhanced image data to obtain sample sharpened image data, and to generate a sharpening loss function according to the sample sharpened image data and the high-definition image data;
an update module, configured to update the network parameters of the image processing network according to the super-resolution loss function, the image quality loss function, the face loss function, and the sharpening loss function, to obtain a trained image processing network.
In one aspect, an image processing apparatus is provided, the apparatus including:
a super-resolution calling module, configured to call a trained image processing network to obtain super-resolution image data corresponding to initial image data, the resolution of the super-resolution image data being greater than or equal to a target resolution;
an image quality enhancement module, configured to call the trained image processing network to perform image quality enhancement processing on the super-resolution image data to obtain first enhanced image data;
a face enhancement module, configured to call the trained image processing network to obtain second enhanced image data corresponding to the first enhanced image data; if the first enhanced image data contains a face image, the second enhanced image data is the image data obtained after face enhancement is performed on the face image in the first enhanced image data;
a sharpening module, configured to call the trained image processing network to perform image sharpening processing on the second enhanced image data to obtain sharpened image data, and to output the sharpened image data.
In one aspect, a computer device is provided, including a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to execute the method in the above aspect of this application.
In one aspect, a non-volatile computer-readable storage medium is provided, storing computer-readable instructions which, when executed by a processor, cause the processor to execute the method in the above aspect.
In one aspect, a computer program product or computer program is provided, the computer program product or computer program including computer-readable instructions stored in a computer-readable storage medium; a processor of a computer device reads the computer-readable instructions from the computer-readable storage medium and executes them, causing the computer device to execute the methods provided in the various optional implementations of the above aspects.
Brief Description of the Drawings
To describe the technical solutions in this application or the prior art more clearly, the following briefly introduces the drawings needed in the embodiments or the description of the prior art. Apparently, the drawings in the following description are only some embodiments of this application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic structural diagram of a network architecture provided by an embodiment of this application;
FIG. 2 is a schematic diagram of a network training scene provided by this application;
FIG. 3 is a schematic flowchart of a training method for an image processing network provided by this application;
FIG. 4 is a schematic structural diagram of an encoding-decoding network provided by this application;
FIG. 5 is a schematic structural diagram of a basic unit provided by this application;
FIG. 6 is a schematic diagram of a scene for obtaining loss functions provided by this application;
FIG. 7 is a schematic flowchart of an image processing method provided by this application;
FIG. 8 is a schematic diagram of a scene for optimizing a face provided by this application;
FIG. 9 is a schematic diagram of an image optimization scene provided by this application;
FIG. 10 is a schematic diagram of a data push scene provided by this application;
FIG. 11 is a schematic structural diagram of a training apparatus for an image processing network provided by this application;
FIG. 12 is a schematic structural diagram of an image processing apparatus provided by this application;
FIG. 13 is a schematic structural diagram of a computer device provided by this application.
Detailed Description
The technical solutions in this application will be described below clearly and completely with reference to the drawings in this application. Apparently, the described embodiments are only some rather than all of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
Referring to FIG. 1, FIG. 1 is a schematic structural diagram of a network architecture provided by an embodiment of this application. As shown in FIG. 1, the network architecture may include a server 200 and a terminal device cluster, and the terminal device cluster may include one or more terminal devices; the number of terminal devices is not limited here. As shown in FIG. 1, the multiple terminal devices may specifically include a terminal device 100a, a terminal device 101a, a terminal device 102a, ..., and a terminal device 103a; each of the terminal device 100a, the terminal device 101a, the terminal device 102a, ..., and the terminal device 103a can be connected to the server 200 over a network, so that every terminal device can exchange data with the server 200 through the network connection.
The server 200 shown in FIG. 1 may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. A terminal device may be a smart terminal such as a smartphone, tablet computer, notebook computer, desktop computer, smart TV, or in-vehicle terminal. The following takes communication between the terminal device 100a and the server 200 as an example for the specific description of the embodiments of this application.
Referring also to FIG. 2, FIG. 2 is a schematic diagram of a network training scene provided by this application. The terminal device 100a may run an application client, and the server 200 may be the background server of that application client. The server 200 may push video data to the application client; however, when pushing video data, the server 200 may first optimize the video data and then push it, where optimizing video data may mean optimizing every image frame contained in the video data. The server 200 may optimize the image frames in the video data through a trained image processing network; the training process of this image processing network is described below.
The image processing network to be trained may include a super-resolution network, an image quality enhancement network, a face enhancement network, and a sharpening network. The server 200 may obtain a sample image pair, which may include low-definition image data and high-definition image data, and input the sample image pair into the image processing network. First, the super-resolution network in the image processing network may perform super-resolution processing on the low-definition image data (that is, raise its resolution) to obtain sample super-resolution image data, and a super-resolution loss function may then be generated from the sample super-resolution image data and the high-definition image data.
Next, the image quality enhancement network in the image processing network may perform image quality enhancement processing on the sample super-resolution image data to obtain first sample enhanced image data, and an image quality loss function may then be generated from the first sample enhanced image data and the high-definition image data.
Then, the face enhancement network in the image processing network may perform face enhancement on the face image in the first sample enhanced image data to obtain a sample enhanced face image, and a face loss function may be generated from the sample enhanced face image and the high-definition face image in the high-definition image data; second sample enhanced image data may also be generated by fusing the sample enhanced face image with the first sample enhanced image data. For details, see the corresponding description in the embodiment of FIG. 3 below.
Then, the sharpening network in the image processing network may sharpen the second sample enhanced image data to obtain sample sharpened image data, and a sharpening loss function may be generated from the sample sharpened image data and the high-definition image data.
The server 200 may pass the generated super-resolution loss function, image quality loss function, face loss function, and sharpening loss function forward in the image processing network, and then update the network parameters of the networks they reach through these loss functions together, obtaining a trained image processing network. The trained image processing network can then be used to optimize images, such as the initial image data described below; for the specific optimization process, see the related description in the embodiment corresponding to FIG. 7 below.
In the related art, when training image models, multiple image models with different optimization tasks (such as an image processing task for raising image resolution, an image processing task for enhancing image quality, an image processing task for improving the face enhancement effect, and so on) are trained separately, and the independently trained image models are then applied to an image one after another in a stacked manner. However, different tasks may conflict with each other: after an image's effect is improved by one image model, processing it with another image model may actually make the overall effect worse, meaning that the image processing tasks of the two models conflict. This phenomenon is called destructive interference. For example, an image model for raising resolution can raise an image's resolution, and an image model for enhancing image quality can enhance its quality; but if the quality-enhancing image model further processes the resolution-raised image and the resulting image is distorted, making the overall effect poor, the image processing tasks of the two models conflict.
By adopting the model training method provided by the embodiments of this application, a multi-task joint training framework from the super-resolution network, through the image quality enhancement network and the face enhancement network, to the sharpening network is provided. The joint training framework cascades the super-resolution network, image quality enhancement network, face enhancement network, and sharpening network in sequence. In this way, during training, the super-resolution network obtains the super-resolution loss function from the input sample image pair and its own output, while each network other than the super-resolution network obtains, in turn, the image quality loss function, the face loss function, and the sharpening loss function from the output of the previous network (that is, its own input) and its own output. Each loss function contains its own network parameters, and because each loss function can be passed forward through the whole network, the network parameters of these networks can constrain and influence one another. Updating the parameters of the networks the losses reach (such as the super-resolution network, image quality enhancement network, face enhancement network, and sharpening network) thus realizes interrelated, integrated, and mutually promoting training among the super-resolution network, image quality enhancement network, face enhancement network, and sharpening network, so that the trained networks not only each have a good training effect but also do not conflict when jointly and cumulatively optimizing an image, making the cumulative optimization effect of the whole trained network on images better.
Referring to FIG. 3, FIG. 3 is a schematic flowchart of a training method for an image processing network provided by this application. The execution principal in the embodiments of this application may be one computer device or a computer device cluster composed of multiple computer devices. The computer device may be a server or a terminal device; therefore, the execution principal in the embodiments of this application may be a server, a terminal device, or a combination of a server and a terminal device. Here the description takes a server as the execution principal as an example. As shown in FIG. 3, the method may include:
Step S101: obtain a sample image pair; the sample image pair includes low-definition image data and high-definition image data, the low-definition image data and the high-definition image data having the same image content;
In this application, the server may obtain a sample image pair, which is an image pair used to train the image processing network. One sample image pair may include one piece of low-definition image data and one corresponding piece of high-definition image data. Since the principle of training the image processing network with each sample image pair is the same, the following takes the process of training the image processing network with one sample image pair (hereinafter collectively referred to as the sample image pair) as an example; see the description below.
The low-definition image data and the high-definition image data in a sample image pair are images with the same image content but different image definition (definition for short); the definition of the low-definition image data is lower than that of the high-definition image data. The resolution of the low-definition image data may be smaller than a target resolution, and the resolution of the high-definition image data may be greater than or equal to the target resolution; the target resolution can be set according to the actual application scene, for example, 1920*1080.
The embodiments of this application may obtain sample image pairs in one or more of the following ways:
In one embodiment, the server may obtain high-definition sample video data, which is video data used for obtaining sample image pairs. Since one piece of video data can contain multiple image frames, the high-definition sample video data may refer to video data whose image frames have a definition greater than a definition threshold; the definition threshold can be set according to the actual application scene. In addition, the sample video data may also be video data whose image frames have a resolution greater than the above target resolution.
Therefore, the server may frame the sample video data to obtain the multiple image frames it contains, which may be called sample image frames. The server may also encode and then decode the sample video data at a target code rate (a low code rate); the video data obtained after this encoding and decoding may be called low-quality video data. The image frame quality of the low-quality video data is lower than that of the sample video data, that is, the definition of the image frames in the low-quality video data is lower than that of the image frames in the sample video data. The target code rate may be a code rate lower than a code rate threshold, which can be set according to the actual application scene; since the target code rate can be quite low, after the sample video data is encoded and decoded at the target code rate, the image quality of the resulting low-quality video data deteriorates, so the definition of its image frames becomes lower.
The image frames contained in the low-quality video data may be called low-quality image frames; the low-quality video data may contain a low-quality image frame corresponding to each sample image frame, one per sample image frame. Since encoding and decoding does not change the frame resolution, the low-quality image frames obtained here are actually high-resolution frames; therefore, the resolution of each low-quality image frame in the low-quality video data can be lowered, for example, to below the target resolution, and the resolution-lowered low-quality image frames may be called low-resolution image frames. The server can then construct sample image pairs from each sample image frame and the low-resolution image frame derived from its corresponding low-quality image frame: one sample image pair may include one sample image frame and its corresponding low-resolution image frame (the frame obtained by lowering the resolution of the low-quality image frame corresponding to that sample image frame). The sample image frame in a pair is a piece of high-definition image data, and the low-resolution image frame in a pair is a piece of low-definition image data. In this way, multiple sample image pairs can be obtained from the sample video data.
In one embodiment, similarly, the server may obtain sample video data and frame it to obtain the multiple sample image frames it contains. The server may then select a target image frame from the multiple sample image frames as the high-definition image data, and perform average fusion processing on the target image frame and its adjacent image frames among the multiple sample image frames; the frame obtained by this average fusion may be called the average image frame. The server may then lower the resolution of the average image frame (for example, to below the target resolution) to obtain the low-definition image data. A target image frame may be any one of the image frames contained in the sample video data, one target image frame may be one piece of high-definition image data, and there may be multiple target image frames. The adjacent image frames of a target image frame may include one or more frames to its left and one or more frames to its right among the multiple sample image frames; the number of adjacent frames depends on the actual application scene and is not limited here.
In one embodiment, the server may directly obtain high-definition image data; for example, it may be downloaded from a web page, or it may be local high-definition image data. The server may perform Gaussian blur processing on the high-definition image data, then lower the resolution of the Gaussian-blurred image frame (for example, to below the target resolution) to obtain the low-definition image data corresponding to that high-definition image data. Gaussian blur is also called Gaussian smoothing.
In one embodiment, the server may also directly obtain high-definition image data, perform distortion format conversion on it, and then lower the resolution of the format-converted image frame (for example, to below the target resolution) to obtain the corresponding low-definition image data. Distortion format conversion of high-definition image data can be understood as compressing it: the image quality of the compressed frame is lower than that of the high-definition image data. For example, distortion format conversion may mean converting the data format of the high-definition image data from png (a lossless compressed image format) to jpg (a lossy compressed image format). A sketch of the latter two degradations (Gaussian blur and lossy-format conversion) is given below.
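The two image-level degradations just described can be sketched as follows (OpenCV is an assumption, as are the kernel size, sigma, JPEG quality, and scale factor):

```python
import cv2

def degrade_gaussian(hd_image, scale=0.5):
    """Gaussian-blur the HD image, then lower its resolution, to obtain
    the corresponding low-definition image data."""
    blurred = cv2.GaussianBlur(hd_image, ksize=(5, 5), sigmaX=1.5)
    return cv2.resize(blurred, None, fx=scale, fy=scale,
                      interpolation=cv2.INTER_AREA)

def degrade_jpeg(hd_image, quality=30, scale=0.5):
    """Lossy-format conversion (png-to-jpg style compression), then
    lower the resolution."""
    ok, buf = cv2.imencode(".jpg", hd_image, [cv2.IMWRITE_JPEG_QUALITY, quality])
    compressed = cv2.imdecode(buf, cv2.IMREAD_COLOR)
    return cv2.resize(compressed, None, fx=scale, fy=scale,
                      interpolation=cv2.INTER_AREA)
```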
In one embodiment, the server may directly obtain high-definition image data. The server may also obtain sample low-quality video data, which may refer to video data whose image frames have a definition lower than the definition threshold. The server may learn the noise data of the sample low-quality video data by machine learning, fuse the noise data into the high-definition image data, and then lower the resolution of the high-definition image data fused with the noise data to obtain the low-definition image data. Fusing noise data into the high-definition image data may mean adding the noise data to it. The process by which the server learns the noise data of the sample low-quality video data by machine learning may be: the server obtains a noise learning network, which may be a model capable of learning the noise data in video data; the server inputs the sample low-quality video data into the noise learning network, and the noise learning network then learns the noise data of the sample low-quality video data.
After the high-definition image data and the low-definition image data are obtained, a sample image pair can be constructed from the obtained high-definition image data and low-definition image data.
The constructed sample image pairs are used to train the image processing network, which may include a super-resolution network, an image quality enhancement network, a face enhancement network, and a sharpening network.
Each generative network of the image processing network (such as the super-resolution network, image quality enhancement network, and face enhancement network) may adopt a U-Net structure (a network structure) based on the encoder-decoder idea. Each generative network may be composed of basic units (blocks), where the encoder and decoder of the super-resolution network each use 3 blocks (that is, one layer of the encoder and of the decoder may use 3 blocks), the image quality enhancement network and the face enhancement network each use 5 blocks (one layer of the encoder and of the decoder may use 5 blocks), and the basic channel number of each block may be 16. In addition, in this application, the first 3x3 convolution inside a block expands the channels to raise the feature dimension, and the output 3x3 convolution compresses the features to keep the dimension the same as the input channels; this allows more feature information of the image to be learned.
In addition, in this application, when performing super-resolution processing in the super-resolution network, PixelShuffle (an upsampling method) can be used as the upsampling operation, replacing the interpolation upsampling scheme with a channel-to-space dimension conversion, which can achieve a better visual effect.
By obtaining sample image pairs in multiple ways, this application enriches the types of sample image pairs obtained; training the image processing network with multiple kinds of sample image pairs can then improve the training effect of the image processing network, as described below.
Referring to FIG. 4 and FIG. 5, FIG. 4 is a schematic structural diagram of an encoding-decoding network provided by this application, and FIG. 5 is a schematic structural diagram of a basic unit provided by this application. The super-resolution network, image quality enhancement network, and face enhancement network can all adopt the network structure shown in FIG. 4, which may include an encoder and a decoder; the encoder may have 3 layers and the decoder may have 3 layers. Each layer of the encoder and the decoder may in turn be composed of the basic unit shown in FIG. 5, and one basic unit may include, in sequence, a 3*3 convolutional layer, a normalization layer, an activation layer (LeakyReLU), a 3*3 convolutional layer, and a 1*1 convolutional layer.
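A hedged PyTorch sketch of the basic unit in FIG. 5 and of PixelShuffle upsampling follows; the channel expansion factor and the use of BatchNorm for the normalization layer are assumptions:

```python
import torch.nn as nn

class BasicBlock(nn.Module):
    """3x3 conv that expands channels, normalization, LeakyReLU,
    3x3 conv that compresses back to the input width, then a 1x1 conv."""
    def __init__(self, channels=16, expand=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels * expand, 3, padding=1),  # channel expansion
            nn.BatchNorm2d(channels * expand),                     # normalization layer (assumed type)
            nn.LeakyReLU(inplace=True),
            nn.Conv2d(channels * expand, channels, 3, padding=1),  # feature compression
            nn.Conv2d(channels, channels, 1),
        )

    def forward(self, x):
        return self.body(x)

# Channel-to-space upsampling with PixelShuffle instead of interpolation:
upsample = nn.Sequential(nn.Conv2d(16, 16 * 4, 3, padding=1), nn.PixelShuffle(2))
```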
Step S102: call the image processing network to adjust the resolution of the low-definition image data to the target resolution to obtain sample super-resolution image data, and generate a super-resolution loss function according to the sample super-resolution image data and the high-definition image data;
In this application, the server may call the super-resolution network in the image processing network to raise the resolution of the low-definition image data, for example, to the target resolution, thereby generating the sample super-resolution image data corresponding to the low-definition image data; the sample super-resolution image data is the image data obtained after the super-resolution network raises the resolution of the low-definition image data to the target resolution.
Next, the server may generate the super-resolution loss function from the sample super-resolution image data and the high-definition image data, as follows:
The super-resolution loss function may contain two parts: a pixel-level loss function and a feature-level loss function. Constraining the super-resolution effect by combining the pixel-level loss function and the feature-level loss function can make the super-resolution training effect more accurate and better.
The pixel-level loss function may be called the first super-resolution loss function, and the feature-level loss function may be called the second super-resolution loss function.
The server may generate the first super-resolution loss function from the pixel value elements contained in the sample super-resolution image data and the pixel value elements contained in the high-definition image data. The first super-resolution loss function $l_{c1}$ may be as shown in formula (1):

$$l_{c1} = \frac{1}{N}\sum_{i}\left|I^{(i)} - \hat{I}^{(i)}\right| \tag{1}$$

where the sample super-resolution image data and the high-definition image data contain the same number of pixels, the value at one pixel may be called a pixel value element, $I$ denotes the high-definition image data, $\hat{I}$ denotes the sample super-resolution image data, $I^{(i)}$ denotes the i-th pixel value element of the high-definition image data, $\hat{I}^{(i)}$ denotes the i-th pixel value element of the sample super-resolution image data, i indexes the pixel value elements, and N is the total number of pixel value elements in the image data.
The server may generate the second super-resolution loss function from the feature value elements contained in the feature maps of the sample super-resolution image data in the super-resolution network and the feature value elements contained in the feature maps of the high-definition image data in the super-resolution network. The second super-resolution loss function $l_{c2}$ may be as shown in formula (2):

$$l_{c2} = \sum_{l}\frac{1}{c_l h_l w_l}\sum_{k_1=1}^{c_l}\sum_{s_1=1}^{h_l}\sum_{j_1=1}^{w_l}\left|\phi^{l}_{s_1,j_1,k_1}(\hat{I}) - \phi^{l}_{s_1,j_1,k_1}(I)\right| \tag{2}$$

where the value of l can be determined according to the actual application scene; l denotes the index of a feature layer, $h_l$ denotes the height of the feature map of the l-th feature layer of the super-resolution network, $w_l$ denotes the width of the feature map of the l-th feature layer, and $c_l$ denotes the number of channels of the l-th feature layer; $s_1$ corresponds to the height of the feature map, with maximum value $h_l$; $j_1$ corresponds to the width of the feature map, with maximum value $w_l$; $k_1$ corresponds to the channel of the feature map, with maximum value $c_l$. The value at each feature point of a feature map may be called a feature value element, so it can be understood that $s_1$, $j_1$, and $k_1$ are indices of the feature value elements in the feature map. $\phi$ denotes an operation of extracting the feature value element at the corresponding position from the feature map. Further, $\phi^{l}_{s_1,j_1,k_1}(\hat{I})$ denotes the feature value element at height $s_1$ and width $j_1$ in the feature map of the $k_1$-th channel of the l-th feature layer (in the super-resolution network) for the sample super-resolution image data $\hat{I}$, and $\phi^{l}_{s_1,j_1,k_1}(I)$ denotes the feature value element at height $s_1$ and width $j_1$ in the feature map of the $k_1$-th channel of the l-th feature layer (in the super-resolution network) for the high-definition image data $I$.
Therefore, the super-resolution loss function may be the sum of the first super-resolution loss function $l_{c1}$ and the second super-resolution loss function $l_{c2}$.
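A sketch of this combined super-resolution loss is given below; using VGG16 features for $\phi$ is an assumption (the feature maps in formula (2) are taken inside the super-resolution network itself), and inputs are assumed to be normalized 3-channel tensors:

```python
import torch
import torchvision.models as models

# frozen feature extractor standing in for phi (assumption: VGG16 up to relu3_3)
vgg_features = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features[:16].eval()
for p in vgg_features.parameters():
    p.requires_grad_(False)

def sr_loss(sr_image: torch.Tensor, hd_image: torch.Tensor) -> torch.Tensor:
    l_c1 = (sr_image - hd_image).abs().mean()   # pixel-level loss, formula (1)
    l_c2 = (vgg_features(sr_image) - vgg_features(hd_image)).abs().mean()  # feature-level loss, formula (2)
    return l_c1 + l_c2                          # super-resolution loss
```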
Step S103: call the image processing network to perform image quality enhancement processing on the sample super-resolution image data to obtain first sample enhanced image data, and generate an image quality loss function according to the first sample enhanced image data and the high-definition image data;
In this application, the sample super-resolution image data obtained above can serve as the input of the image quality enhancement network. The server may call the image quality enhancement network in the image processing network to perform image quality enhancement processing on the sample super-resolution image data to generate the first sample enhanced image data corresponding to the sample super-resolution image data; the first sample enhanced image data is the image data obtained after performing image quality enhancement processing on the sample super-resolution image data.
Then, the server may use the peak signal-to-noise ratio (PSNR) derived from the mean squared error between the first sample enhanced image data and the high-definition image data as the image quality loss function. The image quality loss function $\mathrm{PSNR}_h$ may be as shown in formula (3):

$$\mathrm{PSNR}_h = 10 \cdot \log_{10}\!\left(\frac{(2^{\text{bits}} - 1)^2}{\mathrm{MSE}(I, \hat{I}_h)}\right) \tag{3}$$

where $I$ denotes the high-definition image data, $\hat{I}_h$ denotes the first sample enhanced image data, $\mathrm{MSE}(I, \hat{I}_h)$ denotes the mean squared error between the high-definition image data and the first sample enhanced image data, and bits denotes the precision, which may be a 16-bit binary precision or a 32-bit binary precision.
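A direct computation of formula (3) can be sketched as follows:

```python
import torch

def psnr(hd_image: torch.Tensor, enhanced: torch.Tensor, bits: int = 16) -> torch.Tensor:
    """Peak signal-to-noise ratio between the high-definition image data
    and the first sample enhanced image data, following formula (3)."""
    mse = torch.mean((hd_image - enhanced) ** 2)   # mean squared error
    peak = (2 ** bits - 1) ** 2                    # squared peak value at the given precision
    return 10.0 * torch.log10(peak / mse)
```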
Step S104: call the image processing network to perform face enhancement processing on the face image in the first sample enhanced image data to obtain a sample enhanced face image, fuse the sample enhanced face image with the first sample enhanced image data to obtain second sample enhanced image data, and generate a face loss function according to the sample enhanced face image and the face image in the high-definition image data;
In this application, the first sample enhanced image data obtained through the image quality enhancement network can serve as the input of the face enhancement network; the first sample enhanced image data may contain a face image. The server may call the face enhancement network in the image processing network to perform face enhancement processing on the face image in the first sample enhanced image data to generate the second sample enhanced image data, which is the image data obtained after face enhancement processing is performed on the face image in the first sample enhanced image data; see the description below for details.
The face enhancement network may include a face detection network, a face enhancement sub-network, and a face fusion network, and the face enhancement sub-network may in turn include a color discrimination network and a texture discrimination network. The server may call the face detection network to detect the detection frame in which the face image in the first sample enhanced image data is located, which may be called the face detection frame; the first sample enhanced image data may also be annotated with an annotation frame indicating the actual position of the face image in the first sample enhanced image data, which may be called the face annotation frame. The server may crop the face image contained in the face detection frame out of the first sample enhanced image data to obtain the detected face image. The server may then call the face enhancement sub-network to perform face enhancement processing on the detected face image (that is, the face image cropped from the first sample enhanced image data) to obtain the enhanced face image, which may be called the sample enhanced face image; the sample enhanced face image is the face image obtained after performing face enhancement processing on the face image in the first sample enhanced image data. The server may call the face fusion network to fuse the sample enhanced face image with the first sample enhanced image data; the image data obtained by the fusion may be called the second sample enhanced image data.
Therefore, the server may generate a detection loss function from the above face detection frame and face annotation frame; as the name implies, this detection loss function arises from the deviation between the detected position of the face image in the first sample enhanced image data and the actual position of that face image. The detection loss function $l_{r1}$ may be as shown in formula (4):

$$l_{r1} = 1 - \frac{\left|J \cap \hat{J}\right|}{|J| + |\hat{J}| - |J \cap \hat{J}|} \tag{4}$$

where $J$ may be the face annotation frame, $\hat{J}$ may be the face detection frame, $|J \cap \hat{J}|$ may denote the area of the intersection of the face annotation frame and the face detection frame, $|J|$ denotes the area of the face annotation frame, and $|\hat{J}|$ denotes the area of the face detection frame.
The server may also crop the face image from the high-definition image data to obtain the face image in the high-definition image data; the face image cropped from the high-definition image data may be called the high-definition face image.
The server may also generate a color loss function from the high-definition face image, the sample enhanced face image, and the color discriminator. The color loss function is used to characterize the difference between the color of the enhanced sample enhanced face image and the color of the high-definition face image. For example, the color discriminator may judge the probability that the color of the sample enhanced face image is the color of the high-definition face image, and this probability characterizes the color loss function; the goal is to make this judged probability approach 0.5, which indicates that the color discriminator can no longer distinguish the color of the sample enhanced face image from the color of the high-definition face image, at which point the expected effect is achieved.
The server may apply Gaussian blur to the high-definition face image and the sample enhanced face image separately, and then judge the probability that the color of the Gaussian-blurred sample enhanced face image is the color of the Gaussian-blurred high-definition face image, with this probability characterizing the color loss function.
The server may also generate a texture loss function from the high-definition face image, the sample enhanced face image, and the texture discriminator. The texture loss function is used to characterize the difference between the texture of the enhanced sample enhanced face image and the texture of the high-definition face image. For example, the texture discriminator may judge the probability that the texture of the sample enhanced face image is the texture of the high-definition face image, and this probability characterizes the texture loss function; the goal is to make this judged probability approach 0.5, which indicates that the texture discriminator can no longer distinguish the texture of the sample enhanced face image from the texture of the high-definition face image, at which point the expected effect is achieved.
The server may grayscale the high-definition face image and the sample enhanced face image separately, and then judge the probability that the texture of the grayscaled sample enhanced face image is the texture of the grayscaled high-definition face image, with this probability characterizing the texture loss function.
Further, the server may generate a content loss function from the feature-value elements contained in the feature map of the sample enhanced face image and the feature-value elements contained in the feature map of the high-definition face image. The content loss function characterizes the content difference between the second sample enhanced image data and the high-definition image data. The original formula image is not recoverable; reconstructed analogously to formula (2), the content loss function l_r2 of formula (5) reads:
$l_{r2} = \sum_{t}\frac{1}{h_t w_t c_t}\sum_{s_2=1}^{h_t}\sum_{j_2=1}^{w_t}\sum_{k_2=1}^{c_t}\left(\phi^{t}_{s_2,j_2,k_2}(\hat{R}) - \phi^{t}_{s_2,j_2,k_2}(R)\right)^2$    (5)
where R may be the high-definition face image and $\hat{R}$ may be the sample enhanced face image; the value of t may be determined by the actual application scenario; t indexes the feature layers; h_t denotes the height of the feature map at the t-th feature layer of the face enhancement sub-network, w_t denotes its width, and c_t denotes its number of channels; s_2 corresponds to the feature-map height, with maximum value h_t; j_2 corresponds to the feature-map width, with maximum value w_t; and k_2 corresponds to the feature-map channels, with maximum value c_t. The value at each point of a feature map is called a feature-value element, so s_2, j_2, and k_2 can be understood as indices into the feature-value elements of a feature map. φ denotes the operation of extracting the feature-value element at the corresponding position from a feature map.
Further, $\phi^{t}_{s_2,j_2,k_2}(\hat{R})$ denotes the feature-value element of the sample enhanced face image $\hat{R}$ at height s_2 and width j_2 in the feature map of the k_2-th channel of the t-th feature layer, and $\phi^{t}_{s_2,j_2,k_2}(R)$ denotes the feature-value element of the high-definition face image R at the same position.
The face loss function may therefore be the sum of the detection loss function l_r1, the color loss function, the texture loss function, and the content loss function l_r2.
As described above, the detection loss function is obtained through the face detection network, while the color, texture, and content loss functions are obtained through the face enhancement sub-network; the face loss function may be the sum of these four loss functions.
Optionally, besides being trained with face images cropped from the first sample enhanced image data, the face enhancement network may also be trained with face images other than those cropped from the first sample enhanced image data. Training the face enhancement network with both kinds of face images, namely face images to be trained from a training set (which may be any training set containing face images to be optimized, i.e., faces not cropped from the first sample enhanced image data) and face images in the first sample enhanced image data obtained through the image-quality enhancement network, can yield a better training effect.
Step S105: invoke the image processing network to perform image sharpening on the second sample enhanced image data to obtain sample sharpened image data, and generate a sharpening loss function according to the sample sharpened image data and the high-definition image data.
In this application, the second sample enhanced image data may serve as the input of the sharpening network. The server may invoke the sharpening network in the image processing network to perform image sharpening on the second sample enhanced image data; the resulting image data may be called the sample sharpened image data.
The server may generate the loss function of the sharpening network, called the sharpening loss function, from the sample sharpened image data and the high-definition image data. The sharpening loss function may consist of two parts, one from an objective perspective and one from a perceptual perspective; the objective part may be called the quality loss function and the perceptual part may be called the perceptual loss function.
The quality loss function may be the peak signal-to-noise ratio (PSNR) between the sample sharpened image data and the high-definition image data. The perceptual loss function may be obtained from the perceptual similarity between the sample sharpened image data and the high-definition image data, which in turn may be obtained from the perceptual loss value (Learned Perceptual Image Patch Similarity, LPIPS) between them. The smaller the LPIPS value, the more perceptually similar the sample sharpened image data and the high-definition image data are; conversely, the larger the LPIPS value, the less perceptually similar they are (i.e., the greater the difference). The goal of using the perceptual loss function is therefore to minimize the perceptual loss value between the sample sharpened image data and the high-definition image data.
The sharpening loss function may therefore be the sum of the quality loss function and the perceptual loss function.
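A small sketch of this combined sharpening loss, assuming the standard PSNR definition, the open-source lpips package for the perceptual term, and an unweighted sum (the patent only states that the two parts are added):

```python
import torch
import lpips  # pip install lpips; assumed available

lpips_fn = lpips.LPIPS(net='alex')  # learned perceptual metric

def sharpening_loss(sharpened, hd, bits=16):
    # Quality part: PSNR between the sample sharpened image data and
    # the high-definition image data (maximized, hence the minus sign).
    mse = torch.mean((sharpened - hd) ** 2)
    psnr = 10.0 * torch.log10((2 ** bits - 1) ** 2 / (mse + 1e-12))
    # Perceptual part: LPIPS expects inputs scaled to [-1, 1]; a
    # smaller value means the images are perceptually more similar.
    perceptual = lpips_fn(sharpened, hd).mean()
    return -psnr + perceptual
```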
Refer to Figure 6, a schematic diagram of a scenario for obtaining the loss functions provided by this application. As shown in Figure 6, the server may feed the sample image pair into the super-resolution network, generate in it the sample super-resolution image data corresponding to the low-definition image data, and generate the super-resolution loss function from the sample super-resolution image data and the high-definition image data.
The server may then feed the sample super-resolution image data into the image-quality enhancement network, generate the first sample enhanced image data in it, and generate the image-quality loss function from the first sample enhanced image data and the high-definition image data.
The server may further feed the first sample enhanced image data into the face enhancement network, generate in it the sample enhanced face image obtained by enhancing the face image in the first sample enhanced image data, and generate the face loss function from the sample enhanced face image and the high-definition face image in the high-definition image data. Within the face enhancement network, the sample enhanced face image may also be fused with the first sample enhanced image data to obtain the second sample enhanced image data.
The server may finally feed the second sample enhanced image data into the sharpening network, sharpen the second sample enhanced image data there to obtain the sample sharpened image data, and generate the sharpening loss function from the sample sharpened image data and the high-definition image data.
Step S106: update the network parameters of the image processing network according to the super-resolution loss function, the image-quality loss function, the face loss function, and the sharpening loss function to obtain the trained image processing network.
In this application, the network parameters of the image processing network may comprise the network parameters of the super-resolution network, the image-quality enhancement network, the face enhancement network, and the sharpening network. The server may update these parameters with the super-resolution, image-quality, face, and sharpening loss functions in either of two ways, as follows.
From front to back, the networks in the image processing network are the super-resolution network, the image-quality enhancement network, the face enhancement network, and the sharpening network; within the face enhancement network, from front to back, are three face-processing networks: the face detection network, the face enhancement sub-network, and the face fusion network. Because a loss function can be propagated forward through the image processing network and thereby update the parameters of every network it reaches (such as the super-resolution, image-quality enhancement, face enhancement, and sharpening networks), the first way to update a network's parameters is to add up the loss functions that reach that network and update its parameters directly with the sum; the second way is to update that network's parameters iteratively with each of those loss functions in turn. It can be understood that both ways achieve the same parameter-update effect.
Specifically, the super-resolution loss function propagates forward only to the super-resolution network itself; the image-quality loss function propagates forward to the image-quality enhancement network and the super-resolution network; and the face loss function propagates forward to the face enhancement network, the image-quality enhancement network, and the super-resolution network. Note, however, that because the face enhancement network comprises, from front to back, the face detection network, the face enhancement sub-network, and the face fusion network, within the face enhancement network the loss reaching the face detection network may be the full face loss function, while the loss reaching the face enhancement sub-network may be only the color, texture, and content components of the face loss function (i.e., the loss functions generated by the face enhancement sub-network itself); the face loss function cannot be propagated onward to the face fusion network. The sharpening loss function propagates forward to the sharpening network, the face enhancement network, the image-quality enhancement network, and the super-resolution network.
Accordingly, if each network's parameters are updated (i.e., corrected) in the first way, the process may be: add the super-resolution, image-quality, face, and sharpening loss functions and update the parameters of the super-resolution network with the sum, obtaining the trained super-resolution network; add the image-quality, face, and sharpening loss functions and update the parameters of the image-quality enhancement network with the sum, obtaining the trained image-quality enhancement network; within the face enhancement network, add the sharpening and face loss functions and update the parameters of the face detection network with the sum, add the sharpening, color, texture, and content loss functions and update the parameters of the face enhancement sub-network with the sum, and update the parameters of the face fusion network with the sharpening loss function, finally obtaining the trained face enhancement network; and update the parameters of the sharpening network with the sharpening loss function, obtaining the trained sharpening network.
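A minimal sketch of this first update scheme; the sub-network keys, the one-optimizer-per-network setup, and the loss variable names are illustrative assumptions:

```python
def update_with_summed_losses(opts, l_sr, l_q, l_face, l_sharp):
    # Sum, for each sub-network, the losses that propagate back to it
    # (the routing described above), then step that network alone.
    routed = {
        'sr': l_sr + l_q + l_face + l_sharp,
        'quality': l_q + l_face + l_sharp,
        'face_detect': l_face + l_sharp,  # see the text for the finer
                                          # routing inside the face
                                          # enhancement network
        'sharpen': l_sharp,
    }
    for name, loss in routed.items():
        opts[name].zero_grad()
        # retain_graph: the summed losses share parts of the same
        # forward graph across sub-networks.
        loss.backward(retain_graph=True)
        opts[name].step()
```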
If each network's parameters are updated (i.e., corrected) in the second way, the process may be: iteratively update the parameters of the super-resolution network with the super-resolution, image-quality, face, and sharpening loss functions in turn to obtain the trained super-resolution network. For example, first update the super-resolution network's parameters with the super-resolution loss function; then update the resulting parameters with the image-quality loss function; then update those with the face loss function; and finally update those with the sharpening loss function.
Likewise, the parameters of the image-quality enhancement network may be iteratively updated with the image-quality, face, and sharpening loss functions in turn to obtain the trained image-quality enhancement network.
Likewise, the parameters of the face detection network may be iteratively updated with the sharpening and face loss functions in turn to obtain the trained face detection network; the parameters of the face enhancement sub-network may be iteratively updated with the sharpening, color, texture, and content loss functions in turn to obtain the trained face enhancement sub-network; and the parameters of the face fusion network may be updated with the sharpening loss function to obtain the trained face fusion network. The trained face detection network, the trained face enhancement sub-network, and the trained face fusion network together yield the trained face enhancement network. The parameters of the sharpening network may be updated with the sharpening loss function to obtain the trained sharpening network.
The trained image processing network can therefore be generated (i.e., obtained) from the trained super-resolution network, the trained image-quality enhancement network, the trained face enhancement network, and the trained sharpening network. The trained image processing network can then be used to comprehensively optimize video data or image data; see the detailed description in the embodiment corresponding to Figure 7 below.
In this application, the networks (including the super-resolution, image-quality enhancement, face enhancement, and sharpening networks) are trained jointly, which guarantees each network's own training effect while also letting the networks promote and integrate with one another during training, making the trained image processing network more accurate; the trained image processing network can therefore optimize image data more accurately and effectively. In other words, when training the image processing network, this application provides an end-to-end (overall, from the super-resolution network at one end through to the sharpening network at the other), multi-task (training one network may be one training task) joint training framework. This joint training framework is a cascaded framework, e.g., one running from the super-resolution network through the image-quality enhancement network and the face enhancement network to the sharpening network. Training the networks with such a cascaded framework improves the collaborative training of data across the networks, lets the training effects of different tasks promote and integrate with one another, avoids conflicts between tasks, and achieves a good overall effect. In addition, with practical training-data generation schemes, including degradation-kernel learning (e.g., obtaining sample image pairs through the noise learning network described above), random data generation by codec modules (e.g., obtaining sample image pairs by encoding and decoding sample video data at a target bitrate), simulated motion blur (e.g., obtaining sample image pairs by average-fusing a target image frame with its adjacent image frames), and compression noise (e.g., obtaining sample image pairs by lossy format conversion of high-definition image data), widely distributed training data is generated, so that the image processing network trained on it applies to a broader range of image-optimization scenarios and is more robust.
In this application, a sample image pair may be obtained, the sample image pair containing low-definition image data and high-definition image data with the same image content; the image processing network is invoked to adjust the resolution of the low-definition image data to a target resolution to obtain sample super-resolution image data, and a super-resolution loss function is generated according to the sample super-resolution image data and the high-definition image data; the image processing network is invoked to perform image-quality enhancement on the sample super-resolution image data to obtain first sample enhanced image data, and an image-quality loss function is generated according to the first sample enhanced image data and the high-definition image data; the image processing network is invoked to perform face enhancement on the face image in the first sample enhanced image data to obtain a sample enhanced face image, the sample enhanced face image is fused with the first sample enhanced image data to obtain second sample enhanced image data, and a face loss function is generated according to the sample enhanced face image and the face image in the high-definition image data; the image processing network is invoked to perform image sharpening on the second sample enhanced image data to obtain sample sharpened image data, and a sharpening loss function is generated according to the sample sharpened image data and the high-definition image data; and the network parameters of the image processing network are updated according to the super-resolution, image-quality, face, and sharpening loss functions to obtain the trained image processing network. As can be seen, the method proposed in this application can train the image processing network on multiple mutually associated, mutually integrated tasks (e.g., the super-resolution, image-quality enhancement, face enhancement, and sharpening tasks), so that when the trained image processing network optimizes an image on multiple tasks simultaneously, the tasks do not conflict and the optimization effect is better.
Refer to Figure 7, a schematic flowchart of an image processing method provided by this application. This embodiment describes the application of the trained image processing network; its content may be combined with the content described in the embodiment corresponding to Figure 3 above, and the execution subject in this embodiment may likewise be a server. As shown in Figure 7, the method may include:
Step S201: invoke the trained image processing network to obtain super-resolution image data corresponding to initial image data, the resolution of the super-resolution image data being greater than or equal to the target resolution.
In this application, the super-resolution network invoked here is the super-resolution network in the trained image processing network, i.e., the trained super-resolution network.
The server may obtain initial image data, which may be any image that needs to be optimized. Since optimization works better on high-resolution image data, the server may invoke the super-resolution network to detect the resolution of the initial image data. If the detected resolution of the initial image data is lower than the target resolution, the initial image data is low-resolution image data, so the super-resolution network may be invoked to raise its resolution, e.g., to adjust the resolution of the initial image data to the target resolution (a high resolution that may be set according to the actual application scenario); the initial image data adjusted to the target resolution may then be used as the super-resolution image data.
Alternatively, if the detected resolution of the initial image data is greater than or equal to the target resolution, the initial image data is itself high-resolution image data; its resolution need not be adjusted, and the initial image data is used directly as the super-resolution image data.
Step S202: invoke the trained image processing network to perform image-quality enhancement on the super-resolution image data to obtain first enhanced image data.
In this application, the image-quality enhancement network invoked here is the image-quality enhancement network in the trained image processing network, i.e., the trained image-quality enhancement network.
Because the image quality after super-resolution may be poor, the server may invoke the image-quality enhancement network to optimize the overall quality of the super-resolution image data (i.e., to perform image-quality enhancement on the super-resolution image data); the image data whose quality has been optimized by the image-quality enhancement network may be used as the first enhanced image data.
Step S203: invoke the trained image processing network to obtain second enhanced image data corresponding to the first enhanced image data; if the first enhanced image data contains a face image, the second enhanced image data is the image data obtained after face enhancement of the face image in the first enhanced image data.
In this application, the face enhancement network invoked here is the face enhancement network in the trained image processing network, i.e., the trained face enhancement network.
The invoked face enhancement network contains a face detection network, a face enhancement sub-network, and a face fusion network. Since faces are usually among the most important elements of image data, the server may use the face detection network to perform face detection on the first enhanced image data, i.e., to detect whether the first enhanced image data contains a face image.
If no face image is detected in the first enhanced image data, the first enhanced image data may be used directly as the second enhanced image data.
If a face image is detected in the first enhanced image data, the face enhancement sub-network may be invoked to optimize the face image in the first enhanced image data, and the first enhanced image data with the optimized face image may then be used as the second enhanced image data.
The process of invoking the face enhancement sub-network to optimize the face image in the first enhanced image data may be as follows:
The server may invoke the face detection network to crop the detected face image out of the first enhanced image data, obtaining the face image in the first enhanced image data, which may be called the cropped face image.
The server may then invoke the face enhancement sub-network to perform face enhancement on the cropped face image, i.e., to optimize the cropped face image; the optimized cropped face image may be called the enhanced face image.
Further, the server may invoke the face fusion network to generate a face fusion mask (i.e., a fusion mask), which is used to weight-fuse the enhanced face image with the first enhanced image data to obtain the second enhanced image data. This adaptive face fusion mask enables seamless fusion between the enhanced face image and the first enhanced image data.
For example, denoting the enhanced face image as a, the face fusion mask as b, and the first enhanced image data as c, the second enhanced image data may be b*a + (1-b)*c.
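A one-line sketch of that weighted fusion, assuming tensors already aligned to the face region:

```python
def fuse_face(enhanced_face, fusion_mask, base_image):
    # Second enhanced image data = b * a + (1 - b) * c, where a is the
    # enhanced face image, b the face fusion mask, and c the first
    # enhanced image data.
    return fusion_mask * enhanced_face + (1.0 - fusion_mask) * base_image
```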
The process of enhancing the face image in the first enhanced image data to obtain the second enhanced image data is the same as the process, described above, of enhancing the face image in the first sample enhanced image data to obtain the second sample enhanced image data.
Refer to Figure 8, a schematic diagram of a face-optimization scenario provided by this application. As shown in Figure 8, the server may feed the first enhanced image data obtained through the image-quality enhancement network into the trained face detection network, which crops the face image out of the first enhanced image data to obtain the cropped face image. The cropped face image is then fed into the trained face enhancement sub-network, which performs face enhancement on the cropped face image to obtain the enhanced face image. The enhanced face image and the first enhanced image data are then fed into the face fusion network, which fuses the enhanced face image with the first enhanced image data to finally obtain the second enhanced image data.
Step S204: invoke the trained image processing network to perform image sharpening on the second enhanced image data to obtain sharpened image data, and output the sharpened image data.
In this application, the sharpening network invoked here is the sharpening network in the trained image processing network, i.e., the trained sharpening network.
Finally, the sharpening network may enhance the details in the second enhanced image data to make them clearer. The server may invoke the sharpening network to extract the high-frequency components of the second enhanced image data; for example, applying Gaussian blur to the second enhanced image data and then subtracting the result from the original second enhanced image data yields the high-frequency image information (i.e., the high-frequency components) of the second enhanced image data.
The server may also invoke the sharpening network to generate a sharpening mask for the second enhanced image data; the sharpening mask indicates the detail regions of the second enhanced image data that need sharpening enhancement. The server may take the point-wise product of the sharpening mask and the second enhanced image data to obtain the sharpened image information (i.e., the detail components) of the second enhanced image data. For example, the server may use a convolutional layer (e.g., a 1*1 convolution) and a PReLU activation layer contained in the sharpening network to generate the sharpening mask of the second enhanced image data.
In addition, the sharpening network may generate a weight for the high-frequency image information (which may be called the first weight), a weight for the sharpened image information (which may be called the second weight), and a weight for the second enhanced image data (which may be called the third weight); a weighted sum of the high-frequency image information, the sharpened image information, and the second enhanced image data with the first, second, and third weights then yields the sharpened image data.
For example, the product of the first weight and the high-frequency image information may be taken as the first weighted result, the product of the second weight and the sharpened image information as the second weighted result, and the product of the third weight and the second enhanced image data as the third weighted result; summing the first, second, and third weighted results yields the sharpened image data.
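A compact sketch of this sharpening step; the interface of sharpen_net (returning the sharpening mask and the three weights in one call) and the blur kernel size are assumptions:

```python
import torchvision.transforms.functional as TF

def sharpen(image, sharpen_net):
    # High-frequency components: the image minus its Gaussian-blurred
    # copy.
    high_freq = image - TF.gaussian_blur(image, kernel_size=5)

    # The sharpening network is assumed here to yield the sharpening
    # mask and the three weights.
    mask, w1, w2, w3 = sharpen_net(image)
    sharp_info = mask * image  # point-wise product with the mask

    # Weighted sum of the high-frequency information, the sharpened
    # information, and the second enhanced image data itself.
    return w1 * high_freq + w2 * sharp_info + w3 * image
```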
The sharpened image data is the final image data obtained by optimizing the initial image data, and the server may output the sharpened image data from the image processing network. The process of enhancing the second enhanced image data into the sharpened image data is the same as the process, described above, of enhancing the second sample enhanced image data into the sample sharpened image data.
Refer to Figure 9, a schematic diagram of an image-optimization scenario provided by this application. As shown in Figure 9, the server may feed the initial image data into the trained image processing network: the super-resolution network in the image processing network yields the super-resolution image data corresponding to the initial image data; the image-quality enhancement network then performs image-quality enhancement on the super-resolution image data to obtain the first enhanced image data; the face detection network in the face enhancement network crops the face image out of the first enhanced image data (i.e., the cropped face image); the face enhancement sub-network performs face enhancement on the cropped face image to obtain the enhanced face image; the face fusion network fuses the enhanced face image with the first enhanced image data to obtain the second enhanced image data; and finally the sharpening network sharpens the second enhanced image data to obtain the sharpened image data, which may be output.
The initial image data may also be any one of multiple image frames obtained by splitting video data into frames. The server may be the backend server of an application client, and the video data may be data to be pushed to the application client. The server may therefore treat every image frame contained in the video data as initial image data and use the above process to obtain the sharpened image data corresponding to each image frame; the optimized video data of the video data can then be generated from the sharpened image data corresponding to each image frame, i.e., the video data obtained by optimizing every image frame of the original video data. The server may push the optimized video data to the application client, and the application client can output the optimized video data on the client interface for users to browse and view.
Refer to Figure 10, a schematic diagram of a data-push scenario provided by this application. As shown in Figure 10, the server may split video data into the multiple image frames it contains (e.g., image frame 1 to image frame n) and then optimize each frame with the trained image processing network described above, obtaining the sharpened image data corresponding to each image frame (e.g., sharpened image data 1 to sharpened image data n).
The optimized video data of the video data can then be obtained from the sharpened image data corresponding to each image frame, and the server may push the optimized video data to the application client.
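A simple per-frame sketch of this pipeline using OpenCV (assumed available); the codec choice and the callable network interface are illustrative simplifications:

```python
import cv2

def optimize_video(path_in, path_out, image_processing_net):
    cap = cv2.VideoCapture(path_in)
    fps = cap.get(cv2.CAP_PROP_FPS)
    writer = None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Each frame is treated as initial image data and run through
        # the trained image processing network.
        sharpened = image_processing_net(frame)
        if writer is None:
            h, w = sharpened.shape[:2]
            writer = cv2.VideoWriter(path_out,
                                     cv2.VideoWriter_fourcc(*'mp4v'),
                                     fps, (w, h))
        writer.write(sharpened)
    cap.release()
    if writer is not None:
        writer.release()
```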
With the method provided by this application, the multi-task joint training framework running from the super-resolution network through the image-quality enhancement network and the face enhancement network to the sharpening network trains these networks in a mutually associated, mutually integrated, mutually promoting way. The trained super-resolution, image-quality enhancement, face enhancement, and sharpening networks not only each achieve good individual training effects, but also do not conflict when their optimizations are stacked on an image, giving a better overall optimization effect. In particular, the trained super-resolution, image-quality enhancement, and sharpening networks comprehensively enhance the global quality, color, texture, and sharpness attributes of the image data without conflict, improving the image data in multiple respects, while the trained face enhancement network additionally provides dedicated enhancement of local faces in the image data, achieving both global and local enhancement.
Refer to Figure 11, a schematic structural diagram of a training apparatus for an image processing network provided by this application. The training apparatus may be computer-readable instructions (including program code) running in a computer device; for example, the training apparatus is application software and may be used to perform the corresponding steps of the methods provided by the embodiments of this application. As shown in Figure 11, the training apparatus 1 may include a sample acquisition module 11, an invocation module 12, and an update module 13.
The sample acquisition module 11 is configured to obtain a sample image pair, the sample image pair containing low-definition image data and high-definition image data with the same image content.
The invocation module 12 is configured to invoke the image processing network to adjust the resolution of the low-definition image data to the target resolution to obtain sample super-resolution image data, and to generate a super-resolution loss function according to the sample super-resolution image data and the high-definition image data.
The invocation module 12 is configured to invoke the image processing network to perform image-quality enhancement on the sample super-resolution image data to obtain first sample enhanced image data, and to generate an image-quality loss function according to the first sample enhanced image data and the high-definition image data.
The invocation module 12 is configured to invoke the image processing network to perform face enhancement on the face image in the first sample enhanced image data to obtain a sample enhanced face image, to fuse the sample enhanced face image with the first sample enhanced image data to obtain second sample enhanced image data, and to generate a face loss function according to the sample enhanced face image and the face image in the high-definition image data.
The invocation module 12 is configured to invoke the image processing network to perform image sharpening on the second sample enhanced image data to obtain sample sharpened image data, and to generate a sharpening loss function according to the sample sharpened image data and the high-definition image data.
The update module 13 is configured to update the network parameters of the image processing network according to the super-resolution, image-quality, face, and sharpening loss functions to obtain the trained image processing network.
Optionally, the image processing network includes a super-resolution network, an image-quality enhancement network, a face enhancement network, and a sharpening network; the sample super-resolution image data is obtained according to the super-resolution network, the first sample enhanced image data according to the image-quality enhancement network, the second sample enhanced image data according to the face enhancement network, and the sample sharpened image data according to the sharpening network.
The way in which the update module 13 updates the network parameters of the image processing network according to the super-resolution, image-quality, face, and sharpening loss functions to obtain the trained image processing network includes:
updating the network parameters of the super-resolution network according to the super-resolution, image-quality, face, and sharpening loss functions to obtain a trained super-resolution network;
updating the network parameters of the image-quality enhancement network according to the image-quality, face, and sharpening loss functions to obtain a trained image-quality enhancement network;
updating the network parameters of the face enhancement network according to the face and sharpening loss functions to obtain a trained face enhancement network;
updating the network parameters of the sharpening network according to the sharpening loss function to obtain a trained sharpening network; and
generating the trained image processing network according to the trained super-resolution, image-quality enhancement, face enhancement, and sharpening networks.
Optionally, the way in which the invocation module 12 generates the super-resolution loss function according to the sample super-resolution image data and the high-definition image data includes:
generating a first super-resolution loss function according to the pixel-value elements contained in the sample super-resolution image data and the pixel-value elements contained in the high-definition image data;
generating a second super-resolution loss function according to the feature-value elements contained in the feature map of the sample super-resolution image data and the feature-value elements contained in the feature map of the high-definition image data; and
generating the super-resolution loss function according to the first and second super-resolution loss functions.
Optionally, the image processing network includes a face enhancement network, the second sample enhanced image data is obtained according to the face enhancement network, the face enhancement network contains a face detection network, a color discrimination network, and a texture discrimination network, and the face image in the first sample enhanced image data has a face detection box generated by the face detection network and a face annotation box indicating the actual face position.
The way in which the invocation module 12 generates the face loss function according to the sample enhanced face image and the face image in the high-definition image data includes:
generating a detection loss function according to the face detection box and the face annotation box;
cropping the face image out of the high-definition image data to obtain a high-definition face image;
generating a color loss function according to the high-definition face image, the sample enhanced face image, and the color discrimination network;
generating a texture loss function according to the high-definition face image, the sample enhanced face image, and the texture discrimination network;
generating a content loss function according to the feature-value elements contained in the feature map of the sample enhanced face image and the feature-value elements contained in the feature map of the high-definition face image; and
generating the face loss function according to the detection, color, texture, and content loss functions.
Optionally, the way in which the invocation module 12 generates the sharpening loss function according to the sample sharpened image data and the high-definition image data includes:
generating a quality loss function according to the peak signal-to-noise ratio between the sample sharpened image data and the high-definition image data;
generating a perceptual loss function according to the perceptual similarity between the sample sharpened image data and the high-definition image data; and
generating the sharpening loss function according to the quality loss function and the perceptual loss function.
Optionally, the way in which the sample acquisition module 11 obtains the sample image pair includes:
obtaining sample video data;
splitting the sample video data into frames to obtain the multiple sample image frames contained in the sample video data;
encoding and decoding the sample video data at a target bitrate to obtain low-quality video data corresponding to the sample video data, the image-frame quality of the low-quality video data being lower than the image-frame quality of the sample video data, the low-quality video data containing a low-quality image frame corresponding to each sample image frame, and the target bitrate being below a bitrate threshold; and
constructing the sample image pair according to each sample image frame and its corresponding low-quality image frame.
Optionally, the way in which the sample acquisition module 11 obtains the sample image pair includes:
obtaining sample video data;
splitting the sample video data into frames to obtain the multiple sample image frames contained in the sample video data;
selecting a target image frame from the multiple sample image frames as the high-definition image data; and
performing average-fusion processing on the target image frame and the adjacent image frames of the target image frame among the multiple sample image frames to obtain the low-definition image data, as sketched below.
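A small NumPy sketch of that average-fusion construction; the one-neighbor-on-each-side window is an assumption:

```python
import numpy as np

def make_motion_blur_pair(frames, t):
    # The target frame is the high-definition image data; the average
    # fusion of the target frame with its adjacent frames simulates
    # motion blur and serves as the low-definition image data.
    hd = frames[t]
    window = frames[max(0, t - 1):t + 2]  # target plus its neighbors
    ld = np.mean(np.stack(window), axis=0).astype(hd.dtype)
    return ld, hd
```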
Optionally, the way in which the sample acquisition module 11 obtains the sample image pair includes:
obtaining the high-definition image data; and
applying Gaussian blur processing to the high-definition image data to obtain the low-definition image data.
Optionally, the way in which the sample acquisition module 11 obtains the sample image pair includes:
obtaining the high-definition image data; and
performing lossy format conversion on the high-definition image data to obtain the low-definition image data.
Optionally, the way in which the sample acquisition module 11 obtains the sample image pair includes:
obtaining the high-definition image data;
obtaining sample low-definition video data and feeding the sample low-definition video data into a noise learning network, the clarity of the sample low-definition video data being below a clarity threshold;
learning the noise data of the sample low-definition video data based on the noise learning network; and
fusing the noise data into the high-definition image data to obtain the low-definition image data.
According to an embodiment of this application, the steps of the training method for an image processing network shown in Figure 3 may be performed by the modules of the training apparatus 1 shown in Figure 11. For example, step S101 in Figure 3 may be performed by the sample acquisition module 11 in Figure 11, steps S102 to S105 in Figure 3 by the invocation module 12 in Figure 11, and step S106 in Figure 3 by the update module 13 in Figure 11.
This application can obtain a sample image pair containing low-definition image data and high-definition image data with the same image content; invoke the image processing network to adjust the resolution of the low-definition image data to the target resolution to obtain sample super-resolution image data, and generate a super-resolution loss function according to the sample super-resolution image data and the high-definition image data; invoke the image processing network to perform image-quality enhancement on the sample super-resolution image data to obtain first sample enhanced image data, and generate an image-quality loss function according to the first sample enhanced image data and the high-definition image data; invoke the image processing network to perform face enhancement on the face image in the first sample enhanced image data to obtain a sample enhanced face image, fuse the sample enhanced face image with the first sample enhanced image data to obtain second sample enhanced image data, and generate a face loss function according to the sample enhanced face image and the face image in the high-definition image data; invoke the image processing network to perform image sharpening on the second sample enhanced image data to obtain sample sharpened image data, and generate a sharpening loss function according to the sample sharpened image data and the high-definition image data; and update the network parameters of the image processing network according to the super-resolution, image-quality, face, and sharpening loss functions to obtain the trained image processing network. As can be seen, the apparatus proposed in this application can train the image processing network on multiple mutually associated, mutually integrated tasks (e.g., the super-resolution, image-quality enhancement, face enhancement, and sharpening tasks), so that when the trained image processing network optimizes an image on multiple tasks simultaneously, the tasks do not conflict and the optimization effect is better.
According to an embodiment of this application, the modules of the training apparatus 1 shown in Figure 11 may be separately or wholly combined into one or several units, or one or more of its units may be further split into multiple functionally smaller sub-units, which can implement the same operations without affecting the realization of the technical effects of the embodiments of this application. The modules above are divided by logical function; in practical applications, the function of one module may be implemented by multiple units, or the functions of multiple modules by a single unit. In other embodiments of this application, the training apparatus 1 may also include other units; in practical applications, these functions may be implemented with the assistance of other units and through the cooperation of multiple units.
According to an embodiment of this application, the training apparatus 1 shown in Figure 11 may be constructed, and the training method for an image processing network of the embodiments of this application implemented, by running computer-readable instructions (including program code) capable of performing the steps of the corresponding method shown in Figure 3 on a general-purpose computer device, such as a computer, that includes processing elements and storage elements such as a central processing unit (CPU), a random access storage medium (RAM), and a read-only storage medium (ROM). The computer-readable instructions may be recorded on, for example, a computer-readable recording medium, loaded into the computing device via the computer-readable recording medium, and run therein.
Refer to Figure 12, a schematic structural diagram of an image processing apparatus provided by this application. The image processing apparatus may be computer-readable instructions (including program code) running in a computer device; for example, the image processing apparatus is application software and may be used to perform the corresponding steps of the methods provided by the embodiments of this application. As shown in Figure 12, the image processing apparatus 2 may include a super-resolution invocation module 21, an image-quality enhancement module 22, a face enhancement module 23, and a sharpening module 24.
The super-resolution invocation module 21 is configured to invoke the trained image processing network to obtain super-resolution image data corresponding to initial image data, the resolution of the super-resolution image data being greater than or equal to the target resolution.
The image-quality enhancement module 22 is configured to invoke the trained image processing network to perform image-quality enhancement on the super-resolution image data to obtain first enhanced image data.
The face enhancement module 23 is configured to invoke the trained image processing network to obtain second enhanced image data corresponding to the first enhanced image data; if the first enhanced image data contains a face image, the second enhanced image data is the image data obtained after face enhancement of the face image in the first enhanced image data.
The sharpening module 24 is configured to invoke the trained image processing network to perform image sharpening on the second enhanced image data to obtain sharpened image data, and to output the sharpened image data.
Optionally, the trained image processing network includes a super-resolution network, and the way in which the super-resolution invocation module 21 invokes the trained image processing network to obtain the super-resolution image data corresponding to the initial image data includes:
obtaining the initial image data;
invoking the super-resolution network to detect the resolution of the initial image data;
if the resolution of the initial image data is greater than or equal to the target resolution, determining the initial image data as the super-resolution image data; and
if the resolution of the initial image data is lower than the target resolution, invoking the super-resolution network to adjust the resolution of the initial image data to the target resolution to obtain the super-resolution image data.
Optionally, the trained image processing network includes a face enhancement network, and the way in which the face enhancement module 23 invokes the trained image processing network to obtain the second enhanced image data corresponding to the first enhanced image data includes:
invoking the face enhancement network to perform face detection on the first enhanced image data;
if the first enhanced image data contains no face image, determining the first enhanced image data as the second enhanced image data; and
if the first enhanced image data contains a face image, invoking the face enhancement network to perform face enhancement on the face image in the first enhanced image data to obtain the second enhanced image data.
Optionally, the face enhancement network contains a face detection network, a face enhancement sub-network, and a face fusion network.
The way in which the face enhancement module 23 invokes the face enhancement network to perform face enhancement on the face image in the first enhanced image data to obtain the second enhanced image data includes:
invoking the face detection network to crop the face image out of the first enhanced image data to obtain a cropped face image;
invoking the face enhancement sub-network to perform face enhancement on the cropped face image to obtain an enhanced face image;
invoking the face fusion network to generate a face fusion mask; and
performing image fusion processing on the first enhanced image data and the enhanced face image according to the face fusion mask to obtain the second enhanced image data.
Optionally, the trained image processing network includes a sharpening network, and the way in which the sharpening module 24 invokes the trained image processing network to perform image sharpening on the second enhanced image data to obtain the sharpened image data includes:
invoking the sharpening network to extract high-frequency image information from the second enhanced image data;
generating a sharpening mask for the second enhanced image data according to the sharpening network, and extracting sharpened image information from the second enhanced image data according to the sharpening mask;
predicting, according to the sharpening network, a first weight for the high-frequency image information, a second weight for the sharpened image information, and a third weight for the second enhanced image data; and
performing a weighted sum of the high-frequency image information, the sharpened image information, and the second enhanced image data correspondingly according to the first, second, and third weights to obtain the sharpened image data.
Optionally, the initial image data is any one of multiple image frames obtained by splitting video data into frames, and the apparatus 2 is further configured to:
generate optimized video data of the video data according to the sharpened image data corresponding to each of the multiple image frames; and
push the optimized video data to an application client so that the application client outputs the optimized video data.
According to an embodiment of this application, the steps of the image processing method shown in Figure 7 may be performed by the modules of the image processing apparatus 2 shown in Figure 12. For example, step S201 in Figure 7 may be performed by the super-resolution invocation module 21 in Figure 12, step S202 in Figure 7 by the image-quality enhancement module 22 in Figure 12, step S203 in Figure 7 by the face enhancement module 23 in Figure 12, and step S204 in Figure 7 by the sharpening module 24 in Figure 12.
This application can obtain a sample image pair containing low-definition image data and high-definition image data with the same image content; invoke the image processing network to adjust the resolution of the low-definition image data to the target resolution to obtain sample super-resolution image data, and generate a super-resolution loss function according to the sample super-resolution image data and the high-definition image data; invoke the image processing network to perform image-quality enhancement on the sample super-resolution image data to obtain first sample enhanced image data, and generate an image-quality loss function according to the first sample enhanced image data and the high-definition image data; invoke the image processing network to perform face enhancement on the face image in the first sample enhanced image data to obtain a sample enhanced face image, fuse the sample enhanced face image with the first sample enhanced image data to obtain second sample enhanced image data, and generate a face loss function according to the sample enhanced face image and the face image in the high-definition image data; invoke the image processing network to perform image sharpening on the second sample enhanced image data to obtain sample sharpened image data, and generate a sharpening loss function according to the sample sharpened image data and the high-definition image data; and update the network parameters of the image processing network according to the super-resolution, image-quality, face, and sharpening loss functions to obtain the trained image processing network. As can be seen, the apparatus proposed in this application can train the image processing network on multiple mutually associated, mutually integrated tasks (e.g., the super-resolution, image-quality enhancement, face enhancement, and sharpening tasks), so that when the trained image processing network optimizes an image on multiple tasks simultaneously, the tasks do not conflict and the optimization effect is better.
According to an embodiment of this application, the modules of the image processing apparatus 2 shown in Figure 12 may be separately or wholly combined into one or several units, or one or more of its units may be further split into multiple functionally smaller sub-units, which can implement the same operations without affecting the realization of the technical effects of the embodiments of this application. The modules above are divided by logical function; in practical applications, the function of one module may be implemented by multiple units, or the functions of multiple modules by a single unit. In other embodiments of this application, the image processing apparatus 2 may also include other units; in practical applications, these functions may be implemented with the assistance of other units and through the cooperation of multiple units.
According to an embodiment of this application, the image processing apparatus 2 shown in Figure 12 may be constructed, and the image processing method of the embodiments of this application implemented, by running computer-readable instructions (including program code) capable of performing the steps of the corresponding method shown in Figure 7 on a general-purpose computer device, such as a computer, that includes processing elements and storage elements such as a central processing unit (CPU), a random access storage medium (RAM), and a read-only storage medium (ROM). The computer-readable instructions may be recorded on, for example, a computer-readable recording medium, loaded into the computing device via the computer-readable recording medium, and run therein.
Refer to Figure 13, a schematic structural diagram of a computer device provided by this application. As shown in Figure 13, the computer device 1000 may include a processor 1001, a network interface 1004, and a memory 1005, and may further include a user interface 1003 and at least one communication bus 1002. The communication bus 1002 is used to implement connection and communication among these components. The user interface 1003 may include a display and a keyboard, and optionally may further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory; optionally, it may also be at least one storage apparatus located away from the processor 1001. As shown in Figure 13, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and computer-readable instructions whose execution can implement at least one of the training method for an image processing network and the image processing method provided by the embodiments of this application.
In the computer device 1000 shown in Figure 13, the network interface 1004 can provide network communication functions, the user interface 1003 mainly provides an input interface for the user, and the processor 1001 may be used to invoke the computer-readable instructions stored in the memory 1005 to implement the training method for an image processing network provided by the embodiments of this application.
The processor 1001 may also be used to invoke the computer-readable instructions stored in the memory 1005 to implement the image processing method provided by the embodiments of this application.
It should be understood that the computer device 1000 described in the embodiments of this application can perform the description of the training method for an image processing network in the embodiment corresponding to Figure 3 above and the description of the image processing method in the embodiment corresponding to Figure 7 above, which are not repeated here; nor is the description of the beneficial effects of the same methods repeated.
It should further be noted that this application also provides a computer-readable storage medium storing the computer-readable instructions executed by the aforementioned training apparatus 1 and image processing apparatus 2. When a processor executes the program instructions, it can perform the description of the training method for an image processing network in the embodiment corresponding to Figure 3 above and the description of the image processing method in the embodiment corresponding to Figure 7 above, which are therefore not repeated here; nor is the description of the beneficial effects of the same methods repeated. For technical details not disclosed in the computer-storage-medium embodiments of this application, refer to the description of the method embodiments of this application.
As an example, the program instructions may be deployed and executed on one computer device, or on multiple computer devices located at one site, or on multiple computer devices distributed across multiple sites and interconnected by a communication network; multiple computer devices distributed across multiple sites and interconnected by a communication network may form a blockchain network.
The computer-readable storage medium may be the training apparatus for an image processing network provided in any of the foregoing embodiments, or an internal storage unit of the computer device, such as a hard disk or internal memory of the computer device. The computer-readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the computer device. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the computer device. The computer-readable storage medium is used to store the computer-readable instructions and other programs and data required by the computer device, and may also be used to temporarily store data that has been output or is to be output.
This application provides a computer program product or computer program comprising computer-readable instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer-readable instructions from the computer-readable storage medium and executes them, causing the computer device to perform the description of the training method for an image processing network in the embodiment corresponding to Figure 3 above and the description of the image processing method in the embodiment corresponding to Figure 7 above, which are therefore not repeated here; nor is the description of the beneficial effects of the same methods repeated. For technical details not disclosed in the computer-readable-storage-medium embodiments of this application, refer to the description of the method embodiments of this application.
The terms "first", "second", and the like in the specification, claims, and drawings of the embodiments of this application are used to distinguish different objects, not to describe a particular order. Moreover, the term "include" and any variants thereof are intended to cover non-exclusive inclusion: a process, method, apparatus, product, or device comprising a series of steps or units is not limited to the listed steps or modules, but optionally further includes unlisted steps or modules, or optionally further includes other step units inherent to such a process, method, apparatus, product, or device.
A person of ordinary skill in the art may realize that the units and algorithm steps of the examples described in the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of the examples have been described above generally in terms of function. Whether these functions are performed in hardware or software depends on the particular application and the design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of this application.
The methods and related apparatuses provided by the embodiments of this application are described with reference to the method flowcharts and/or structural schematic diagrams provided in the embodiments of this application; specifically, each flow and/or block of the flowcharts and/or structural schematics, and combinations of flows and/or blocks therein, can be implemented by computer-readable instructions. These computer-readable instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data-processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data-processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the structural schematics. These computer-readable instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data-processing device to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the structural schematics. These computer-readable instructions may also be loaded onto a computer or other programmable data-processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the structural schematics.
What is disclosed above is merely preferred embodiments of this application and certainly cannot be used to limit the scope of rights of this application; therefore, equivalent variations made according to the claims of this application still fall within the scope covered by this application.

Claims (20)

  1. A training method for an image processing network, performed by a computer device, the method comprising:
    obtaining a sample image pair, the sample image pair containing low-definition image data and high-definition image data, the low-definition image data and the high-definition image data having the same image content;
    invoking the image processing network to adjust the resolution of the low-definition image data to a target resolution to obtain sample super-resolution image data, and generating a super-resolution loss function according to the sample super-resolution image data and the high-definition image data;
    invoking the image processing network to perform image-quality enhancement on the sample super-resolution image data to obtain first sample enhanced image data, and generating an image-quality loss function according to the first sample enhanced image data and the high-definition image data;
    invoking the image processing network to perform face enhancement on a face image in the first sample enhanced image data to obtain a sample enhanced face image, fusing the sample enhanced face image with the first sample enhanced image data to obtain second sample enhanced image data, and generating a face loss function according to the sample enhanced face image and a face image in the high-definition image data;
    invoking the image processing network to perform image sharpening on the second sample enhanced image data to obtain sample sharpened image data, and generating a sharpening loss function according to the sample sharpened image data and the high-definition image data; and
    updating network parameters of the image processing network according to the super-resolution loss function, the image-quality loss function, the face loss function, and the sharpening loss function to obtain a trained image processing network.
  2. The method according to claim 1, wherein the image processing network comprises a super-resolution network, an image-quality enhancement network, a face enhancement network, and a sharpening network; the sample super-resolution image data is obtained according to the super-resolution network, the first sample enhanced image data is obtained according to the image-quality enhancement network, the second sample enhanced image data is obtained according to the face enhancement network, and the sample sharpened image data is obtained according to the sharpening network;
    the updating network parameters of the image processing network according to the super-resolution loss function, the image-quality loss function, the face loss function, and the sharpening loss function to obtain a trained image processing network comprises:
    updating network parameters of the super-resolution network according to the super-resolution loss function, the image-quality loss function, the face loss function, and the sharpening loss function to obtain a trained super-resolution network;
    updating network parameters of the image-quality enhancement network according to the image-quality loss function, the face loss function, and the sharpening loss function to obtain a trained image-quality enhancement network;
    updating network parameters of the face enhancement network according to the face loss function and the sharpening loss function to obtain a trained face enhancement network;
    updating network parameters of the sharpening network according to the sharpening loss function to obtain a trained sharpening network; and
    generating the trained image processing network according to the trained super-resolution network, the trained image-quality enhancement network, the trained face enhancement network, and the trained sharpening network.
  3. The method according to claim 1, wherein the generating a super-resolution loss function according to the sample super-resolution image data and the high-definition image data comprises:
    generating a first super-resolution loss function according to pixel-value elements contained in the sample super-resolution image data and pixel-value elements contained in the high-definition image data;
    generating a second super-resolution loss function according to feature-value elements contained in a feature map of the sample super-resolution image data and feature-value elements contained in a feature map of the high-definition image data; and
    generating the super-resolution loss function according to the first super-resolution loss function and the second super-resolution loss function.
  4. The method according to claim 1, wherein the image processing network comprises a face enhancement network, the second sample enhanced image data is obtained according to the face enhancement network, the face enhancement network contains a face detection network, a color discrimination network, and a texture discrimination network, and the face image in the first sample enhanced image data has a face detection box generated by the face detection network and a face annotation box indicating an actual face position;
    the generating a face loss function according to the sample enhanced face image and the face image in the high-definition image data comprises:
    generating a detection loss function according to the face detection box and the face annotation box;
    cropping the face image out of the high-definition image data to obtain a high-definition face image;
    generating a color loss function according to the high-definition face image, the sample enhanced face image, and the color discrimination network;
    generating a texture loss function according to the high-definition face image, the sample enhanced face image, and the texture discrimination network;
    generating a content loss function according to feature-value elements contained in a feature map of the sample enhanced face image and feature-value elements contained in a feature map of the high-definition face image; and
    generating the face loss function according to the detection loss function, the color loss function, the texture loss function, and the content loss function.
  5. The method according to claim 1, wherein the generating a sharpening loss function according to the sample sharpened image data and the high-definition image data comprises:
    generating a quality loss function according to a peak signal-to-noise ratio between the sample sharpened image data and the high-definition image data;
    generating a perceptual loss function according to a perceptual similarity between the sample sharpened image data and the high-definition image data; and
    generating the sharpening loss function according to the quality loss function and the perceptual loss function.
  6. The method according to claim 1, wherein the obtaining a sample image pair comprises:
    obtaining sample video data;
    splitting the sample video data into frames to obtain multiple sample image frames contained in the sample video data;
    encoding and decoding the sample video data at a target bitrate to obtain low-quality video data corresponding to the sample video data, an image-frame quality of the low-quality video data being lower than an image-frame quality of the sample video data, the low-quality video data containing a low-quality image frame corresponding to each sample image frame, and the target bitrate being below a bitrate threshold; and
    constructing the sample image pair according to each sample image frame and the corresponding low-quality image frame.
  7. The method according to claim 1, wherein the obtaining a sample image pair comprises:
    obtaining sample video data;
    splitting the sample video data into frames to obtain multiple sample image frames contained in the sample video data;
    selecting a target image frame from the multiple sample image frames as the high-definition image data; and
    performing average-fusion processing on the target image frame and adjacent image frames of the target image frame among the multiple sample image frames to obtain the low-definition image data.
  8. The method according to claim 1, wherein the obtaining a sample image pair comprises:
    obtaining the high-definition image data; and
    performing Gaussian blur processing on the high-definition image data to obtain the low-definition image data.
  9. The method according to claim 1, wherein the obtaining a sample image pair comprises:
    obtaining the high-definition image data; and
    performing lossy format conversion on the high-definition image data to obtain the low-definition image data.
  10. The method according to claim 1, wherein the obtaining a sample image pair comprises:
    obtaining the high-definition image data;
    obtaining sample low-definition video data, and feeding the sample low-definition video data into a noise learning network, a clarity of the sample low-definition video data being below a clarity threshold;
    learning noise data of the sample low-definition video data based on the noise learning network; and
    fusing the noise data into the high-definition image data to obtain the low-definition image data.
  11. An image processing method, performed by a computer device, the method comprising:
    invoking a trained image processing network to obtain super-resolution image data corresponding to initial image data, a resolution of the super-resolution image data being greater than or equal to a target resolution;
    invoking the trained image processing network to perform image-quality enhancement on the super-resolution image data to obtain first enhanced image data;
    invoking the trained image processing network to obtain second enhanced image data corresponding to the first enhanced image data, wherein if the first enhanced image data contains a face image, the second enhanced image data is image data obtained after face enhancement of the face image in the first enhanced image data; and
    invoking the trained image processing network to perform image sharpening on the second enhanced image data to obtain sharpened image data, and outputting the sharpened image data;
    wherein the trained image processing network is obtained by training with the method according to any one of claims 1 to 10.
  12. The method according to claim 11, wherein the trained image processing network comprises a super-resolution network, and the invoking a trained image processing network to obtain super-resolution image data corresponding to initial image data comprises:
    obtaining the initial image data;
    invoking the super-resolution network to detect a resolution of the initial image data;
    if the resolution of the initial image data is greater than or equal to the target resolution, determining the initial image data as the super-resolution image data; and
    if the resolution of the initial image data is lower than the target resolution, invoking the super-resolution network to adjust the resolution of the initial image data to the target resolution to obtain the super-resolution image data.
  13. The method according to claim 11, wherein the trained image processing network comprises a face enhancement network, and the invoking the trained image processing network to obtain second enhanced image data corresponding to the first enhanced image data comprises:
    invoking the face enhancement network to perform face detection on the first enhanced image data;
    if the first enhanced image data contains no face image, determining the first enhanced image data as the second enhanced image data; and
    if the first enhanced image data contains a face image, invoking the face enhancement network to perform face enhancement on the face image in the first enhanced image data to obtain the second enhanced image data.
  14. The method according to claim 13, wherein the face enhancement network contains a face detection network, a face enhancement sub-network, and a face fusion network;
    the invoking the face enhancement network to perform face enhancement on the face image in the first enhanced image data to obtain the second enhanced image data comprises:
    invoking the face detection network to crop the face image out of the first enhanced image data to obtain a cropped face image;
    invoking the face enhancement sub-network to perform face enhancement on the cropped face image to obtain an enhanced face image;
    invoking the face fusion network to generate a face fusion mask; and
    performing image fusion processing on the first enhanced image data and the enhanced face image according to the face fusion mask to obtain the second enhanced image data.
  15. The method according to claim 11, wherein the trained image processing network comprises a sharpening network, and the invoking the trained image processing network to perform image sharpening on the second enhanced image data to obtain sharpened image data comprises:
    invoking the sharpening network to extract high-frequency image information from the second enhanced image data;
    generating a sharpening mask for the second enhanced image data according to the sharpening network, and extracting sharpened image information from the second enhanced image data according to the sharpening mask;
    predicting, according to the sharpening network, a first weight for the high-frequency image information, a second weight for the sharpened image information, and a third weight for the second enhanced image data; and
    performing a weighted sum of the high-frequency image information, the sharpened image information, and the second enhanced image data correspondingly according to the first weight, the second weight, and the third weight to obtain the sharpened image data.
  16. The method according to claim 11, wherein the initial image data is any one of multiple image frames obtained by splitting video data into frames, and the method further comprises:
    generating optimized video data of the video data according to the sharpened image data corresponding to each of the multiple image frames; and
    pushing the optimized video data to an application client so that the application client outputs the optimized video data.
  17. A training apparatus for an image processing network, the apparatus being configured to:
    obtain a sample image pair, the sample image pair containing low-definition image data and high-definition image data, the low-definition image data and the high-definition image data having the same image content;
    invoke the image processing network to adjust the resolution of the low-definition image data to a target resolution to obtain sample super-resolution image data, and generate a super-resolution loss function according to the sample super-resolution image data and the high-definition image data;
    invoke the image processing network to perform image-quality enhancement on the sample super-resolution image data to obtain first sample enhanced image data, and generate an image-quality loss function according to the first sample enhanced image data and the high-definition image data;
    invoke the image processing network to perform face enhancement on a face image in the first sample enhanced image data to obtain a sample enhanced face image, fuse the sample enhanced face image with the first sample enhanced image data to obtain second sample enhanced image data, and generate a face loss function according to the sample enhanced face image and a face image in the high-definition image data;
    invoke the image processing network to perform image sharpening on the second sample enhanced image data to obtain sample sharpened image data, and generate a sharpening loss function according to the sample sharpened image data and the high-definition image data; and
    update network parameters of the image processing network according to the super-resolution loss function, the image-quality loss function, the face loss function, and the sharpening loss function to obtain a trained image processing network.
  18. A computer program product comprising a computer program that, when executed by a processor, implements the steps of the method according to any one of claims 1 to 16.
  19. A computer device comprising a memory and a processor, the memory storing computer-readable instructions that, when executed by the processor, cause the processor to perform the steps of the method according to any one of claims 1 to 16.
  20. A non-volatile computer-readable storage medium storing computer-readable instructions adapted to be loaded by a processor to execute the method according to any one of claims 1 to 16.