CN112906721A - Image processing method, device, equipment and computer readable storage medium - Google Patents


Info

Publication number
CN112906721A
CN112906721A (application CN202110496630.2A; granted publication CN112906721B)
Authority
CN
China
Prior art keywords
image
trained
model
compression
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110496630.2A
Other languages
Chinese (zh)
Other versions
CN112906721B (en)
Inventor
陈艺云
陈铭良
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202110496630.2A priority Critical patent/CN112906721B/en
Publication of CN112906721A publication Critical patent/CN112906721A/en
Application granted granted Critical
Publication of CN112906721B publication Critical patent/CN112906721B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 - Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 - General purpose image data processing
    • G06T1/20 - Processor architectures; Processor configuration, e.g. pipelining
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 - General purpose image data processing
    • G06T1/60 - Memory management

Abstract

The application provides an image processing method, apparatus, device and computer-readable storage medium; the method includes the following steps: acquiring an image to be processed and performing feature extraction on it to obtain image features of the image to be processed; determining a control vector and a compression degree coefficient based on the image features; determining a target exit stage of a trained image enhancement sub-model based on the compression degree coefficient and the model structure of the trained image enhancement sub-model; obtaining, based on the image features and the control vector, the image processing result produced when the trained image enhancement sub-model reaches the target exit stage; and outputting the image processing result. With the method and apparatus, the exit stage of image processing can be determined adaptively and image processing efficiency is improved.

Description

Image processing method, device, equipment and computer readable storage medium
Technical Field
The present application relates to image processing technologies, and in particular, to an image processing method, an image processing apparatus, an image processing device, and a computer-readable storage medium.
Background
With the development of terminal technology, the pixel counts of image acquisition devices keep increasing, so the image and video files they produce keep growing in size. To save transmission bandwidth and storage space, JPG compression is widely used. When a user performs JPG compression on an image, different compression ratios may be adopted according to actual requirements such as network conditions, compressing the image to different degrees. Different compression rates cause block artifacts, ringing, blurring and other distortions of different severity, resulting in different degrees of image quality degradation. At present, deep-learning-based algorithms for enhancing decompressed images achieve good results, but a fixed network still inevitably loses a small amount of detail while removing compression artifacts. If a single general-purpose network is trained on pictures spanning a wide range of JPG compression coefficients, the network can only realize one deterministic mapping, so it cannot be optimal under all conditions and the benefit of deploying it is too small; if instead the image quality is analyzed separately and multiple networks are trained for different compression conditions, training efficiency is low, resource consumption is large, and the compression coefficient of the image currently being processed is generally unknown.
Disclosure of Invention
The embodiments of the application provide an image processing method, an image processing apparatus, an image processing device and a computer-readable storage medium, which can adaptively determine the exit stage of image processing and improve image processing efficiency.
The technical scheme of the embodiment of the application is realized as follows:
an embodiment of the present application provides an image processing method, including:
acquiring an image to be processed, and performing feature extraction on the image to be processed to obtain image features of the image to be processed;
determining a control vector and a compression degree coefficient based on the image feature, wherein the control vector is used for adjusting the intensity of the image feature in the image processing process;
determining a target exit stage of the trained image enhancement sub-model based on the compression degree coefficient and the model structure of the trained image enhancement sub-model;
based on the image characteristics and the control vector, acquiring an image processing result of the trained image enhancement sub-model when the target exit stage is reached;
and outputting the image processing result.
An embodiment of the present application provides an image processing apparatus, including:
the first acquisition module is used for acquiring an image to be processed and extracting the characteristics of the image to be processed to obtain the image characteristics of the image to be processed;
a first determining module, configured to determine a control vector and a compression degree coefficient based on the image feature, where the control vector is used to adjust the intensity of the image feature in the image processing process;
the second determining module is used for determining a target exit stage of the trained image enhancement sub-model based on the compression degree coefficient and the model structure of the trained image enhancement sub-model;
the image processing module is used for acquiring an image processing result of the trained image enhancement sub-model when the target exit stage is reached based on the image characteristics and the control vector;
and the output module is used for outputting the image processing result.
In some embodiments, the apparatus further comprises:
a second obtaining module, configured to obtain a trained image processing model, where the trained image processing model includes: a trained feature extraction sub-model, a trained compression estimation sub-model and a trained image enhancement sub-model;
correspondingly, the first obtaining module is further configured to: extracting the characteristics of the image to be processed by utilizing the trained characteristic extraction submodel to obtain the image characteristics;
a first determination module further configured to: and inputting the image characteristics into the trained compression estimation sub-model to obtain the control vector and the compression degree coefficient.
In some embodiments, the first obtaining module is further configured to:
acquiring a first convolution kernel, a second convolution kernel and a third convolution kernel corresponding to a first convolution layer of the trained feature extraction sub-model, wherein the dimensions of the first convolution kernel, the second convolution kernel and the third convolution kernel are different from each other;
performing convolution processing on the image to be processed by respectively utilizing the first convolution kernel, the second convolution kernel and the third convolution kernel to obtain a first convolution result, a second convolution result and a third convolution result;
splicing the first convolution result, the second convolution result and the third convolution result to obtain a splicing result;
and performing convolution processing on the splicing result at least once again through a second convolution layer in the trained feature extraction submodel to obtain the image features of the image to be processed.
In some embodiments, the trained compressed estimation submodel includes at least a third convolutional layer, a pooling layer, and a fully-connected layer, the first determining module further configured to:
performing convolution processing on the image features by using the third convolution layer to obtain a fourth convolution result;
performing pooling treatment on the fourth convolution result by using the pooling layer to obtain a pooling result;
determining the pooling result as the control vector;
and carrying out full connection processing on the pooling result by using the full connection layer to obtain the compression degree coefficient.
In some embodiments, the second determining module is further configured to:
determining the total number of processing stages of the trained image enhancement submodel based on the model structure of the trained image enhancement submodel;
determining each compression threshold range corresponding to each processing stage based on the total number of the processing stages;
determining a target compression threshold range in which the compression degree coefficient is positioned from each compression threshold range;
and determining the processing stage corresponding to the target compression threshold range as a target exit stage of the trained image enhancement sub-model.
In some embodiments, the image processing module is further configured to:
determining a target image enhancement network structure corresponding to the target exit stage based on the model structure of the trained image enhancement sub-model and the target exit stage;
and inputting the image characteristics and the control vector to the target image enhancement network structure to obtain the image processing result.
In some embodiments, the image processing module is further configured to:
generating a weighting coefficient of the image feature in each channel by using the control vector;
adjusting the image characteristics based on the weighting coefficients to obtain adjusted image characteristics;
performing convolution processing on the adjusted image features to obtain a fifth convolution result;
when the image enhancement processing is determined to be required to be continued, performing up-sampling and/or down-sampling on the fifth convolution result to obtain a corresponding up-sampling result and/or down-sampling result;
and determining the up-sampling result and/or the down-sampling result as an intermediate image feature, and generating the weighting coefficient of the intermediate image feature in each channel by using the control vector again until determining that the image enhancement processing is not required to be continued to obtain the image processing result.
In some embodiments, the apparatus further comprises:
a third obtaining module, configured to obtain a training data set, where the training data set includes a plurality of training images;
the fourth acquisition module is used for acquiring the compression coefficient labels of all the training images and the preset image processing model;
a fifth obtaining module, configured to obtain a first loss function of the feature extraction sub-model and the compression estimation sub-model in the preset image processing model, and obtain a second loss function corresponding to the image enhancement sub-model in the preset image processing model;
and the model training module is used for training the preset image processing model by utilizing the first loss function and the second loss function to obtain the trained image processing model.
In some embodiments, the third obtaining module is further configured to:
obtaining an original image set, wherein the original image set comprises a plurality of uncompressed original images;
compressing each original image to different degrees to obtain each compressed image;
performing at least one of cropping and rotation on each compressed image to obtain a plurality of processed images;
determining the respective compressed images and the processed plurality of images as a training image set.
In some embodiments, the apparatus further comprises:
the third determining module is used for determining a loss weight coefficient corresponding to each processing stage based on a compression coefficient label of the training image, the total number of processing stages of the image enhancement sub-model and a preset hyperparameter;
and the fourth determining module is used for determining a second loss function by using the loss weight coefficient corresponding to each processing stage.
In some embodiments, the model training module is further configured to:
pre-training the feature extraction submodel and the compression estimation submodel by using the first loss function to obtain a pre-trained feature extraction submodel and a pre-trained compression estimation submodel;
constructing a joint loss function by using a preset first weight, a preset second weight, the first loss function and the second loss function;
and training the pre-trained feature extraction submodel, the compression estimation submodel and the image enhancement submodel by using the joint loss function to obtain the trained image processing model.
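By way of illustration only, a minimal training loop in this spirit might look as follows; the helper names (estimation_loss, enhancement_loss), the weighted-sum form of the joint loss and the epoch counts are assumptions for the sketch, not details taken from the application.

```python
# Hedged sketch of the two-phase training described above; the model's loss
# helpers and the weighted-sum joint loss are assumptions, not the patent's code.
def train(model, data_loader, optimizer, w1=1.0, w2=1.0,
          pretrain_epochs=10, joint_epochs=50):
    # Phase 1: pre-train the feature extraction and compression estimation
    # sub-models with the first loss function only.
    for _ in range(pretrain_epochs):
        for images, labels in data_loader:
            loss = model.estimation_loss(images, labels)   # first loss function
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    # Phase 2: train all sub-models with the joint loss
    # L_joint = w1 * L1 + w2 * L2 (preset first and second weights).
    for _ in range(joint_epochs):
        for images, labels in data_loader:
            l1 = model.estimation_loss(images, labels)     # first loss function
            l2 = model.enhancement_loss(images, labels)    # second loss function
            joint = w1 * l1 + w2 * l2
            optimizer.zero_grad()
            joint.backward()
            optimizer.step()
```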
An embodiment of the present application provides an image processing apparatus, including:
a memory for storing executable instructions;
and the processor is used for realizing the method provided by the embodiment of the application when executing the executable instructions stored in the memory.
Embodiments of the present application provide a computer-readable storage medium, which stores executable instructions for causing a processor to implement the method provided by the embodiments of the present application when the processor executes the executable instructions.
Embodiments of the present application provide a computer program product, or computer program, comprising computer instructions, the computer instructions being stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the image processing method described in the embodiment of the present application.
The embodiment of the application has the following beneficial effects:
after an image to be processed is obtained, feature extraction is first performed on it to obtain the image features of the image to be processed; a control vector and a compression degree coefficient are then determined based on the image features, and a target exit stage of the trained image enhancement sub-model is dynamically determined based on the compression degree coefficient and the model structure of the trained image enhancement sub-model; during image enhancement, the image processing result produced when the trained image enhancement sub-model reaches the target exit stage is obtained based on the image features and the control vector, and the image processing result is output. Lightly compressed images can thus exit after shallower processing, while severely compressed images receive deeper convolution processing to improve the restoration effect, so that the removal of compression artifacts and the loss of detail are balanced adaptively, computing resources are saved, and image processing efficiency is improved.
Drawings
FIG. 1A is a schematic diagram of a network model for image processing based on RBQE in the related art;
FIG. 1B is a schematic diagram of a network model for image processing by a CResMD in the related art;
fig. 2A is a schematic network structure diagram of an image processing system according to an embodiment of the present application;
FIG. 2B is a schematic diagram of an embodiment of a block chain system of an image processing system;
FIG. 2C is an alternative Block Structure (Block Structure) provided by an embodiment of the present application;
fig. 3 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
fig. 4 is a schematic flowchart of an implementation of an image processing method according to an embodiment of the present application;
FIG. 5 is a schematic diagram illustrating a training process of an image processing model according to an embodiment of the present disclosure;
fig. 6 is a schematic flowchart of another implementation of the image processing method according to the embodiment of the present application;
fig. 7 is a schematic flowchart of another implementation of the image processing method according to the embodiment of the present application;
fig. 8 is a schematic overall network framework diagram of an image processing method according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a common feature extraction module according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of a compression condition estimation module according to an embodiment of the present disclosure;
FIG. 11 is a schematic diagram of a U-type network of stage 2 of an image quality enhancement module provided in an embodiment of the present application;
fig. 12 is a schematic diagram of dynamically adjusting the intensity of a feature map through a feature map adjusting structure according to an embodiment of the present application.
Detailed Description
In order to make the objectives, technical solutions and advantages of the present application clearer, the present application will be described in further detail with reference to the attached drawings, the described embodiments should not be considered as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative efforts shall fall within the protection scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
In the following description, the terms "first", "second" and "third" are used only to distinguish similar objects and do not denote a particular order; it is to be understood that "first", "second" and "third" may be interchanged in a specific order or sequence where permitted, so that the embodiments of the application described herein can be practiced in an order other than that shown or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
Before further detailed description of the embodiments of the present application, terms and expressions referred to in the embodiments of the present application will be described, and the terms and expressions referred to in the embodiments of the present application will be used for the following explanation.
1) Upsampling (UpSample): the process of restoring a feature map to the resolution of the original picture. After feature extraction through a convolutional neural network, the output feature map is usually smaller than the input image; it sometimes needs to be restored to the original size for subsequent computation, and this upsampling operation, a mapping from a small resolution to a large one, is realized by enlarging the feature map.
2) Pooling: down-sampling an image while losing as little image information as possible, which reduces the size of the feature maps output by a convolutional layer and improves the results (overfitting is less likely to occur). The most common forms are max pooling and average pooling: max pooling takes the maximum value within each pooling window, and average pooling takes the average value within each pooling window. Pooling reduces the amount of spatial information and improves computational efficiency, which also means fewer parameters and a lower risk of overfitting.
3) The ReLU function is a piecewise linear function that sets all negative values to 0 and leaves positive values unchanged, so that the feature map becomes sparse; this prevents overfitting to a certain extent and facilitates feature extraction.
4) The loss function, also called the cost function, is a function that maps the value of a random event or its related random variables to a non-negative real number to represent the "risk" or "loss" of the random event. In applications, the loss function is usually associated with an optimization problem as a learning criterion, i.e. the model is solved and evaluated by minimizing the loss function. It is used, for example, for parameter estimation of models in statistics and machine learning, and is the optimization objective of a machine learning model.
5) Feature map: a map obtained by convolving an image with a filter. A feature map can itself be convolved with a filter to generate a new feature map, and the term may refer to the output of a convolutional layer inside a convolutional network.
6) Convolution kernel: when an image is processed, given an input image, each pixel in the output image is formed by a weighted average of the pixels in a small region of the input image, where the weights are defined by a function; this function is called the convolution kernel.
In order to better understand the image processing method provided by the embodiments of the present application, image processing methods for image enhancement in the related art and their existing disadvantages are described here. In the related art, image enhancement can be performed in at least two ways: Resource-Efficient Blind Quality Enhancement (RBQE) and a controllable residual learning algorithm for image restoration (CResMD); both are deep-learning-based techniques.
As shown in fig. 1A, RBQE performs image quality evaluation using an image quality evaluation module (IQAM) and determines whether the current sample can exit at the current stage by comparing the score with a set quality threshold, thereby dynamically determining the exit stage for test images under different compression conditions.
As shown in fig. 1B, CResMD generates condition coefficients based on a manual estimate of the degree of image compression and generates residual connection weight coefficients via fully connected layers, thereby processing pictures with different degrees of compression to different extents.
RBQE uses an image quality evaluation module to dynamically allocate computing resources to different test images, which has two drawbacks. First, it adopts a traditional image quality evaluation module whose algorithm is not coupled into the image quality enhancement network for training, so an accurate evaluation of the image processed by the network cannot be guaranteed, and it does not directly help improve the enhancement network. Second, it only adjusts by choosing among different exit stages, so the processing of compressed images can only be tuned at the coarse granularity of K stages; if the number of stages is increased for fine-grained adjustment, complex samples require more quality evaluation modules, which inevitably increases the amount of computation. These two drawbacks limit its application in real scenes.
CResMD is adjusted by a compression condition coefficient estimated from manual experience; because users' evaluation standards differ, it is difficult to obtain a suitable condition coefficient in one attempt. Moreover, using a network of the same depth for all samples greatly increases the amount of computation for simple samples, and once the number of convolutional layers reaches a certain point, the image quality of simple samples no longer improves noticeably.
Based on this, embodiments of the present application provide an image processing method, an image processing apparatus, an image processing device, and a computer-readable storage medium, which can dynamically adjust and select the exit stage of a UNet model according to the compression condition of the picture itself, adaptively removing compression artifacts of different degrees. An exemplary application of the image processing apparatus provided in the embodiments of the present application is described below; the apparatus provided in the embodiments of the present application may be implemented as various types of user terminals such as a notebook computer, a tablet computer, a desktop computer, a set-top box, a mobile device (e.g., a mobile phone, a portable music player, a personal digital assistant, a dedicated messaging device, and a portable game device), and may also be implemented as a server. In the following, an exemplary application is described in which the device is implemented as a terminal.
Referring to fig. 2A, fig. 2A is a schematic diagram of a network architecture of an image processing system according to an embodiment of the present application, and as shown in fig. 2A, the image processing system 100 includes: server 200, network 300 and terminal 400. To support an exemplary Application, the terminal 400 is connected to the server 200 through the network 300, the terminal 400 may be a smart terminal, various applications (apps) may be installed on the smart terminal, for example, a video watching App, an instant messaging App, a shopping App, an image capturing App, and the like, the network 300 may be a wide area network or a local area network, or a combination thereof, and data transmission is achieved by using a wireless link.
When a user views a video through the terminal 400 or views a picture on a web page, the terminal 400 may request to acquire the video or the picture from the server 200. The image processing method provided by the embodiment of the application can be integrated in a gallery App or a video viewing App of a terminal as a functional plug-in, and if the terminal 400 starts the image processing function, the terminal 400 can process the image or video acquired from the server 200 by using the image processing method provided by the embodiment of the application, obtain the processed image or video, and present the processed image or video in a display interface of the terminal 400.
Based on the network architecture shown in fig. 2A, there may be another implementation manner, when a user watches a video through a video App in a terminal 400, the user sends a video acquisition request to the server 200, the server acquires corresponding video stream data after receiving the video acquisition request, and acquires a network connection state of the terminal, and when it is determined that the terminal is in a WiFi connection state and the video stream data is data after compression processing, determines to perform image processing on each video frame image in the video stream data, so as to perform restoration enhancement on each video frame image, and sends the processed video frame to the terminal 400. Therefore, the restoration enhancement of the video frame image is finished at the server side without image processing by the terminal, the processing complexity of the terminal can be reduced, and the watching experience of the video is improved.
In some embodiments, the server 200 may be an independent physical server, may also be a server cluster or a distributed system formed by a plurality of physical servers, and may also be a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a CDN, and a big data and artificial intelligence platform. The terminal 400 may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in the embodiment of the present invention.
The image processing system 100 according to the embodiment of the present application may also be a distributed system 201 of a blockchain system. Referring to fig. 2B, fig. 2B is a schematic structural diagram of the image processing system provided by the embodiment of the present application applied to a blockchain system, where the distributed system 201 may be formed by a plurality of nodes 202 (computing devices in any form within the access network, such as servers and user terminals) and a client 203; a Peer-to-Peer (P2P) network is formed between the nodes, and the P2P protocol is an application layer protocol operating on top of the Transmission Control Protocol (TCP). In a distributed system, any machine, such as a server or a terminal, can join to become a node, and a node comprises a hardware layer, a middle layer, an operating system layer and an application layer.
It should be noted that, in the distributed system 201, the node 202 may be a terminal or a server.
Referring to the functions of each node in the blockchain system shown in fig. 2B, the functions related to each node in the blockchain system will be described in detail as follows:
1) routing, a basic function that a node has, is used to support communication between nodes.
Besides the routing function, the node may also have the following functions:
2) the application is used for being deployed in a block chain, realizing specific services according to actual service requirements, recording data related to the realization functions to form recording data, carrying a digital signature in the recording data to represent a source of task data, and sending the recording data to other nodes in the block chain system, so that the other nodes add the recording data to a temporary block when the source and integrity of the recording data are verified successfully. For example, the services implemented by the application include: 2.1) wallet, for providing the function of transaction of electronic money, including initiating transaction (i.e. sending the transaction record of current transaction to other nodes in the blockchain system, after the other nodes are successfully verified, storing the record data of transaction in the temporary blocks of the blockchain as the response of confirming the transaction is valid; of course, the wallet also supports the querying of the electronic money remaining in the electronic money address. And 2.2) sharing the account book, wherein the shared account book is used for providing functions of operations such as storage, query and modification of account data, record data of the operations on the account data are sent to other nodes in the block chain system, and after the other nodes verify the validity, the record data are stored in a temporary block as a response for acknowledging that the account data are valid, and confirmation can be sent to the node initiating the operations. 2.3) Intelligent contracts, computerized agreements, which can enforce the terms of a contract, implemented by codes deployed on a shared ledger for execution when certain conditions are met, for completing automated transactions according to actual business requirement codes, such as querying the logistics status of goods purchased by a buyer, transferring the buyer's electronic money to the merchant's address after the buyer signs for the goods; of course, smart contracts are not limited to executing contracts for trading, but may also execute contracts that process received information.
3) And the Block chain comprises a series of blocks (blocks) which are mutually connected according to the generated chronological order, new blocks cannot be removed once being added into the Block chain, and recorded data submitted by nodes in the Block chain system are recorded in the blocks.
4) Consensus, a process in a blockchain network used to reach agreement among the nodes involved on the transactions in a block; the agreed block is appended to the end of the blockchain. Mechanisms for achieving consensus include Proof of Work (PoW), Proof of Stake (PoS), Delegated Proof of Stake (DPoS), Proof of Elapsed Time (PoET), and so on.
Referring to fig. 2C, fig. 2C is an optional schematic diagram of a Block Structure (Block Structure) provided in this embodiment, each Block includes a hash value of a transaction record (hash value of the Block) stored in the Block and a hash value of a previous Block, and the blocks are connected by the hash values to form a Block chain. The block may include information such as a time stamp at the time of block generation. A block chain (Blockchain), which is essentially a decentralized database, is a string of data blocks associated by using cryptography, and each data block contains related information for verifying the validity (anti-counterfeiting) of the information and generating a next block.
Referring to fig. 3, fig. 3 is a schematic structural diagram of a terminal 400 according to an embodiment of the present application, where the terminal 400 shown in fig. 3 includes: at least one processor 410, memory 450, at least one network interface 420, and a user interface 430. The various components in the terminal 400 are coupled together by a bus system 440. It is understood that the bus system 440 is used to enable communications among the components. The bus system 440 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 440 in FIG. 3.
The Processor 410 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like, wherein the general purpose Processor may be a microprocessor or any conventional Processor, or the like.
The user interface 430 includes one or more output devices 431, including one or more speakers and/or one or more visual displays, that enable the presentation of media content. The user interface 430 also includes one or more input devices 432, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.
The memory 450 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard disk drives, optical disk drives, and the like. Memory 450 optionally includes one or more storage devices physically located remote from processor 410.
The memory 450 includes either volatile memory or nonvolatile memory, and may include both volatile and nonvolatile memory. The nonvolatile Memory may be a Read Only Memory (ROM), and the volatile Memory may be a Random Access Memory (RAM). The memory 450 described in embodiments herein is intended to comprise any suitable type of memory.
In some embodiments, memory 450 is capable of storing data, examples of which include programs, modules, and data structures, or a subset or superset thereof, to support various operations, as exemplified below.
An operating system 451, including system programs for handling various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and handling hardware-based tasks;
a network communication module 452 for communicating to other computing devices via one or more (wired or wireless) network interfaces 420, exemplary network interfaces 420 including: bluetooth, wireless compatibility authentication (WiFi), and Universal Serial Bus (USB), etc.;
a presentation module 453 for enabling presentation of information (e.g., user interfaces for operating peripherals and displaying content and information) via one or more output devices 431 (e.g., display screens, speakers, etc.) associated with user interface 430;
an input processing module 454 for detecting one or more user inputs or interactions from one of the one or more input devices 432 and translating the detected inputs or interactions.
In some embodiments, the apparatus provided by the embodiments of the present application may be implemented in software, and fig. 3 illustrates an image processing apparatus 455 stored in the memory 450, which may be software in the form of programs and plug-ins, and the like, and includes the following software modules: a first acquiring module 4551, a first determining module 4552, a second determining module 4553, an image processing module 4554, and an output module 4555, which are logical and thus may be arbitrarily combined or further divided according to the functions implemented.
The functions of the respective modules will be explained below.
In other embodiments, the apparatus provided in the embodiments of the present Application may be implemented in hardware, and for example, the apparatus provided in the embodiments of the present Application may be a processor in the form of a hardware decoding processor, which is programmed to execute the image processing method provided in the embodiments of the present Application, for example, the processor in the form of the hardware decoding processor may be one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.
In order to better understand the method provided by the embodiment of the present application, artificial intelligence, each branch of artificial intelligence, and an application field, a cloud technology, and an artificial intelligence cloud service related to the method provided by the embodiment of the present application are explained first.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
The artificial intelligence technology is a comprehensive subject and relates to the field of extensive technology, namely the technology of a hardware level and the technology of a software level. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like. The directions will be described below.
Computer Vision (CV) technology: computer vision is a science that studies how to make machines "see"; it uses cameras and computers instead of human eyes to identify, track and measure targets, and performs further image processing so that the processed result becomes an image more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technologies, virtual reality, augmented reality, simultaneous localization and mapping, and other technologies, and also include common biometric technologies such as face recognition and fingerprint recognition.
Machine Learning (ML) is a multi-domain cross discipline, and relates to a plurality of disciplines such as probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and the like. The special research on how a computer simulates or realizes the learning behavior of human beings so as to acquire new knowledge or skills and reorganize the existing knowledge structure to continuously improve the performance of the computer. Machine learning is the core of artificial intelligence, is the fundamental approach for computers to have intelligence, and is applied to all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and the like.
Cloud technology refers to a hosting technology for unifying serial resources such as hardware, software, network and the like in a wide area network or a local area network to realize calculation, storage, processing and sharing of data. The cloud technology is based on the general names of network technology, information technology, integration technology, management platform technology, application technology and the like applied in the cloud computing business model, can form a resource pool, is used as required, and is flexible and convenient. Cloud computing technology will become an important support. Background services of the technical network system require a large amount of computing and storage resources, such as video websites, picture-like websites and more web portals. With the high development and application of the internet industry, each article may have its own identification mark and needs to be transmitted to a background system for logic processing, data in different levels are processed separately, and various industrial data need strong system background support and can only be realized through cloud computing.
The so-called artificial intelligence cloud service is also generally called AI as a Service (AIaaS), and is currently a mainstream service model for artificial intelligence platforms. Specifically, an AIaaS platform splits several types of common AI services and provides independent or packaged services in the cloud. This service model is similar to an AI-themed shopping mall: all developers can access one or more artificial intelligence services provided by the platform through an application programming interface (API), and some qualified developers can also use the AI framework and AI infrastructure provided by the platform to deploy, operate and maintain their own dedicated cloud artificial intelligence services.
The scheme provided by the embodiment of the application relates to the computer vision technology, machine learning, artificial intelligence cloud service and other technologies of artificial intelligence, and is specifically explained by the following embodiments.
The image processing method provided by the embodiment of the present application will be described in conjunction with exemplary applications and implementations of the terminal provided by the embodiment of the present application.
The embodiment of the application provides an image processing method, which is applied to an image processing device, wherein the image processing device can be a terminal in fig. 2A or a server in fig. 2A. Fig. 4 is a schematic flow chart of an implementation of the image processing method according to the embodiment of the present application, and steps of the image processing method according to the embodiment of the present application will be described with reference to fig. 4.
Step S101, obtaining an image to be processed, and performing feature extraction on the image to be processed to obtain image features of the image to be processed.
The image to be processed can be a compressed image, a compressed color image or a compressed gray-scale image. And the image to be processed can be an independent image or a video frame image decoded from video stream data. Acquiring an image to be processed, wherein the image to be processed can be acquired from a local storage space of the image processing equipment when the image processing equipment is implemented; when the image processing device is a terminal and the terminal is provided with an image acquisition device (such as a camera), the image to be processed can be acquired by the image acquisition device; in some embodiments, the image to be processed may also be acquired from a server by the terminal sending an image acquisition request or a video acquisition request to the server.
Since an image, once read by the computer, is also a series of pixels, when the image to be processed is a 960 × 720 color image, the dimension of the data input to the model is 960 × 720 × 3, where 3 represents the three color channels red, green and blue, and each value in the input data ranges from 0 to 255.
The image features of the image to be processed may be extracted through a backbone network in the trained image processing model; the extracted image features have a dimension of 960 × 720 × 1 and are composed of floating point (float) numbers.
In step S102, a control vector and a compression degree coefficient are determined based on the image feature.
In the embodiment of the application, the control vector is used for adjusting the intensity of the image features in the image processing process, and the compression degree coefficient is used for determining the target exit stage.
And S103, determining a target exit stage of the trained image enhancement sub-model based on the compression degree coefficient and the model structure of the trained image enhancement sub-model.
In the implementation of step S103, the total number of processing stages of the image enhancement submodel may be determined based on the model structure of the image enhancement submodel, so as to determine the compression threshold range corresponding to each processing stage; the processing stage whose compression threshold range contains the compression degree coefficient is determined as the target exit stage. The model structure used for each processing stage of the image enhancement submodel is different, and the higher the number of processing stages, the more complex the model structure used.
Step S104, based on the image characteristics and the control vector, obtaining an image processing result of the trained image enhancement sub-model when the target exit stage is reached;
in practical applications, the per-channel adjustment weights of all feature maps before the convolution operations in the image quality enhancement module can be generated from the control vector, so that the image enhancement module adapts itself to the current image to be processed, realizing finer-grained network regulation and dynamic adjustment of the network output.
When the method is realized, the image characteristics and the control vectors are input into an image enhancement sub-model to carry out image enhancement processing, before the image characteristics are subjected to convolution operation, the intensity of the image characteristics is adjusted through the adjusting weight generated by the control vectors, and an image processing result when the target exit stage is reached is obtained. The image processing result is an image subjected to image restoration enhancement.
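As a rough illustration of such channel-wise intensity adjustment, a minimal sketch is given below; the module structure, the 64-dimensional control vector and the sigmoid-normalized weights are assumptions for the example, not the exact structure used in the embodiment.

```python
import torch
import torch.nn as nn

class FeatureAdjust(nn.Module):
    """Generates one weighting coefficient per channel from the control vector
    and rescales the feature map with it (assumed structure, for illustration)."""
    def __init__(self, control_dim: int, num_channels: int):
        super().__init__()
        self.to_weights = nn.Linear(control_dim, num_channels)

    def forward(self, features: torch.Tensor, control: torch.Tensor) -> torch.Tensor:
        # features: (N, C, H, W); control: (N, control_dim)
        weights = torch.sigmoid(self.to_weights(control))           # (N, C), in (0, 1)
        return features * weights.view(features.size(0), -1, 1, 1)

# Usage: adjust the feature map before a convolution in the enhancement stage.
adjust = FeatureAdjust(control_dim=64, num_channels=32)
feats = torch.randn(1, 32, 128, 128)
ctrl = torch.randn(1, 64)
adjusted = adjust(feats, ctrl)   # same shape as feats, channel-wise rescaled
```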
Step S105, the image processing result is output.
When step S105 is implemented by a terminal, the output image processing result may be that the image processing result is presented on a display interface of the terminal, and when step S105 is implemented by a server, the output image processing result may be that the server transmits the image processing result to the terminal. And after receiving the image processing result, the terminal presents the image processing result in a display interface.
In the image processing method provided in the embodiment of the present application, after an image to be processed is obtained, feature extraction is first performed on it to obtain the image features of the image to be processed; a control vector and a compression degree coefficient are then determined based on the image features, and a target exit stage of the trained image enhancement sub-model is dynamically determined based on the compression degree coefficient and the model structure of the trained image enhancement sub-model; during image enhancement, the image processing result produced when the trained image enhancement sub-model reaches the target exit stage is obtained based on the image features and the control vector, and the image processing result is output. Lightly compressed images can thus exit after shallower processing, avoiding the detail blurring brought by over-processing, while severely compressed images receive deeper convolution processing to improve the restoration effect, so that the removal of compression artifacts and the loss of detail are balanced adaptively, computing resources are saved, and image processing efficiency is improved.
In some embodiments, before step S101, a trained image processing model may also be obtained by performing the following steps:
and S001, acquiring the trained image processing model.
The trained image processing model comprises: a feature extraction sub-model, a compression estimation sub-model and an image enhancement sub-model.
Correspondingly, in the step S101, "extracting the feature of the image to be processed to obtain the image feature of the image to be processed" may be implemented as follows: and performing feature extraction on the image to be processed by using the trained feature extraction submodel to obtain image features, wherein the step can be further realized by the following steps:
step S1011, a first convolution kernel, a second convolution kernel and a third convolution kernel corresponding to the first convolution layer of the trained feature extraction submodel are obtained.
Wherein dimensions of the first convolution kernel, the second convolution kernel, and the third convolution kernel are different from each other. The first convolution kernel may be a row vector, the second convolution kernel may be a column vector, and the third convolution kernel may be a matrix, e.g., the first convolution kernel has a dimension of 1 x 5, the second convolution kernel has a dimension of 5 x 1, and the third convolution kernel has a dimension of 3 x 3.
Step S1012, performing convolution processing on the image to be processed by using the first convolution kernel, the second convolution kernel, and the third convolution kernel, respectively, to obtain a first convolution result, a second convolution result, and a third convolution result.
Because compressed pictures generally have obvious block artifacts, which mainly appear as abrupt changes in the horizontal and vertical directions, in the embodiment of the application convolution kernels of three different dimensions (a row vector, a column vector and a matrix) are used to convolve the image to be processed separately, so that rich convolution results are obtained.
Step S1013, the first convolution result, the second convolution result, and the third convolution result are subjected to a stitching process to obtain a stitching result.
And step S1014, performing at least one convolution processing on the splicing result through the second convolution layer in the trained feature extraction submodel to obtain the image features of the image to be processed.
The trained feature extraction submodel may include one second convolution layer or a plurality of second convolution layers, each of which performs a convolution; for example, the feature extraction submodel may have 3 second convolution layers that perform convolution with 3 × 3, 5 × 5, and 3 × 3 kernels respectively, obtaining the final image features of the image to be processed.
In steps S1011 to S1014 above, the first convolution layer of the feature extraction submodel performs convolution with multiple convolution kernels of different dimensions, that is, with horizontal-line, vertical-line and 3 × 3 kernels, and the convolution results obtained by the different kernels are concatenated to obtain rich shallow features, improving the subsequent image processing effect.
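A minimal sketch of such a first convolution layer is shown below; the channel counts and padding are assumptions chosen so the three outputs can be concatenated, not values taken from the application.

```python
import torch
import torch.nn as nn

class FirstConvLayer(nn.Module):
    """Applies 1x5 (row), 5x1 (column) and 3x3 kernels in parallel and
    concatenates the three results, as in steps S1011-S1013 (illustrative only)."""
    def __init__(self, in_ch: int = 3, out_ch: int = 16):
        super().__init__()
        # "Same" padding keeps the three outputs at the input resolution.
        self.row_conv = nn.Conv2d(in_ch, out_ch, kernel_size=(1, 5), padding=(0, 2))
        self.col_conv = nn.Conv2d(in_ch, out_ch, kernel_size=(5, 1), padding=(2, 0))
        self.sq_conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        r1 = self.row_conv(x)   # first convolution result
        r2 = self.col_conv(x)   # second convolution result
        r3 = self.sq_conv(x)    # third convolution result
        return torch.cat([r1, r2, r3], dim=1)   # splicing result, 3*out_ch channels
```

The concatenated output would then pass through the second convolution layer(s) of step S1014 (for example 3 × 3, 5 × 5 and 3 × 3 kernels) to produce the image features.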
In some embodiments, the step S102 "determining a control vector and a compression degree coefficient based on the image feature" may be implemented by inputting the image features into the trained compression estimation sub-model to obtain the control vector and the compression degree coefficient. The trained compression estimation sub-model at least comprises a third convolution layer, a pooling layer and a fully connected layer, and the step of inputting the image features into the trained compression estimation sub-model to obtain the control vector and the compression degree coefficient can be realized by the following steps:
step S1021, performing convolution processing on the image feature by using the third convolution layer to obtain a fourth convolution result.
In implementation, the image features may be convolved with a convolution kernel of 3 × 3 at the third convolution layer, so as to obtain a fourth convolution result.
In step S1022, pooling is performed on the fourth convolution result by using the pooling layer, so as to obtain a pooled result.
The pooling treatment may be an average pooling treatment or a maximum pooling treatment.
In step S1023, the pooling result is determined as the control vector.
In the embodiment of the present application, considering that the information included in the compression degree coefficient is limited, the feature vector before the compression degree coefficient is regressed is used as a control vector for regulating and controlling the image quality enhancement module, that is, the pooling result after the pooling process is determined as the control vector.
Step S1024, using the fully connected layer to perform full-connection processing on the pooled result to obtain the compression degree coefficient.
In the above steps S1021 to S1024, the compression degree coefficient and the control vector used for generating the image feature intensity adjustment weights are determined by the compression estimation sub-model from the image features, before the image enhancement sub-model is run, so as to provide the necessary data basis for the subsequent determination of the target exit stage and for the image enhancement process.
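A minimal sketch of the compression estimation sub-model described in steps S1021 to S1024 is given below; the channel count and the length of the control vector are assumptions, and the sigmoid used to keep the regressed coefficient between 0 and 1 is also an assumption.

```python
import torch
import torch.nn as nn

class CompressionEstimator(nn.Module):
    """Sketch of the compression estimation sub-model (steps S1021-S1024)."""
    def __init__(self, channels=32, vector_dim=64):  # assumed sizes
        super().__init__()
        self.conv = nn.Conv2d(channels, vector_dim, kernel_size=3, padding=1)  # third convolution layer
        self.pool = nn.AdaptiveAvgPool2d(1)                                    # (average) pooling layer
        self.fc   = nn.Linear(vector_dim, 1)                                   # fully connected layer

    def forward(self, image_features):
        fourth = self.conv(image_features)                      # fourth convolution result
        control_vector = self.pool(fourth).flatten(1)           # pooling result used as the control vector
        # Sigmoid keeps the regressed compression degree coefficient in (0, 1); this activation is an assumption.
        compression_coef = torch.sigmoid(self.fc(control_vector)).squeeze(1)
        return control_vector, compression_coef
```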
In some embodiments, the step S103 "determining the target exit stage of the trained image enhancement submodel based on the compression degree coefficient and the model structure of the trained image enhancement submodel" may be implemented by:
and step S1031, determining the total number of processing stages of the trained image enhancement submodel based on the model structure of the trained image enhancement submodel.
Here, the image enhancer model may be a densely connected U-type network structure, the network model structure used for each processing stage is different, and the higher the number of processing stages, the more complicated the model structure. The total number of processing stages of the image enhancement submodel can be determined by determining the model structure of the image enhancement submodel, for example, in the embodiment of the present application, the total number of processing stages is 5.
In step S1032, each compression threshold range corresponding to each processing stage is determined based on the total number of processing stages.
In implementation, the compression threshold range corresponding to the i-th processing stage may be ((i-1)/K, i/K], where K is the total number of processing stages; assuming that K is 5, the compression threshold range of the 2nd processing stage is (0.2, 0.4].
In step S1033, a target compression threshold range in which the compression degree coefficient is located is determined from the respective compression threshold ranges.
In the embodiment of the present application, the compression degree coefficient is a real number between 0 and 1, and step S1033 may be implemented by determining a compression threshold range in which the compression degree coefficient falls, and determining the compression threshold range as the target compression threshold range. For example, the image enhancer model has 5 processing stages, the compression threshold range corresponding to the first stage is (0, 0.2), the compression threshold range corresponding to the second processing stage is (0.2, 0.4), the compression threshold range corresponding to the third processing stage is (0.4, 0.6), the compression threshold range corresponding to the fourth processing stage is (0.6, 0.8), the compression threshold range corresponding to the fifth processing stage is (0.8, 1), and the compression coefficient determined by the compression estimation sub-module is 0.48, so that the compression coefficient is within the compression threshold range of (0.4, 0.6), that is, the target compression threshold range.
Step S1034, determining the processing stage corresponding to the target compression threshold range as the target exit stage of the trained image enhancement sub-model.
In connection with the above example, the processing stage corresponding to the target compression threshold range (0.4, 0.6) is the third processing stage, so the third processing stage is the target exit stage.
Through the above steps S1031 to S1034, the target exit stage can be determined directly and at coarse granularity from the compression degree coefficient estimated by the compression estimation sub-model, so that the extra calculation caused by multiple quality evaluations is avoided.
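The mapping from compression degree coefficient to target exit stage in steps S1031 to S1034 can be expressed in a few lines; the ceiling-based shortcut below is a sketch equivalent to checking the ranges ((i-1)/K, i/K].

```python
import math

def target_exit_stage(compression_coef: float, total_stages: int = 5) -> int:
    """Return the processing stage whose range ((i-1)/K, i/K] contains the coefficient."""
    stage = math.ceil(compression_coef * total_stages)
    return min(max(stage, 1), total_stages)  # clamp so 0.0 maps to stage 1 and 1.0 to stage K

# Example from the text: K = 5 and a coefficient of 0.48 fall in (0.4, 0.6], i.e. the third stage.
assert target_exit_stage(0.48, 5) == 3
```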
In some embodiments, the step S104 "obtaining the image processing result of the trained image enhancement sub-model when the target exit stage is reached based on the image feature and the control vector" shown in fig. 4 can be implemented by:
step S1041, determining a target image enhancement network structure corresponding to the target exit stage based on the model structure of the trained image enhancement sub-model and the target exit stage.
In the embodiment of the present application, the network structure used in each processing stage is different, and the higher the number of the processing stage, the more complex the network structure. In an actual implementation, the image enhancement network structure of the first processing stage includes one convolution layer, and the image enhancement network structure of each subsequent stage is composed of the image enhancement network structure of the previous stage plus a new convolution layer, a down-sampling layer, and an up-sampling layer. Because the image enhancement network structure corresponding to each processing stage is determined, after the target exit stage is determined, the target image enhancement network structure corresponding to the target exit stage can be determined based on the model structure of the image enhancement sub-model.
Step S1042, inputting the image feature and the control vector to the target image enhancement network structure to obtain the image processing result.
In practical application, step S1042 may be implemented by the following steps:
in step S10421, a weighting coefficient of the image feature in each channel is generated using the control vector.
When the method is implemented, the control vector can be input into an adjusting module which is composed of a full connection layer and an activation function, and the weighting coefficients of the image features in each channel are generated.
Step S10422, adjusting the image feature based on the weighting coefficient to obtain an adjusted image feature.
In the feature extraction process, image features of multiple channels (which may also be referred to as feature maps in some embodiments) are obtained based on the number of convolution kernels, so the image features in this embodiment may refer to image features of multiple channels. When step S10422 is implemented, the weighting coefficient of each channel is multiplied by the image features of the corresponding channel to adjust the intensity of the image features, so as to obtain the adjusted image features.
Step S10423, performing convolution processing on the adjusted image feature to obtain a fifth convolution result;
in step S10424, it is determined whether or not the image enhancement processing needs to be continued.
Whether the enhancement processing needs to be continued is judged as follows: if a next convolution layer exists in the target image enhancement network structure, the target exit stage has not yet been reached, that is, the image enhancement processing needs to be continued, and the process proceeds to step S10425; when the target image enhancement network structure does not have a next convolution layer, the target exit stage has been reached, that is, it is determined that the image enhancement processing does not need to be continued, and the process proceeds to step S10427.
Step S10425, performing up-sampling and/or down-sampling on the fifth convolution result to obtain a corresponding up-sampling result and/or down-sampling result;
in step S10426, the up-sampling result and/or the down-sampling result is determined as the intermediate image feature.
After step S10426, steps S10421 to S10424 are repeatedly executed: the control vector is used again to generate the weighting coefficients of the intermediate image features in each channel, each intermediate image feature is adjusted using these weighting coefficients to obtain the adjusted intermediate image features, the adjusted intermediate image features are convolved to obtain a sixth convolution result, and whether the image enhancement processing needs to be continued is judged again, until it is determined that the image enhancement processing does not need to be continued and the image processing result is obtained.
In step S10427, the fifth convolution result is determined as an image processing result.
In the above steps S10421 to S10427, the target image enhancement network structure is dynamically determined by the estimated compression degree coefficient of the image to be processed, so that the loss caused by the decompression effect and the detail blurring is balanced in an adaptive manner in the image enhancement process, and the calculation amount can be reduced and the image processing effect can be ensured.
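The feature adjustment inside the enhancement stages (steps S10421 to S10427) can be sketched as follows; the module sizes and the simple stage layout are assumptions used only to illustrate how the control vector re-weights each channel before every convolution and how processing stops at the target exit stage.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAdjust(nn.Module):
    """Regulation module: fully connected layer + activation -> per-channel weighting coefficients."""
    def __init__(self, vector_dim=64, channels=32):  # assumed sizes
        super().__init__()
        self.fc = nn.Linear(vector_dim, channels)

    def forward(self, features, control_vector):
        weights = torch.sigmoid(self.fc(control_vector))       # one weighting coefficient per channel
        return features * weights.unsqueeze(-1).unsqueeze(-1)  # multiply each channel by its coefficient

def enhance(features, control_vector, convs, adjusts, exit_stage):
    """Run the enhancement stages, exiting after the target exit stage (sketch)."""
    x = features
    for stage, (conv, adjust) in enumerate(zip(convs, adjusts), start=1):
        x = conv(adjust(x, control_vector))    # adjust feature intensity, then convolve
        if stage == exit_stage:                # target exit stage reached: stop enhancing
            return x
        # Otherwise resample to form the intermediate image features of the next stage.
        x = F.interpolate(x, scale_factor=0.5, mode="bilinear", align_corners=False)
    return x
```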
In some embodiments, before step S001, the trained image processing model may be obtained through steps S301 to S304 shown in fig. 5:
step S301, a training data set is acquired.
The training data set includes a plurality of training images, which are compressed images. In some embodiments, step S301 may be implemented by:
in step S3011, an original image set is acquired.
Wherein, the original image set comprises a plurality of original images which are not compressed. In an implementation, the original image set may be obtained from the public DIV2K dataset.
Step S3012, performing compression processing of different degrees on each original image to obtain the compressed images.
In implementation, the Python Imaging Library (PIL) may be utilized to generate compressed images with compression percentages ranging from one percent to eighty percent (a minimal sketch of this data-set construction is given after the steps below).
In step S3013, the compressed images are subjected to at least one of cropping and rotation to obtain a plurality of processed images.
Through step S3013, the amount of training data can be increased significantly, thereby achieving data augmentation.
In step S3014, the compressed images and the processed images are determined as a training image set.
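As announced above, a minimal sketch of this data-set construction (steps S3011 to S3014) follows; the file paths and the use of JPEG quality as the "compression percentage" are assumptions.

```python
import random
from PIL import Image

def build_training_images(original_paths, out_dir):
    """Compress each original image to a random degree (sketch of steps S3011-S3012)."""
    samples = []
    for idx, path in enumerate(original_paths):
        img = Image.open(path).convert("RGB")
        quality = random.randint(1, 80)                 # compression percentage between 1% and 80%
        compressed_path = f"{out_dir}/{idx:05d}_q{quality}.jpg"
        img.save(compressed_path, format="JPEG", quality=quality)
        samples.append((compressed_path, path, quality))  # compressed image, clear target, percentage
    return samples

def augment(image):
    """Random crop and rotation used to enlarge the training set (step S3013, sketch)."""
    w, h = image.size
    left, top = random.randint(0, w // 4), random.randint(0, h // 4)
    image = image.crop((left, top, left + w // 2, top + h // 2))
    return image.rotate(random.choice([0, 90, 180, 270]))
```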
Step S302, obtaining a compression coefficient label and a preset image processing model of each training image.
Here, a compression coefficient label is obtained for each training image: in implementation, the compression percentage of the training image is obtained, and the compression percentage is mapped to the value range [0, 1] through a linear transformation to obtain the compression coefficient label, where 1 represents the most severe compression degree. The restoration target corresponding to each training image is the clear image without compression. The preset image processing model comprises a feature extraction sub-model, a compression estimation sub-model and an image enhancement sub-model.
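A sketch of this label construction follows; the exact linear transformation is not specified in the text, so treating the label simply as one minus the percentage divided by one hundred is an assumption consistent with the stated range.

```python
def compression_label(percentage: float) -> float:
    """Map a compression percentage to a coefficient label in [0, 1], where 1 means the most severe compression.
    The exact linear transformation is not given in the text; 1 - percentage/100 is an assumption."""
    return 1.0 - percentage / 100.0

assert compression_label(1) > compression_label(80)  # more severe compression -> larger label
```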
Step S303, obtaining a first loss function of the feature extraction submodel and the compression estimation submodel in the preset image processing model, and obtaining a second loss function corresponding to the image enhancement submodel in the preset image processing model.
The first loss function may be a function for minimizing the compression coefficient estimation loss as an optimization target, and may be an L1 loss function when implemented, and the second loss function may be a multi-stage restoration loss function.
In some embodiments, prior to step S303, the second loss function may be determined by:
step S211, determining a loss weight coefficient corresponding to each processing stage based on the compression coefficient label of the training image, the total number of processing stages of the image enhancement submodel, and a preset hyper parameter.
In order to enable the image processing model to exit simple samples at an early stage, in the embodiment of the present application, the restoration loss of each processing stage is supervised by using the actual compression coefficient. The early stage loss function weight of the simple sample is high, the later stage loss function weight is low, and the reverse is true for the complex sample. In practical implementation, the loss weight coefficients corresponding to the processing stages can be determined by the following formula:
(Formula (1-2): the restoration loss weight coefficient W_i of the i-th stage, a linear function with intercept a and slope b of a term determined by the compression coefficient label and the stage index i among the K stages.)

Here, a is used to ensure that W_i is a positive number, and b is used to determine the difference between the weights of different stages; in the embodiment of the present application, a may be set to 10 and b may be set to 9. K is the total number of stages.
After the restoration loss weight coefficients of the processing stages are determined through the formula (1-2), the restoration loss weight coefficients of the processing stages can be normalized through the formula (1-3):
W'_i = W_i / (W_1 + W_2 + … + W_K)   (1-3);
in step S212, a second loss function is determined by using the loss weight coefficients corresponding to the respective processing stages.
In practical implementation, the second loss function may be determined by equations (1-4):
L_2 = Σ_{i=1}^{K} W'_i · ‖ Output_i − I_GT ‖   (1-4);

where W'_i is the normalized restoration loss weight coefficient obtained by formula (1-3), Output_i is the restored image output by the i-th stage, and I_GT is the clear image without compression.
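Under these definitions, the second loss function can be sketched as follows. The exact form of the per-stage weight in formula (1-2) is not reproduced in this text, so the linear form W_i = a − b·|cond_GT − i/K| used below is only an assumption consistent with the description (high early-stage weights for simple samples and the reverse for complex samples), and the L1 distance for the per-stage restoration loss is likewise an assumption.

```python
import torch

def second_loss(outputs, target, cond_label, a=10.0, b=9.0):
    """Multi-stage restoration loss with per-stage weights (sketch; the weight form is an assumption)."""
    K = len(outputs)                                  # total number of processing stages
    # Assumed form of (1-2): W_i = a - b * |cond_label - i/K|, kept positive by the intercept a.
    raw = torch.tensor([a - b * abs(cond_label - i / K) for i in range(1, K + 1)])
    weights = raw / raw.sum()                         # normalization, as in (1-3)
    # Weighted sum of the per-stage restoration losses, as in (1-4); L1 distance assumed.
    return sum(w * torch.mean(torch.abs(out - target)) for w, out in zip(weights, outputs))
```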
Step S304, training the preset image processing model by using the first loss function and the second loss function to obtain a trained image processing model.
In some embodiments, step S304 may be implemented by:
step S3041, pre-training the feature extraction submodel and the compression estimation submodel by using the first loss function, so as to obtain a pre-trained feature extraction submodel and a pre-trained compression estimation submodel.
In step S3041, the training image may be input into the feature extraction sub-model to obtain training image features, the training image features are input into the compression estimation sub-model to obtain a predicted compression degree coefficient, and back-propagation training is performed on the feature extraction sub-model and the compression estimation sub-model based on the difference between the predicted compression degree coefficient and the compression coefficient label, until the pre-trained feature extraction sub-model and the pre-trained compression estimation sub-model are obtained.
Step S3042, a joint loss function is constructed by using the preset first weight, the preset second weight, the preset first loss function, and the preset second loss function.
Since the feature extraction sub-model and the compression estimation sub-model have already completed the pre-training process through step S3041, in this step the first weight value is greater than the second weight value; for example, the first weight value may be 0.97 and the second weight value may be 0.03. In some embodiments, it is also possible that the first weight is 1 and the second weight is 0.01.
Step S3043, training the pre-trained feature extraction submodel, the compression estimation submodel, and the image enhancement submodel by using the joint loss function, so as to obtain the trained image processing model.
Here, the trained image processing model includes a trained feature extraction sub-model, a trained compression estimation sub-model, and a trained image enhancement sub-model.
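The two-phase training procedure of steps S3041 to S3043 can be summarized by the following sketch; the attribute names model.features, model.estimator and model.enhancer, the optimizer settings and the epoch counts are assumptions, and second_loss refers to the sketch given earlier.

```python
import torch

def train(model, loader, first_weight=0.97, second_weight=0.03, pretrain_epochs=5, epochs=20):
    """Pre-train feature extraction + compression estimation, then train the whole model jointly (sketch)."""
    l1 = torch.nn.L1Loss()
    pre_params = list(model.features.parameters()) + list(model.estimator.parameters())
    opt = torch.optim.Adam(pre_params, lr=1e-4)
    for _ in range(pretrain_epochs):                              # step S3041: pre-training
        for image, target, cond_label in loader:
            coef_pred = model.estimator(model.features(image))[1]
            loss = l1(coef_pred, cond_label)                      # first loss function
            opt.zero_grad(); loss.backward(); opt.step()

    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    for _ in range(epochs):                                       # steps S3042-S3043: joint training
        for image, target, cond_label in loader:
            feats = model.features(image)
            control, coef_pred = model.estimator(feats)
            outputs = model.enhancer(feats, control)              # restored image of every stage
            # Joint loss: larger weight on the restoration loss, smaller weight on the estimation loss.
            joint = (first_weight * second_loss(outputs, target, float(cond_label))  # assumes batch size one
                     + second_weight * l1(coef_pred, cond_label))
            opt.zero_grad(); joint.backward(); opt.step()
```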
The trained image processing model is obtained through steps S301 to S304. It should be noted that the training process of the image processing model is generally implemented by a server, and if the image processing process is implemented by a terminal, the terminal may obtain the trained image processing model from the server in advance. Of course, if the computing power of the terminal is sufficiently strong or its processor usage is low, the training process can also be implemented by the terminal, and the image processing process is then performed by the terminal.
Based on the foregoing embodiments, an image processing method is further provided in an embodiment of the present application, and is applied to the network architecture shown in fig. 2A, fig. 6 is a schematic diagram of a further implementation flow of the image processing method provided in the embodiment of the present application, and as shown in fig. 6, the flow includes:
in step S401, the terminal sends a video acquisition request to the server in response to an operation instruction for viewing a video.
The video acquisition request carries a video identifier, and the video acquisition request may also carry a network connection status of the terminal.
Step S402, the server receives the video acquisition request and acquires video data corresponding to the video identification.
The video identifier may be a video name, and the server acquires video data corresponding to the video identifier from its own storage space based on the video identifier.
In step S403, the server acquires the data compression rate of the video data and the network connection state of the terminal.
Step S404, when the server determines that the terminal is in the wireless local area network connection state and the data compression rate is greater than the processing threshold, the server decodes the video stream data to obtain each video frame image.
In step S405, the server determines each video frame image as an image to be processed.
In the embodiment of the present application, when the data compression rate is greater than the processing threshold, it indicates that video compression is relatively serious, which may result in poor viewing experience for a user, and if the terminal is in a wireless local area network connection state, it indicates that transmitting a relatively clear video file for the user does not increase the mobile traffic of the user, so that it is determined that each video frame image in the video data is subjected to restoration enhancement processing, that is, each video frame image is determined as an image to be processed.
In step S406, the server obtains the trained image processing model.
The trained image processing model comprises: the method comprises the following steps of training a feature extraction sub-model, a compression estimation sub-model and a training image enhancement sub-model. The feature extraction submodel is used for extracting features, the compression estimation submodel is used for determining a compression degree coefficient and a control vector, and the image enhancement submodel is used for carrying out image restoration enhancement.
And step S407, the server extracts the features of the image to be processed by using the trained feature extraction submodel to obtain the image features.
Step S408, the server inputs the image characteristics into the trained compression estimation submodel to obtain a control vector and a compression degree coefficient.
Step S409, the server determines the target exit stage of the trained image enhancement sub-model based on the compression degree coefficient and the model structure of the trained image enhancement sub-model, and determines a target image enhancement network structure corresponding to the target exit stage.
Step S410, the server inputs the image feature and the control vector to the target image enhancement network structure to obtain the image processing result.
It should be noted that, for steps or terms used but not described in detail in this embodiment, reference may be made to the descriptions in the other embodiments of the present application.
And step S411, the server carries out video coding on each image processing result to obtain target video stream data.
In step S412, the server transmits the target video stream data to the terminal.
In step S413, the terminal receives the target video stream data and plays the target video stream data on its own display interface.
When step S413 is implemented, the video player of the terminal may decode the target video stream data to obtain the processed video frame images and play them; the processed video frame images have been subjected to restoration and enhancement and are therefore clearer, so the viewing experience is better.
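As a rough illustration of the server-side flow in steps S402 to S412, the decision and per-frame enhancement could be organized as below; every helper name (video_store, decode_video, enhance_image, encode_video) is hypothetical and stands in for the corresponding step of the flow.

```python
def handle_video_request(video_id, on_wlan, video_store, model, processing_threshold=0.5):
    """Sketch of steps S402-S412: enhance the frames only when the terminal is on a wireless LAN
    and the data compression rate exceeds the processing threshold (names are hypothetical)."""
    video_data = video_store.load(video_id)                        # step S402
    if not (on_wlan and video_data.compression_rate > processing_threshold):
        return video_data.stream                                   # send the original stream unchanged
    frames = decode_video(video_data.stream)                       # step S404: decode into frame images
    enhanced = [enhance_image(model, frame) for frame in frames]   # steps S406-S410 per frame
    return encode_video(enhanced)                                  # step S411: target video stream data
```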
In the image processing method provided by the present application, when a terminal views live or on-demand video through a server and sends a video acquisition request to the server, the server, after acquiring the video stream data, determines whether restoration enhancement needs to be performed on the video frame images based on the data compression rate of the video stream data and the network environment in which the terminal is located; if so, the server parses the video stream data to obtain the video frame images and performs restoration enhancement processing on each of them. In this process, after the image features of a video frame image are extracted, a control vector and a compression degree coefficient are determined based on the image features, the target exit stage of the image processing is determined using the compression degree coefficient, and the target image enhancement network structure corresponding to the target exit stage is then determined. In practical implementation, the target image enhancement network structure corresponding to an image with a low compression rate is simpler than that corresponding to an image with a high compression rate and requires less computation, so the image processing effect can be guaranteed while the computational complexity is reduced. In addition, in the embodiment of the present application, the weighting coefficients of the channels of the intermediate feature maps of the network are generated from the control vector, realizing finer network regulation, dynamically adjusting the network output, and improving the image processing quality.
Next, an exemplary application of the embodiment of the present application in a practical application scenario will be described.
In the image processing method provided in the embodiment of the present application, the timing for exiting the UNet image enhancement submodel is dynamically adjusted based on the compression condition estimation, and fig. 7 is a schematic diagram of a further implementation flow of the image processing method provided in the embodiment of the present application, as shown in fig. 7, the flow includes:
in step S701, a picture is input.
Step S702, extracting picture characteristics.
Step S703, performs compression condition estimation to obtain a control vector and a compression degree coefficient.
When the method is implemented, the compression condition estimation network can be used for obtaining the control vector and the compression degree coefficient.
Step S704, the control vector is used to generate the channel adjusting weight of each feature map before the convolution operation in the image quality enhancement module.
Step S705, determining the exit stage by using the compression degree coefficient, and outputting the picture.
In the embodiment comprising steps S701 to S705, a sub-network is added to the picture quality enhancement main network to estimate the compression condition of the image, and the estimated compression coefficient is used to control the whole main network: on the one hand, the estimated compression coefficient directly determines the network exit stage at coarse granularity, so that the extra calculation caused by multiple quality evaluations can be avoided; on the other hand, the control vector extracted by the estimation module is passed through a simple regulation module to generate the weighting coefficients of the channels of the intermediate feature maps of the network, so that finer network regulation is realized and the network output is dynamically adjusted.
In the embodiment of the present application, image processing is performed using a UNet image enhancement sub-model whose early exit is dynamically adjusted based on compression condition estimation, and the training and inference process of the image enhancement sub-model can be realized through the following steps:
firstly, establishing a training data set.
In the embodiment of the present application, the public DIV2K dataset may be used as the training dataset, and the Python Imaging Library (PIL) is then used to generate JPG images with compression percentages ranging from one percent to eighty percent. In the training process, random cropping, flipping and rotation operations are adopted to augment the data. The compression degree coefficient label is obtained by linearly mapping the compression percentage to [0, 1], where 1 represents the most severe compression degree. The restoration target is the clear image without JPG compression.
And secondly, designing a deep neural network.
This step can be achieved by:
step 801, designing a basic network.
As shown in fig. 8, in the overall network framework of the image processing method provided by the embodiment of the present application, the feature extraction module 811 shared at the network head is composed of convolution layers, the compression condition estimation module 812 is composed of convolution operations, global pooling and full-connection operations, and the image quality enhancement module 813 adopts a densely connected U-shaped network structure.
The feature extraction module 811 corresponds to a feature extraction sub-model in another embodiment, the compression condition estimation module 812 corresponds to a compression estimation sub-model in another embodiment, and the image quality enhancement module 813 corresponds to an image enhancement sub-model in another embodiment.
Step S802, designing a shared feature extraction module.
Considering that a JPG-compressed picture has an obvious blocky effect, which is mainly manifested as obvious abrupt changes in the horizontal and vertical directions, in the embodiment of the present application, as shown in fig. 9, a horizontal-line convolution 901, a vertical-line convolution 902 and a 3 × 3 convolution 903 are used in the first convolution layer of the feature extraction module to extract rich shallow features.
Step S803, designing a compression condition estimation module.
In the embodiment of the present application, as shown in fig. 10, the compression condition estimation module is composed of a base convolution 1001, a global pooling 1002, and a full link layer 1003. In consideration of the fact that the compression degree coefficient contains limited information, in the embodiment of the present application, the feature vector before the compression degree coefficient is regressed is used as a control vector for regulating and controlling the image quality enhancement module. As shown in fig. 10, the output of the global pooling is a control vector, and the output of the full-connected layer is a compression degree coefficient obtained by the regression.
Step S804, designing an image quality enhancement module.
In the embodiment of the application, the image quality enhancement module adopts a densely connected U-shaped network structure and comprises five stages. Different stages correspond to different sizes of sub-U-shaped network structures. The U-shaped network of the next stage is composed of the U-shaped network of the previous stage, a new convolution layer, a down-sampling layer and an up-sampling layer.
Taking the stage-2 U-shaped network as an example, fig. 11 is a schematic diagram of the stage-2 U-shaped network of the image quality enhancement module provided in the embodiment of the present application. As shown in fig. 11, before each convolution operation of the image quality enhancement module, the intensity of the feature map is dynamically adjusted through a feature map adjusting structure 1101, so as to realize finer regulation according to the compression condition of the input sample itself. As shown in fig. 12, the inputs of the feature map adjusting structure include a control vector 1201 and a feature map 1202; the control vector is input to a regulation module 1203 to obtain an adjusted control vector, and an adjusted feature map 1204 is then output based on the adjusted control vector and the feature map, where the regulation module 1203 is composed of a simple fully connected layer and an activation function.
The design of the deep neural network is completed through the second step, and then the network training and the inference process are carried out.
And thirdly, network training.
The process of training the deep neural network designed in the second step comprises the following steps:
step S1301, pre-training.
In the network training process, in order to ensure the stability of training, the shared feature extraction module and the compression condition estimation module are first pre-trained with the compression coefficient estimation loss as the objective function, where the objective function used in the pre-training is shown in formula (1-1):
L_cond = ‖ cond_predict − cond_GT ‖_1   (1-1);

where cond_predict is the estimated compression degree coefficient, cond_GT is the compression degree coefficient label, and ‖·‖_1 is the L1 loss function.
Second, the loss function.
After the pre-trained shared feature extraction module and compression condition estimation module are loaded, the compression coefficient estimation loss and the multi-stage restoration loss are adopted to supervise the training of the whole network. In order to enable the network to exit simple samples at an early stage, the restoration loss of each stage is supervised using the actual compression coefficient: for a simple sample, the loss function weights of the early stages are high and those of the later stages are low, and the reverse holds for a complex sample. In the embodiment of the present application, the multi-stage restoration loss function takes the form of formulas (1-2), (1-3) and (1-4):
(Formula (1-2), as above: the restoration loss weight coefficient w_i of the i-th stage, a linear function with intercept a and slope b of a term determined by the compression coefficient label and the stage index.)

w'_i = w_i / (w_1 + w_2 + … + w_K)   (1-3);

L_restore = Σ_{i=1}^{K} w'_i · ‖ Output_i − I_GT ‖   (1-4);

where w_i is the restoration loss weight coefficient of the i-th stage, and a and b are the intercept and slope of w_i: a is used to ensure that w_i is a positive number, and b is used to determine the difference between the weights of different stages; in the embodiment of the present application, a may be set to 10 and b may be set to 9. K is the total number of stages, Output_i is the restored image output by the i-th stage, and I_GT is the clear image without compression.
Since the main purpose of the image processing method provided by the embodiment of the present application is restoration of a compressed image, and the compression condition estimation module has been pre-trained, the ratio of the restoration loss function and the compression coefficient estimation loss function can be set to 1: 0.01, the total training loss function is shown in equations (1-5):
L_total = L_restore + 0.01 · L_cond   (1-5);

where L_restore is the multi-stage restoration loss of formula (1-4) and L_cond is the compression coefficient estimation loss of formula (1-1).
and fourthly, deducing.
In the inference process, the compression situation estimation module has two main aspects to the control of the whole network:
in the first aspect, the control vector is used to generate weighting coefficients of each channel of the feature map, and the feature map intensity in the image quality enhancement module is adjusted before each convolution operation.
In the second aspect, the exit stage is determined by using the predicted compression coefficient, so that the early exit is realized for simple samples, and the calculation amount is dynamically reduced. For example, when the estimated compression degree coefficient is less than 1/K, the first stage is selected to directly exit, and when the compression coefficient is more than or equal to 1/K and less than 2/K, the second stage is selected to exit.
The network framework provided by the embodiment of the application realizes the self-adaptive adjustment of the strength of the network intermediate characteristic diagram according to the compression condition of the input sample in the inference process, so that the network is more suitable for the restoration of the current sample. Meanwhile, the mode of directly selecting the exit stage by the compression coefficient can better distribute the calculation to the samples in need.
The image processing method provided by the embodiment of the present application can be used to remove the compression effect of JPG images: according to conditions such as the blocky effect and the ringing effect of the image, the algorithm estimates the compression condition coefficient of the image, performs adaptive image quality enhancement on the JPG image, removes the noise caused by JPG compression, and improves the visual effect of the image. Furthermore, the method can also be extended to the estimation and adaptive removal of the compression effect caused by various kinds of video compression, and applied to various live-broadcast and on-demand scenarios in video products: with the corresponding video compression techniques, the compression degree of a video frame is estimated according to different Quantization Parameter (QP) values, and the video frame is restored and enhanced, thereby improving the viewing experience of the whole video.
In the embodiment of the present application, a UNet model that dynamically selects its exit stage according to the compression condition of the picture is used to adaptively restore compression effects of different degrees. The network used comprises a compression condition estimation module and a picture quality enhancement module. The network adjusts the weights of the different channels of the intermediate feature maps of the image quality enhancement module using the control vector extracted by the compression condition estimation module, so that the network can adapt to the sample currently being processed. Furthermore, by dynamically selecting the exit stage according to the compression coefficient, the scheme obtains a good processing effect on simple samples with a low compression degree using only a small amount of convolution-layer processing, avoiding the detail blurring caused by excessive deep-layer computation, and adopts deeper convolution processing for complex samples with severe compression to improve the restoration effect. In summary, the image processing method provided by the embodiment of the present application can estimate the compression condition of a user image to generate a dynamic deep network structure, so that the loss caused by the decompression effect and the detail blurring is well balanced in an adaptive manner, the consumption of computing resources is saved, the use cost is greatly reduced, and the user experience is improved.
Continuing with the exemplary structure of the image processing apparatus 455 provided by the embodiments of the present application implemented as software modules, in some embodiments, as shown in fig. 3, the software modules stored in the image processing apparatus 455 of the memory 450 may include:
a first obtaining module 4551, configured to obtain an image to be processed, and perform feature extraction on the image to be processed to obtain an image feature of the image to be processed;
a first determining module 4552, configured to determine a control vector and a compression degree coefficient based on the image feature, where the control vector is used to adjust the intensity of the image feature during image processing;
a second determining module 4553, configured to determine a target exit stage of the trained image enhancement sub-model based on the compression degree coefficient and a model structure of the trained image enhancement sub-model;
an image processing module 4554, configured to obtain an image processing result of the trained image enhancement sub-model when the target exit stage is reached, based on the image feature and the control vector;
an output module 4555, configured to output the image processing result.
In some embodiments, the apparatus further comprises:
a second obtaining module, configured to obtain a trained image processing model, where the trained image processing model includes: a feature extraction sub-model, a compression estimation sub-model and an image enhancement sub-model;
correspondingly, the first obtaining module is further configured to: extracting the characteristics of the image to be processed by utilizing the trained characteristic extraction submodel to obtain the image characteristics;
a first determination module further configured to: and inputting the image characteristics into the trained compression estimation sub-model to obtain the control vector and the compression degree coefficient.
In some embodiments, the first obtaining module is further configured to:
acquiring a first convolution kernel, a second convolution kernel and a third convolution kernel corresponding to a first convolution layer of the trained feature extraction sub-model, wherein the dimensions of the first convolution kernel, the second convolution kernel and the third convolution kernel are different from each other;
performing convolution processing on the image to be processed by respectively utilizing the first convolution kernel, the second convolution kernel and the third convolution kernel to obtain a first convolution result, a second convolution result and a third convolution result;
splicing the first convolution result, the second convolution result and the third convolution result to obtain a splicing result;
and performing convolution processing on the splicing result at least once again through a second convolution layer in the trained feature extraction submodel to obtain the image features of the image to be processed.
In some embodiments, the trained compressed estimation submodel includes at least a third convolutional layer, a pooling layer, and a fully-connected layer, the first determining module further configured to:
performing convolution processing on the image features by using the third convolution layer to obtain a fourth convolution result;
performing pooling treatment on the fourth convolution result by using the pooling layer to obtain a pooling result;
determining the pooling result as the control vector;
and carrying out full connection processing on the pooling result by using the full connection layer to obtain the compression degree coefficient.
In some embodiments, the second determining module is further configured to:
determining the total number of processing stages of the trained image enhancement submodel based on the model structure of the trained image enhancement submodel;
determining each compression threshold range corresponding to each processing stage based on the total number of the processing stages;
determining a target compression threshold range in which the compression degree coefficient is positioned from each compression threshold range;
and determining the processing stage corresponding to the target compression threshold range as a target exit stage of the trained image enhancement sub-model.
In some embodiments, the image processing module is further configured to:
determining a target image enhancement network structure corresponding to the target exit stage based on the model structure of the trained image enhancement sub-model and the target exit stage;
and inputting the image characteristics and the control vector to the target image enhancement network structure to obtain the image processing result.
In some embodiments, the image processing module is further configured to:
generating a weighting coefficient of the image feature in each channel by using the control vector;
adjusting the image characteristics based on the weighting coefficients to obtain adjusted image characteristics;
performing convolution processing on the adjusted image characteristics to obtain a fifth convolution result;
when the image enhancement processing is determined to be required to be continued, performing up-sampling and/or down-sampling on the fifth convolution result to obtain a corresponding up-sampling result and/or down-sampling result;
and determining the up-sampling result and/or the down-sampling result as an intermediate image feature, and generating the weighting coefficient of the intermediate image feature in each channel by using the control vector again until determining that the image enhancement processing is not required to be continued to obtain the image processing result.
In some embodiments, the apparatus further comprises:
a third obtaining module, configured to obtain a training data set, where the training data set includes a plurality of training images;
the fourth acquisition module is used for acquiring the compression coefficient labels of all the training images and the preset image processing model;
a fifth obtaining module, configured to obtain a first loss function of the feature extraction sub-model and the compression estimation sub-model in the preset image processing model, and obtain a second loss function corresponding to the image enhancement sub-model in the preset image processing model;
and the model training module is used for training the preset image processing model by utilizing the first loss function and the second loss function to obtain the trained image processing model.
In some embodiments, the third obtaining module is further configured to:
obtaining an original image set, wherein the original image set comprises a plurality of uncompressed original images;
compressing each original image to different degrees to obtain each compressed image;
performing at least one of cutting and rotating on each compressed image to obtain a plurality of processed images;
determining the respective compressed images and the processed plurality of images as a training image set.
In some embodiments, the apparatus further comprises:
the third determining module is used for determining a loss weight coefficient corresponding to each processing stage based on a compression coefficient label of the training image, the total number of the processing stages of the image enhancement sub-model and a preset hyper parameter;
and the fourth determining module is used for determining a second loss function by using the loss weight coefficient corresponding to each processing stage.
In some embodiments, the model training module is further configured to:
pre-training the feature extraction submodel and the compression estimation submodel by using the first loss function to obtain a pre-trained feature extraction submodel and a pre-trained compression estimation submodel;
constructing a joint loss function by using a preset first weight, a preset second weight, the first loss function and the second loss function;
and training the pre-trained feature extraction submodel, the compression estimation submodel and the image enhancement submodel by using the joint loss function to obtain the trained image processing model.
It should be noted that the above description of the embodiment of the image processing apparatus is similar to the above description of the method, and has the same advantageous effects as the embodiment of the method. For technical details not disclosed in the embodiments of the image processing apparatus of the present application, a person skilled in the art should understand with reference to the description of the embodiments of the method of the present application.
Embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the image processing method described in the embodiment of the present application.
Embodiments of the present application provide a computer-readable storage medium having stored therein executable instructions that, when executed by a processor, cause the processor to perform a method provided by embodiments of the present application, for example, a method as illustrated in fig. 4, 5, and 6.
In some embodiments, the computer-readable storage medium may be memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash, magnetic surface memory, optical disk, or CD-ROM; or may be various devices including one or any combination of the above memories.
In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may correspond, but do not necessarily have to correspond, to files in a file system, and may be stored in a portion of a file that holds other programs or data, such as in one or more scripts in a hypertext Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
By way of example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices at one site or distributed across multiple sites and interconnected by a communication network.
The above description is only an example of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present application are included in the protection scope of the present application.

Claims (14)

1. An image processing method, comprising:
acquiring an image to be processed, and performing feature extraction on the image to be processed to obtain image features of the image to be processed;
determining a control vector and a compression degree coefficient based on the image features;
determining a target exit stage of the trained image enhancement sub-model based on the compression degree coefficient and a model structure of the trained image enhancement sub-model;
acquiring an image processing result of the trained image enhancement sub-model when the target exit stage is reached based on the image features and the control vector, wherein the control vector is used for adjusting the intensity of the image features in the image processing process;
and outputting the image processing result.
2. The method according to claim 1, wherein before the obtaining of the image to be processed and the feature extraction of the image to be processed to obtain the image features of the image to be processed, the method further comprises:
obtaining a trained image processing model, wherein the trained image processing model comprises: the trained feature extraction submodel, the trained compression estimation submodel and the trained image enhancement submodel;
the feature extraction of the image to be processed to obtain the image features of the image to be processed includes: performing feature extraction on the image to be processed by using the trained feature extraction submodel to obtain image features;
the determining a control vector and a compression degree coefficient based on the image feature comprises: and inputting the image characteristics into the trained compression estimation sub-model to obtain the control vector and the compression degree coefficient.
3. The method of claim 2, wherein the performing feature extraction on the image to be processed by using the trained feature extraction submodel to obtain image features comprises:
acquiring a first convolution kernel, a second convolution kernel and a third convolution kernel corresponding to a first convolution layer of the trained feature extraction sub-model, wherein the dimensions of the first convolution kernel, the second convolution kernel and the third convolution kernel are different from each other;
performing convolution processing on the image to be processed by respectively utilizing the first convolution kernel, the second convolution kernel and the third convolution kernel to obtain a first convolution result, a second convolution result and a third convolution result;
splicing the first convolution result, the second convolution result and the third convolution result to obtain a splicing result;
and performing convolution processing on the splicing result at least once again through a second convolution layer in the trained feature extraction submodel to obtain the image features of the image to be processed.
4. The method of claim 2, wherein the trained compressed estimation submodel comprises at least a third convolutional layer, a pooling layer, and a full-link layer, and the inputting the image features into the trained compressed estimation submodel to obtain the control vector and the compression degree coefficient comprises:
performing convolution processing on the image features by using the third convolution layer to obtain a fourth convolution result;
performing pooling treatment on the fourth convolution result by using the pooling layer to obtain a pooling result;
determining the pooling result as the control vector;
and carrying out full-connection processing on the pooling result by using the full-connection layer to obtain the compression degree coefficient.
5. The method of claim 1, wherein determining the target exit stage of the trained image enhancement submodel based on the compression degree coefficient and a model structure of the trained image enhancement submodel comprises:
determining the total number of processing stages of the trained image enhancement submodel based on the model structure of the trained image enhancement submodel;
determining each compression threshold range corresponding to each processing stage based on the total number of the processing stages;
determining a target compression threshold range in which the compression degree coefficient is positioned from each compression threshold range;
and determining the processing stage corresponding to the target compression threshold range as a target exit stage of the trained image enhancement sub-model.
6. The method of claim 1, wherein the obtaining an image processing result of the trained image enhancement sub-model when the target exit stage is reached based on the image feature and the control vector comprises:
determining a target image enhancement network structure corresponding to the target exit stage based on the model structure of the trained image enhancement sub-model and the target exit stage;
and inputting the image characteristics and the control vector to the target image enhancement network structure to obtain the image processing result.
7. The method of claim 6, wherein inputting the image features and the control vectors to the target image enhancement network structure to obtain the image processing result comprises:
generating a weighting coefficient of the image feature in each channel by using the control vector;
adjusting the image characteristics based on the weighting coefficients to obtain adjusted image characteristics;
performing convolution processing on the adjusted image features to obtain a fifth convolution result;
when the image enhancement processing is determined to be required to be continued, performing up-sampling and/or down-sampling on the fifth convolution result to obtain a corresponding up-sampling result and/or down-sampling result;
and determining the up-sampling result and/or the down-sampling result as an intermediate image feature, and generating the weighting coefficient of the intermediate image feature in each channel by using the control vector again until determining that the image enhancement processing is not required to be continued to obtain the image processing result.
8. The method according to any one of claims 2 to 7, further comprising:
acquiring a training data set, wherein the training data set comprises a plurality of training images;
acquiring a compression coefficient label and a preset image processing model of each training image;
acquiring a first loss function of a feature extraction sub-model and a compression estimation sub-model in the preset image processing model, and acquiring a second loss function corresponding to an image enhancement sub-model in the preset image processing model;
and training the preset image processing model by using the first loss function and the second loss function to obtain a trained image processing model.
9. The method of claim 8, wherein the obtaining a training data set comprises:
obtaining an original image set, wherein the original image set comprises a plurality of uncompressed original images;
compressing each original image to different degrees to obtain each compressed image;
at least one of cutting and rotating the compressed images to obtain a plurality of processed images;
determining the respective compressed images and the processed plurality of images as a training image set.
10. The method of claim 8, further comprising:
determining loss weight coefficients corresponding to all processing stages based on compression coefficient labels of training images, the total number of the processing stages of the image enhancement sub-model and preset hyper-parameters;
and determining a second loss function by using the loss weight coefficients corresponding to the processing stages.
11. The method of claim 8, wherein the training the preset image processing model by using the first and second loss functions to obtain a trained image processing model comprises:
pre-training the feature extraction submodel and the compression estimation submodel by using the first loss function to obtain a pre-trained feature extraction submodel and a pre-trained compression estimation submodel;
constructing a joint loss function by using a preset first weight, a preset second weight, the first loss function and the second loss function;
and training the pre-trained feature extraction submodel, the compression estimation submodel and the image enhancement submodel by using the joint loss function to obtain the trained image processing model.
12. An image processing apparatus, characterized by comprising:
a first acquisition module, configured to acquire an image to be processed and perform feature extraction on the image to be processed to obtain image features of the image to be processed;
a first determining module, configured to determine a control vector and a compression degree coefficient based on the image features, wherein the control vector is used to adjust the intensity of the image features during image processing;
a second determining module, configured to determine a target exit stage of a trained image enhancement sub-model based on the compression degree coefficient and a model structure of the trained image enhancement sub-model;
an image processing module, configured to acquire, based on the image features and the control vector, an image processing result of the trained image enhancement sub-model when the target exit stage is reached;
and an output module, configured to output the image processing result.
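For illustration only: how the five modules of claim 12 could be wired together at inference time. The sub-model interfaces (select_exit_stage and the call signatures) are assumptions made for the example.

    class ImageProcessingApparatus:
        def __init__(self, feature_extractor, control_head, enhancer):
            self.feature_extractor = feature_extractor   # used by the first acquisition module
            self.control_head = control_head             # used by the first determining module
            self.enhancer = enhancer                     # trained image enhancement sub-model

        def process(self, image):
            feat = self.feature_extractor(image)                  # image features
            control, coeff = self.control_head(feat)              # control vector + compression degree
            exit_stage = self.enhancer.select_exit_stage(coeff)   # second determining module
            result = self.enhancer(feat, control, exit_stage)     # image processing module
            return result                                         # output module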
13. An image processing device, characterized by comprising:
a memory for storing executable instructions;
a processor for implementing the method of any one of claims 1 to 11 when executing executable instructions stored in the memory.
14. A computer-readable storage medium having stored thereon executable instructions for, when executed by a processor, implementing the method of any one of claims 1 to 11.
CN202110496630.2A 2021-05-07 2021-05-07 Image processing method, device, equipment and computer readable storage medium Active CN112906721B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110496630.2A CN112906721B (en) 2021-05-07 2021-05-07 Image processing method, device, equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN112906721A true CN112906721A (en) 2021-06-04
CN112906721B CN112906721B (en) 2021-07-23

Family

ID=76109004

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110496630.2A Active CN112906721B (en) 2021-05-07 2021-05-07 Image processing method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN112906721B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180124404A1 (en) * 2011-10-14 2018-05-03 Advanced Micro Devices, Inc. Region-based image decompression
CN104427341A (en) * 2013-08-28 2015-03-18 腾讯科技(深圳)有限公司 Image processing method and apparatus, and terminal
CN106157252A (en) * 2015-04-16 2016-11-23 腾讯科技(深圳)有限公司 Image processing method and image processing apparatus
CN106709875A (en) * 2016-12-30 2017-05-24 北京工业大学 Compressed low-resolution image restoration method based on combined deep network
CN110163215A (en) * 2018-06-08 2019-08-23 腾讯科技(深圳)有限公司 Image processing method, device, computer-readable medium and electronic equipment
CN109615620A (en) * 2018-11-30 2019-04-12 腾讯科技(深圳)有限公司 Image compression degree recognition method, device, equipment and computer-readable storage medium
US20200278410A1 (en) * 2019-03-01 2020-09-03 Siemens Healthcare Gmbh Image reconstruction using a colored noise model with magnetic resonance compressed sensing
CN110222758A (en) * 2019-05-31 2019-09-10 腾讯科技(深圳)有限公司 Image processing method, device, equipment and storage medium
CN111415311A (en) * 2020-03-27 2020-07-14 北京航空航天大学杭州创新研究院 Resource-saving image quality enhancement model
CN112291570A (en) * 2020-12-24 2021-01-29 浙江大学 Real-time video enhancement method based on lightweight deformable convolutional neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KE YU ET AL: "Deep Convolution Networks for Compression Artifacts Reduction", arXiv *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114201118A (en) * 2022-02-15 2022-03-18 北京中科开迪软件有限公司 Storage method and system based on optical disk library
CN114697633A (en) * 2022-03-29 2022-07-01 联想(北京)有限公司 Video transmission method, device, equipment and storage medium
CN114697633B (en) * 2022-03-29 2023-09-19 联想(北京)有限公司 Video transmission method, device, equipment and storage medium
CN116894469A (en) * 2023-09-11 2023-10-17 西南林业大学 DNN collaborative reasoning acceleration method, device and medium in end-edge cloud computing environment
CN116894469B (en) * 2023-09-11 2023-12-15 西南林业大学 DNN collaborative reasoning acceleration method, device and medium in end-edge cloud computing environment

Also Published As

Publication number Publication date
CN112906721B (en) 2021-07-23

Similar Documents

Publication Publication Date Title
US11636283B2 (en) Committed information rate variational autoencoders
CN112906721B (en) Image processing method, device, equipment and computer readable storage medium
CN111402143B (en) Image processing method, device, equipment and computer readable storage medium
WO2021018163A1 (en) Neural network search method and apparatus
CN111681177A (en) Video processing method and device, computer readable storage medium and electronic equipment
CN112561028A (en) Method for training neural network model, and method and device for data processing
Akimoto et al. Diverse plausible 360-degree image outpainting for efficient 3DCG background creation
CN111833360B (en) Image processing method, device, equipment and computer readable storage medium
CN115205150A (en) Image deblurring method, device, equipment, medium and computer program product
CN113066018A (en) Image enhancement method and related device
CN112381707A (en) Image generation method, device, equipment and storage medium
CN113344794B (en) Image processing method and device, computer equipment and storage medium
CN111460876A (en) Method and apparatus for identifying video
CN115131218A (en) Image processing method, image processing device, computer readable medium and electronic equipment
CN116977169A (en) Data processing method, apparatus, device, readable storage medium, and program product
WO2022127603A1 (en) Model processing method and related device
US20220164934A1 (en) Image processing method and apparatus, device, video processing method and storage medium
CN112950501B (en) Noise field-based image noise reduction method, device, equipment and storage medium
CN113538254A (en) Image restoration method and device, electronic equipment and computer readable storage medium
CN114140363B (en) Video deblurring method and device and video deblurring model training method and device
CN115278303B (en) Video processing method, device, equipment and medium
CN116071478B (en) Training method of image reconstruction model and virtual scene rendering method
CN116030040B (en) Data processing method, device, equipment and medium
CN114693551A (en) Image processing method, device, equipment and readable storage medium
CN114926568A (en) Model training method, image generation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40046444

Country of ref document: HK