CN113709504B - Image processing method, intelligent terminal and readable storage medium


Info

Publication number
CN113709504B
Authority
CN
China
Prior art keywords
image block
image
filtering
processing
target
Prior art date
Legal status
Active
Application number
CN202111253333.1A
Other languages
Chinese (zh)
Other versions
CN113709504A (en)
Inventor
陈泳
黄河
Current Assignee
Shenzhen Transsion Holdings Co Ltd
Original Assignee
Shenzhen Transsion Holdings Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Transsion Holdings Co Ltd
Priority to CN202111253333.1A
Publication of CN113709504A
Application granted
Publication of CN113709504B

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/80Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
    • H04N19/82Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation involving filtering within a prediction loop


Abstract

The application provides an image processing method, an intelligent terminal and a readable storage medium, wherein the image processing method comprises the following steps: acquiring at least one image block; determining or generating at least one intermediate result according to the at least one image block; and performing filtering processing on the at least one intermediate result to obtain a target filtering result, wherein the target filtering result is used for determining or generating a reconstructed image or a decoded image corresponding to the at least one image block. By adopting the image processing scheme provided by the application, the relevance among different image blocks can be fully utilized, and the image distortion can be effectively reduced.

Description

Image processing method, intelligent terminal and readable storage medium
Technical Field
The application relates to the technical field of intelligent terminals, in particular to an image processing method, an intelligent terminal and a readable storage medium.
Background
With the development of various electronic devices and internet technologies, users demand ever higher definition of video pictures. In the process of conceiving and implementing the present application, the inventors found that at least the following problem exists: in conventional video coding techniques, the loop filtering process can leave block boundary pixels discontinuous and lose high-frequency information, thereby producing artifacts and distortion. It is therefore necessary to provide an innovative image processing method to improve the performance of loop filtering processing and reduce the distortion of the reconstructed image or the decoded image.
The foregoing description is provided for general background information and is not admitted to be prior art.
Disclosure of Invention
In view of the above technical problems, the present application provides an image processing method, an intelligent terminal and a readable storage medium, which can make full use of the correlation between image blocks to effectively reduce image distortion.
In order to solve the above technical problem, the present application provides an image processing method, optionally applicable to an intelligent terminal, including:
acquiring at least one image block;
determining or generating at least one intermediate result according to the at least one image block;
and performing filtering processing on the at least one intermediate result to obtain a target filtering result, wherein the target filtering result is used for determining or generating a reconstructed image or a decoded image corresponding to the at least one image block.
Optionally, the at least one image block comprises a first image block and a second image block; the first image block comprises an image block that has not been filtered based on a neural network, and the second image block comprises an image block that has been filtered based on a neural network; or the second image block and the target filtering result are subjected to the same type of filtering processing.
optionally, the determining or generating at least one intermediate result according to the at least one image block includes: determining or generating at least one intermediate result according to the first image block and the second image block.
Optionally, the at least one image block comprises a first image block comprising an image block that is not processed by the neural network based filtering;
determining or generating at least one intermediate result according to the at least one image block includes: acquiring a second image block corresponding to the first image block, wherein the second image block comprises an image block which is filtered based on a neural network; or the second image block and the target filtering result are subjected to the same type of filtering processing; determining or generating at least one intermediate result according to the first image block and the second image block.
Optionally, the first image block and the second image block belong to different types of image blocks; the image of the second image block and the image of the first image block belong to the same image group; and the image of the second image block and the image of the first image block are sequentially encoded or decoded.
Optionally, the obtaining at least one image block includes: acquiring a first image block; and acquiring a second image block according to the attribute information of the first image block.
Optionally, the obtaining a second image block according to the attribute information of the first image block includes: and acquiring a second image block from a filtering result cache unit or an image buffer according to the attribute information of the first image block, wherein the second image block comprises the image block after filtering.
Optionally, the obtaining, according to the attribute information of the first image block, a second image block from a filtering result caching unit or an image buffer includes at least one of: if the attribute information of the first image block indicates that the first image block is a chrominance image block or a luminance image block, acquiring a second image block from a filtering result cache unit, wherein the second image block is a luminance image block after filtering processing or a chrominance image block after filtering processing correspondingly; and if the attribute information of the first image block indicates that the first image block is a difference frame image block, acquiring a second image block from a filtering result cache unit or an image buffer, wherein the second image block is a key frame image block after filtering.
Optionally, the determining or generating at least one intermediate result according to the first image block and the second image block includes: acquiring the image characteristics of the first image block and the image characteristics of the second image block; and determining or generating at least one intermediate result according to the image characteristics of the first image block and the image characteristics of the second image block.
Optionally, the determining or generating at least one intermediate result according to the image features of the first image block and the image features of the second image block includes: if the first image block is a luminance image block or a chrominance image block, acquiring edge features of the first image block and edge features of the second image block; fusing the edge characteristics of the first image block and the edge characteristics of the second image block to obtain an edge fused image block; and determining or generating at least one intermediate result according to the edge fusion image block.
Optionally, the determining or generating at least one intermediate result according to the image features of the first image block and the image features of the second image block includes: if the first image block is a difference frame image block or a key frame image block, acquiring the brightness characteristic and/or the chrominance characteristic of the first image block, and acquiring the brightness characteristic and/or the chrominance characteristic of the second image block; fusing the brightness characteristic of the first image block and the brightness characteristic of the second image block to obtain a brightness fused image block, and/or fusing the chrominance characteristic of the second image block and the chrominance characteristic of the first image block to obtain a chrominance fused image block; and determining or generating at least one intermediate result according to the brightness fused image block and/or the chroma fused image block.
Optionally, the determining or generating at least one intermediate result according to the first image block and the second image block includes: fusing the first image block and the second image block to obtain a target fused image block; and determining or generating at least one intermediate result according to the target fusion image block.
Optionally, the fusing the first image block and the second image block to obtain a target fused image block includes: scaling the first image block and/or the second image block to obtain a first image block and/or a second image block with adjusted size; and performing fusion processing on any one of the first image block and the first image block with the adjusted size and any one of the second image block and the second image block with the adjusted size to obtain a target fusion image block.
Optionally, the at least one intermediate result comprises at least one fused image block; the filtering the at least one intermediate result to obtain a target filtering result includes: and performing filtering processing on the at least one fused image block by using a filtering processing mode adopted by the second image block to obtain a target filtering result.
Optionally, the target filtering result includes a first image block after filtering processing, and filtering processing model structures and/or parameters corresponding to filtering processing modes adopted by the first image block and the second image block after filtering processing are different.
Optionally, the resolution of the first image block is smaller than the resolution of the second image block, and/or the resolution of the first image block after the filtering processing is larger than the resolution of the first image block.
The present application further provides an image processing method, optionally applicable to an intelligent terminal, including:
acquiring an image block to be processed;
determining or generating a target filtering mode according to the attribute information of the image block to be processed;
and filtering the image blocks to be processed according to the target filtering mode to obtain a target filtering result.
Optionally, the target filtering manner includes a first target filtering manner and/or a second target filtering manner; the determining or generating a target filtering mode according to the attribute information of the image block to be processed includes: if the attribute information of the image block to be processed indicates that the image block to be processed is a first image block, determining or generating the first target filtering mode; and/or determining or generating the second target filtering mode if the attribute information of the image block to be processed indicates that the image block to be processed is a second image block.
Optionally, the first target filtering manner and the second target filtering manner are the same type of filtering manner, and structures and/or parameters of filtering processing models corresponding to the first target filtering manner and the second target filtering manner are different.
Optionally, the method further comprises at least one of: the first target filtering mode and the second target filtering mode correspond to different filtering units, and a mapping relation exists between a filtering processing model corresponding to the filtering unit and a quantization parameter; and the filtering processing model corresponding to the filtering unit is determined or generated according to the quantization parameter corresponding to the target coding cost.
Optionally, the filtering, according to the target filtering manner, the image block to be processed to obtain a target filtering result includes at least one of the following: if the image block to be processed is a first image block, acquiring a reference image block, and performing filtering processing on the reference image block and the first image block according to the target filtering mode to obtain a target filtering result; and/or if the image block to be processed is a second image block, filtering the second image block according to the target filtering mode to obtain a target filtering result.
Optionally, at least one of the following is included: the reference image block is an image block after filtering processing; optionally, the reference image block and the target filtering result are subjected to the same type of filtering processing; the first image block comprises any one of a difference frame image block, a chrominance image block and a luminance image block; the second image block comprises a key frame image block.
Optionally, the filtering the reference image block and the first image block according to the target filtering manner to obtain a target filtering result includes: preprocessing the reference image block and the first image block to obtain a filtering object; and carrying out filtering processing on the filtering processing object according to the target filtering mode to obtain a target filtering result.
Optionally, the preprocessing the reference image block and the first image block to obtain a filtering object includes at least one of:
performing fusion processing on the reference image block and the first image block to obtain a first fusion image block, and determining the first fusion image block as a filtering processing object;
scaling the reference image block and/or the first image block to obtain a size-adjusted reference image block and/or a size-adjusted first image block, performing fusion processing on any one of the reference image block and the size-adjusted reference image block and any one of the first image block and the size-adjusted first image block to obtain a second fused image block, and determining the second fused image block as the filtering processing object;
and carrying out zero setting processing on partial transform coefficients included in the first image block according to a transform coefficient threshold to obtain a processed first image block, carrying out fusion processing on the processed first image block and the reference image block to obtain a third fused image block, and determining the third fused image block as the filtering processing object.
Optionally, the filtering the object to be filtered according to the target filtering manner to obtain a target filtering result includes at least one of:
if the filtering processing object is the first fusion image block or the third fusion image block, filtering the filtering processing object according to a first type of target filtering mode to obtain a target filtering result;
and if the filtering processing object is the second fusion image block or the second image block, filtering the filtering processing object according to the first type of target filtering mode or the second type of target filtering mode to obtain a target filtering result.
The present application also provides an image processing apparatus including:
the acquisition module is used for acquiring at least one image block;
the determining module is used for determining or generating at least one intermediate result according to the at least one image block;
and the filtering module is used for carrying out filtering processing on the at least one intermediate result to obtain a target filtering result, and the target filtering result is used for determining or generating a reconstructed image or a decoded image corresponding to the at least one image block.
The present application also provides another image processing apparatus including:
the acquisition module is used for acquiring an image block to be processed;
the determining module is used for determining or generating a target filtering mode according to the attribute information of the image block to be processed;
and the filtering module is used for carrying out filtering processing on the image blocks to be processed according to the target filtering mode so as to obtain a target filtering result.
The application also provides an intelligent terminal, including: a memory, a processor, wherein the memory has stored thereon an image processing program, which when executed by the processor implements the steps of any of the methods described above.
The present application also provides a computer-readable storage medium, which stores a computer program that, when executed by a processor, performs the steps of the method as set forth in any one of the above.
As described above, the image processing method of the present application includes the steps of: acquiring at least one image block; determining or generating at least one intermediate result from the at least one image block; and performing filtering processing on the at least one intermediate result to obtain a target filtering result, wherein the target filtering result is used for determining or generating a reconstructed image or a decoded image corresponding to the at least one image block. With this technical solution, a target filtering result can be obtained by using the intermediate result generated from one or more associated image blocks, and a reconstructed image or decoded image can then be generated, which alleviates the problem of relatively large distortion in the reconstructed image or decoded image output during video encoding or decoding and thus improves user experience.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain the principles of the application. In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below; other drawings can be obtained by those skilled in the art based on these drawings without inventive effort.
Fig. 1 is a schematic diagram of a hardware structure of an intelligent terminal implementing various embodiments of the present application;
fig. 2 is a communication network system architecture diagram according to an embodiment of the present application;
fig. 3 is a functional diagram illustrating processing of general video coding at an encoding end according to an embodiment of the present application;
fig. 4 is a functional diagram illustrating processing of general video coding at a decoding end according to an embodiment of the present application;
fig. 5 is a flowchart illustrating an image processing method according to the first embodiment;
FIG. 6 is a flowchart illustrating an image processing method according to a second embodiment;
FIG. 7a is a first schematic diagram illustrating a structure of a neural network-based loop filter according to a second embodiment;
FIG. 7b is a schematic diagram of a second embodiment of a neural network based loop filter;
FIG. 7c is a first schematic structural diagram of a filtering process model according to the second embodiment;
FIG. 8a is a schematic diagram of a third example of a structure of a neural network-based loop filter according to the second embodiment;
FIG. 8b is a second schematic diagram illustrating a filtering process model according to a second embodiment;
FIG. 9 is a fourth exemplary schematic diagram illustrating a structure of a neural network-based loop filter according to the second embodiment;
fig. 10 is a schematic diagram showing the processing function of a decoder according to the second embodiment;
fig. 11a is a first schematic structural diagram of a post-processing filter based on a super-resolution neural network according to a second embodiment;
fig. 11b is a schematic structural diagram of a post-processing filter based on a super-resolution neural network according to the second embodiment;
FIG. 11c is a schematic diagram of a filtering process model according to the second embodiment;
fig. 12a is a schematic structural diagram of a post-processing filter based on a super-resolution neural network according to the second embodiment;
fig. 12b is a schematic structural diagram of a post-processing filter based on a super-resolution neural network according to the third embodiment;
FIG. 13a is a schematic diagram of the processing function of an encoder according to a second embodiment;
FIG. 13b is a schematic diagram of the processing function of another decoder according to the second embodiment;
fig. 14 is a schematic structural diagram of a neural network-based loop filter according to a fifth embodiment;
FIG. 15a is a schematic diagram of the processing function of another encoder shown in accordance with the second embodiment;
FIG. 15b is a schematic diagram illustrating the processing function of yet another decoder according to the second embodiment;
fig. 16 is a schematic structural diagram of a loop filter based on a super-resolution neural network according to a second embodiment;
fig. 17 is a flowchart illustrating an image processing method according to a third embodiment;
fig. 18 is a flowchart illustrating an image processing method according to a fourth embodiment;
fig. 19 is a schematic configuration diagram of an image processing apparatus according to the fifth embodiment.
The implementation, functional features and advantages of the objectives of the present application will be further explained with reference to the accompanying drawings. With the above figures, there are shown specific embodiments of the present application, which will be described in more detail below. These drawings and written description are not intended to limit the scope of the inventive concepts in any manner, but rather to illustrate the inventive concepts to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element. Furthermore, elements, features, or components with similar names in different embodiments of this disclosure may have the same meaning or different meanings; their particular meaning is determined by their explanation in the respective embodiment or further by the context of that embodiment.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope hereof. Depending on the context, the word "if" as used herein may be interpreted as "upon", "when", or "in response to a determination". Also, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms "comprises", "comprising", "includes" and/or "including", when used in this specification, specify the presence of stated features, steps, operations, elements, components, items, species, and/or groups, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, items, species, and/or groups thereof. The terms "or", "and/or", and "including at least one of the following", as used herein, are to be construed as inclusive, meaning any one or any combination. For example, "includes at least one of A, B and C" means any of the following: A; B; C; A and B; A and C; B and C; A and B and C. Similarly, "A, B or C" or "A, B and/or C" means any of the following: A; B; C; A and B; A and C; B and C; A and B and C. An exception to this definition occurs only when a combination of elements, functions, steps, or operations is inherently mutually exclusive in some way.
It should be understood that, although the steps in the flowcharts in the embodiments of the present application are shown in an order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated herein, the order of execution of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in the figures may include multiple sub-steps or stages that are not necessarily performed at the same moment but may be performed at different times; their order of execution is not necessarily sequential, and they may be performed in turn or alternately with other steps or with sub-steps or stages of other steps.
The words "if", as used herein, may be interpreted as "at … …" or "at … …" or "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrases "if determined" or "if detected (a stated condition or event)" may be interpreted as "when determined" or "in response to a determination" or "when detected (a stated condition or event)" or "in response to a detection (a stated condition or event)", depending on the context.
It should be noted that step numbers such as S501 and S502 are used herein for the purpose of more clearly and briefly describing the corresponding contents, and do not constitute a substantial limitation on the sequence, and those skilled in the art may perform S502 and then S501 in the specific implementation, but these steps should be within the scope of the present application.
It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only for convenience of description of the present application and have no specific meaning in themselves. Thus, "module", "component" and "unit" may be used interchangeably.
The smart terminal may be implemented in various forms. For example, the smart terminal described in the present application may include smart terminals such as a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a Personal Digital Assistant (PDA), a Portable Media Player (PMP), a navigation device, a wearable device, a smart band, a pedometer, and the like, and fixed terminals such as a Digital TV, a desktop computer, and the like.
The following description will be given taking a mobile terminal as an example, and it will be understood by those skilled in the art that the configuration according to the embodiment of the present application can be applied to a fixed type terminal in addition to elements particularly used for mobile purposes.
Referring to fig. 1, which is a schematic diagram of a hardware structure of a mobile terminal for implementing various embodiments of the present application, the mobile terminal 100 may include: RF (Radio Frequency) unit 101, WiFi module 102, audio output unit 103, a/V (audio/video) input unit 104, sensor 105, display unit 106, user input unit 107, interface unit 108, memory 109, processor 110, and power supply 111. Those skilled in the art will appreciate that the mobile terminal architecture shown in fig. 1 is not intended to be limiting of mobile terminals, which may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
The following describes each component of the mobile terminal in detail with reference to fig. 1:
the radio frequency unit 101 may be configured to receive and transmit signals during information transmission and reception or during a call, and specifically, receive downlink information of a base station and then process the downlink information to the processor 110; in addition, the uplink data is transmitted to the base station. Typically, radio frequency unit 101 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like. In addition, the radio frequency unit 101 can also communicate with a network and other devices through wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System for Mobile communications), GPRS (General Packet Radio Service), CDMA2000 (Code Division Multiple Access 2000 ), WCDMA (Wideband Code Division Multiple Access), TD-SCDMA (Time Division-Synchronous Code Division Multiple Access), FDD-LTE (Frequency Division duplex-Long Term Evolution), TDD-LTE (Time Division duplex-Long Term Evolution, Time Division Long Term Evolution), 5G, and so on.
WiFi belongs to short-distance wireless transmission technology, and the mobile terminal can help a user to receive and send e-mails, browse webpages, access streaming media and the like through the WiFi module 102, which provides wireless broadband internet access for the user. Although fig. 1 shows the WiFi module 102, it is understood that the WiFi module is not an essential component of the mobile terminal and may be omitted as needed without changing the essence of the invention.
The audio output unit 103 may convert audio data received by the radio frequency unit 101 or the WiFi module 102 or stored in the memory 109 into an audio signal and output as sound when the mobile terminal 100 is in a call signal reception mode, a call mode, a recording mode, a voice recognition mode, a broadcast reception mode, or the like. Also, the audio output unit 103 may also provide audio output related to a specific function performed by the mobile terminal 100 (e.g., a call signal reception sound, a message reception sound, etc.). The audio output unit 103 may include a speaker, a buzzer, and the like.
The a/V input unit 104 is used to receive audio or video signals. The a/V input unit 104 may include a Graphics Processing Unit (GPU) 1041 and a microphone 1042, and the graphics processor 1041 processes image data of still pictures or video obtained by an image capturing device (e.g., a camera) in a video capturing mode or an image capturing mode. The processed image frames may be displayed on the display unit 106. The image frames processed by the graphics processor 1041 may be stored in the memory 109 (or other storage medium) or transmitted via the radio frequency unit 101 or the WiFi module 102. The microphone 1042 may receive sound (audio data) in a phone call mode, a recording mode, a voice recognition mode, or the like, and can process such sound into audio data. In the phone call mode, the processed audio (voice) data may be converted into a format that can be transmitted to a mobile communication base station via the radio frequency unit 101 and output. The microphone 1042 may implement various types of noise cancellation (or suppression) algorithms to cancel (or suppress) noise or interference generated in the course of receiving and transmitting audio signals.
The mobile terminal 100 also includes at least one sensor 105, such as a light sensor, a motion sensor, and other sensors. Optionally, the light sensor includes an ambient light sensor that may adjust the brightness of the display panel 1061 according to the brightness of ambient light, and a proximity sensor that may turn off the display panel 1061 and/or the backlight when the mobile terminal 100 is moved to the ear. As one of the motion sensors, the accelerometer sensor can detect the magnitude of acceleration in each direction (generally, three axes), can detect the magnitude and direction of gravity when stationary, and can be used for applications of recognizing the posture of a mobile phone (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), vibration recognition related functions (such as pedometer and tapping), and the like; as for other sensors such as a fingerprint sensor, a pressure sensor, an iris sensor, a molecular sensor, a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which can be configured on the mobile phone, further description is omitted here.
The display unit 106 is used to display information input by a user or information provided to the user. The Display unit 106 may include a Display panel 1061, and the Display panel 1061 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like.
The user input unit 107 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the mobile terminal. Alternatively, the user input unit 107 may include a touch panel 1071 and other input devices 1072. The touch panel 1071, also referred to as a touch screen, may collect a touch operation performed by a user on or near the touch panel 1071 (e.g., an operation performed by the user on or near the touch panel 1071 using a finger, a stylus, or any other suitable object or accessory), and drive a corresponding connection device according to a predetermined program. The touch panel 1071 may include two parts of a touch detection device and a touch controller. Optionally, the touch detection device detects a touch orientation of a user, detects a signal caused by a touch operation, and transmits the signal to the touch controller; the touch controller receives touch information from the touch sensing device, converts the touch information into touch point coordinates, sends the touch point coordinates to the processor 110, and can receive and execute commands sent by the processor 110. In addition, the touch panel 1071 may be implemented in various types, such as a resistive type, a capacitive type, an infrared ray, and a surface acoustic wave. In addition to the touch panel 1071, the user input unit 107 may include other input devices 1072. Optionally, other input devices 1072 may include, but are not limited to, one or more of a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like, and are not limited thereto.
Alternatively, the touch panel 1071 may cover the display panel 1061, and when the touch panel 1071 detects a touch operation thereon or nearby, the touch panel 1071 transmits the touch operation to the processor 110 to determine the type of the touch event, and then the processor 110 provides a corresponding visual output on the display panel 1061 according to the type of the touch event. Although the touch panel 1071 and the display panel 1061 are shown in fig. 1 as two separate components to implement the input and output functions of the mobile terminal, in some embodiments, the touch panel 1071 and the display panel 1061 may be integrated to implement the input and output functions of the mobile terminal, and is not limited herein.
The interface unit 108 serves as an interface through which at least one external device is connected to the mobile terminal 100. For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 108 may be used to receive input (e.g., data information, power, etc.) from external devices and transmit the received input to one or more elements within the mobile terminal 100 or may be used to transmit data between the mobile terminal 100 and external devices.
The memory 109 may be used to store software programs as well as various data. The memory 109 may mainly include a program storage area and a data storage area, and optionally, the program storage area may store an operating system, an application program (such as a sound playing function, an image playing function, and the like) required by at least one function, and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the cellular phone, and the like. Further, the memory 109 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.
The processor 110 is a control center of the mobile terminal, connects various parts of the entire mobile terminal using various interfaces and lines, and performs various functions of the mobile terminal and processes data by operating or executing software programs and/or modules stored in the memory 109 and calling data stored in the memory 109, thereby performing overall monitoring of the mobile terminal. Processor 110 may include one or more processing units; preferably, the processor 110 may integrate an application processor and a modem processor, optionally, the application processor mainly handles operating systems, user interfaces, application programs, etc., and the modem processor mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 110.
The mobile terminal 100 may further include a power supply 111 (e.g., a battery) for supplying power to various components, and preferably, the power supply 111 may be logically connected to the processor 110 via a power management system, so as to manage charging, discharging, and power consumption management functions via the power management system.
Although not shown in fig. 1, the mobile terminal 100 may further include a bluetooth module or the like, which is not described in detail herein.
In order to facilitate understanding of the embodiments of the present application, a communication network system on which the mobile terminal of the present application is based is described below.
Referring to fig. 2, fig. 2 is an architecture diagram of a communication Network system according to an embodiment of the present disclosure, where the communication Network system is an LTE system of a universal mobile telecommunications technology, and the LTE system includes a UE (User Equipment) 201, an E-UTRAN (Evolved UMTS Terrestrial Radio Access Network) 202, an EPC (Evolved Packet Core) 203, and an IP service 204 of an operator, which are in communication connection in sequence.
Optionally, the UE201 may be the mobile terminal 100 described above, and is not described herein again.
The E-UTRAN202 includes eNodeB2021 and other eNodeBs 2022, among others. Alternatively, the eNodeB2021 may be connected with other enodebs 2022 through a backhaul (e.g., X2 interface), the eNodeB2021 is connected to the EPC203, and the eNodeB2021 may provide the UE201 access to the EPC 203.
The EPC203 may include an MME (Mobility Management Entity) 2031, an HSS (Home Subscriber Server) 2032, other MMEs 2033, an SGW (Serving gateway) 2034, a PGW (PDN gateway) 2035, and a PCRF (Policy and Charging Rules Function) 2036, and the like. Optionally, the MME2031 is a control node that handles signaling between the UE201 and the EPC203, providing bearer and connection management. HSS2032 is used to provide registers to manage functions such as home location register (not shown) and holds subscriber specific information about service characteristics, data rates, etc. All user data may be sent through SGW2034, PGW2035 may provide IP address assignment for UE201 and other functions, and PCRF2036 is a policy and charging control policy decision point for traffic data flow and IP bearer resources, which selects and provides available policy and charging control decisions for a policy and charging enforcement function (not shown).
The IP services 204 may include the internet, intranets, IMS (IP Multimedia Subsystem), or other IP services, among others.
Although the LTE system is described as an example, it should be understood by those skilled in the art that the present application is not limited to the LTE system, but may also be applied to other wireless communication systems, such as GSM, CDMA2000, WCDMA, TD-SCDMA, and future new network systems (e.g. 5G), and the like.
Based on the above mobile terminal hardware structure and communication network system, various embodiments of the present application are provided.
In order to facilitate understanding of the contents of the embodiments described below, the following explanation is made on terms of art that may be involved in the present scheme.
YUV: one type of image format commonly used in video, picture, camera, etc. applications, unlike RGB, the YUV format is represented by a "luminance" component, called Y (equivalent to gray scale), and two "chrominance" components, called U (blue projection) and V (red projection), respectively.
I frame: the intra-frame coded frame is an independent frame with all information, can be independently decoded without referring to other images, and can be simply understood as a static picture. The first frame in a video sequence is always an I-frame because it is a key frame.
P frame: inter-frame predictive coding frames require reference to the frame preceding the playing order for coding. The difference between the current frame picture and the previous frame (which may be an I frame or a P frame) is shown. When decoding, the difference defined by the frame is superimposed on the picture buffered before, and the final picture is generated.
B frame: bidirectionally predictive-encoded frames, i.e., B-frames, record the differences between the current frame and the preceding and following frames. That is, to decode a B frame, not only a buffer picture before the playback order but also a picture after the playback order are decoded, and a final picture is obtained by superimposing the previous and subsequent pictures on the data of the current frame. The B frame compression rate is high, but the decoding performance is required to be high.
Rate-distortion optimization: Rate-Distortion Optimization, RDO for short, is used to improve video quality in video compression and aims to minimize distortion without exceeding the maximum bit rate. The main idea is to minimize a cost function (whose value is the rate-distortion cost) under the joint constraint of bit rate and distortion, so as to keep the bit rate low while the distortion is low. It can be used for various mode selections, such as selecting an intra prediction mode or an inter prediction mode, and partition mode decisions for a coding tree unit or a coding unit.
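As an illustrative sketch (not part of the patent text), the rate-distortion cost is commonly written as J = D + λ·R, where D is the distortion, R is the bit cost and λ is a Lagrange multiplier; the function and parameter names below are hypothetical and only show how a candidate with minimum rate-distortion cost might be selected.

```python
# Illustrative sketch only: choosing the candidate with the minimum
# rate-distortion cost J = D + lambda * R (names are hypothetical).
def rd_cost(distortion: float, bits: float, lam: float) -> float:
    """Rate-distortion cost: distortion plus lambda-weighted bit cost."""
    return distortion + lam * bits

def select_best_mode(candidates, lam: float):
    """candidates: iterable of (mode, distortion, bits) tuples."""
    best_mode, best_cost = None, float("inf")
    for mode, distortion, bits in candidates:
        cost = rd_cost(distortion, bits, lam)
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode, best_cost

mode, cost = select_best_mode([("intra", 12.0, 40), ("inter", 8.0, 90)], lam=0.1)
```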
Alternatively, the terms "reconstruction" and "decoding" may be used interchangeably, and the terms "image," "picture," and "frame" may be used interchangeably. Typically, but not necessarily, the term "reconstruction" is used at the encoder side, while "decoding" is used at the decoder side.
Referring to fig. 3, fig. 3 is a schematic diagram illustrating the processing functions of general video coding at the encoding end according to an embodiment of the present disclosure. A video frame input at the encoding end 300 is usually divided into a plurality of image blocks for processing. For each image block, the prediction block obtained by the predictor (using inter-frame prediction or intra-frame prediction) is subtracted from the image block to obtain a residual block, and the residual block is then transformed, quantized, and encoded by an entropy encoder to form an encoded bitstream. In addition, the transformed and quantized residual block is inverse-quantized and inverse-transformed and then added to the prediction block output by the predictor for the corresponding image block to obtain a reconstructed block. However, because of the transformation and quantization, there is distortion between the reconstructed block and the corresponding image block in the input video frame, so the finally reconstructed video frame may not restore the original with high fidelity. The reconstructed block therefore needs to undergo loop filtering processing to reduce image distortion, so that the difference between the reconstructed image (i.e., the reconstructed frame) obtained after loop filtering and the input original image is reduced as much as possible. The loop filtering process shown in fig. 3 includes various filters, including a deblocking filter (DBF), a sample-adaptive offset filter (SAO), and an adaptive loop filter (ALF). Optionally, the filter control data indicates whether a neural-network-based loop filtering operation is applied, and this indication is used to keep the loop filtering process consistent between the encoding side and the decoding side.
In one embodiment, the loop filtering process may also include only one or more of the above-mentioned filters. Optionally, the distortion may be further reduced by adding a neural-network-based loop filter, for example a loop filter based on a Dense Residual Convolutional Neural Network (DRNLF) or a loop filter based on a Super-Resolution CNN (SRCNN), to perform a loop filtering operation on the reconstructed block.
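A minimal sketch of such a loop-filter chain is given below (it is not part of the patent text): every filter is a trivial placeholder rather than a real DBF/SAO/ALF or neural-network implementation, and the position at which the neural-network filter is inserted is an assumption made only for illustration.

```python
import numpy as np

# Illustrative sketch only: a simplified loop-filter chain in which a
# neural-network-based filter is optionally applied according to the filter
# control data. Every filter below is a trivial placeholder.

def deblocking_filter(block: np.ndarray) -> np.ndarray:
    # Placeholder: light smoothing stands in for boundary deblocking (DBF).
    return (block + np.roll(block, 1, axis=0) + np.roll(block, 1, axis=1)) / 3.0

def sao_filter(block: np.ndarray) -> np.ndarray:
    # Placeholder: constant offset stands in for sample-adaptive offsets (SAO).
    return block + 0.5

def alf_filter(block: np.ndarray) -> np.ndarray:
    # Placeholder: identity stands in for the adaptive loop filter (ALF).
    return block

def nn_loop_filter(block: np.ndarray) -> np.ndarray:
    # Placeholder: identity stands in for a DRNLF- or SRCNN-style neural filter.
    return block

def loop_filter_chain(recon_block: np.ndarray, filter_control: dict) -> np.ndarray:
    out = deblocking_filter(recon_block)
    out = sao_filter(out)
    if filter_control.get("use_nn_loop_filter", False):  # signalled filter control data
        out = nn_loop_filter(out)
    out = alf_filter(out)
    return out

filtered = loop_filter_chain(np.zeros((16, 16)), {"use_nn_loop_filter": True})
```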
Accordingly, the loop filter processing can be applied not only at the video encoding end but also at the video decoding end. Referring to fig. 4, fig. 4 is a schematic diagram illustrating the processing functions at the decoding end of general video coding according to an embodiment of the present disclosure. The video decoding end 400 performs entropy decoding on the received bit stream to obtain prediction data, the filter control data indicated by the encoding end, and a residual block. Optionally, after inverse quantization and inverse transformation, the residual block is summed with the prediction block output by the predictor from the prediction data; whether a neural-network-based loop filter is applied to the image block currently being decoded is then determined according to the loop filtering processing indicated by the filter control data; finally, the filtered decoded video data is output and buffered in the decoded image buffer so that it can be used for image blocks requiring inter-frame prediction. In another embodiment, the filter control data may include other data that controls the filtering processing of the other filters.
For the codec processes under different video compression standards and for the encoders or decoders involved in the schemes described below, corresponding adjustments may be made for certain improvements and refinements, but the general processing framework is that shown in fig. 3 and fig. 4. The image blocks involved in the subsequent schemes may correspond to the reconstructed blocks in fig. 3 or fig. 4, and the filtering processing may correspond to the processing of one or more filters in the loop filtering.
First embodiment
Referring to fig. 5, fig. 5 is a flowchart illustrating an image processing method according to the first embodiment. The execution subject in this embodiment may be a computer device or a cluster formed by a plurality of computer devices, and the computer device may be a terminal device (such as the aforementioned mobile terminal 100) or a server. Here, the embodiment is described taking the terminal device as the execution subject as an example.
S501, at least one image block is obtained.
In an embodiment, the at least one image block includes one or more image blocks, and for any image block, the image block may be a reconstructed image block to be input to a deblocking filter, or a reconstructed image block subjected to deblocking filtering, or an image block after being subjected to loop filtering based on a neural network, or an image block in a cached image in an encoded image buffer or a decoded image buffer. As shown in fig. 3, at the encoding end, the reconstructed image block is obtained by inverse transform and inverse quantization of the residual block corresponding to the input image block, and then adding the residual block to the prediction block output by the predictor.
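As an illustrative sketch (not part of the patent text), the reconstructed block described above can be thought of as the prediction block plus the de-quantized, inverse-transformed residual; the placeholder transform and the parameter names below are assumptions made only for this example.

```python
import numpy as np

# Illustrative sketch only: forming a reconstructed block from quantized residual
# coefficients and a prediction block. The inverse transform is a trivial
# placeholder; real codecs use DCT/DST-style integer transforms.

def dequantize(coeffs: np.ndarray, qstep: float) -> np.ndarray:
    return coeffs * qstep

def inverse_transform(coeffs: np.ndarray) -> np.ndarray:
    # Placeholder inverse transform (identity) standing in for an inverse DCT/DST.
    return coeffs

def reconstruct_block(quantized_residual: np.ndarray,
                      prediction_block: np.ndarray,
                      qstep: float) -> np.ndarray:
    residual = inverse_transform(dequantize(quantized_residual, qstep))
    # The quantization step introduces the distortion that loop filtering later reduces.
    return prediction_block + residual

recon = reconstruct_block(np.ones((8, 8)), np.zeros((8, 8)), qstep=2.0)
```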
In an embodiment, one implementation of obtaining at least one image block may be: acquiring a first image block, and acquiring a second image block according to attribute information of the first image block. Optionally, the attribute information of an image block may be an identifier indicating the type of image block the first image block belongs to. Classified by the image frame to which they belong, image block types may include I-frame image blocks, P-frame image blocks, and B-frame image blocks, which belong to different frame images; classified by the associated feature type, image block types may also include chrominance image blocks (chrominance blocks for short) and luminance image blocks (luminance blocks for short) within the same frame image. Optionally, the luminance block and the chrominance block may be regarded as two components of the image with different characteristics, namely a luminance component and a chrominance component; the luminance component carries a large amount of detail information, while the chrominance component carries less information and has smooth texture. Taking the above classification standards as an example, if the first image block is a P-frame image block or a B-frame image block, the obtained second image block is an I-frame image block; and if the first image block is a chrominance image block (or a luminance image block), the obtained second image block is a luminance image block (or a chrominance image block). When either image block of the two types (the first image block) is subsequently processed, the image information of the other image block (the second image block) can be used as a reference, so that the image information is represented more comprehensively and the filtering effect on the image block can be improved. It should be noted that, when the attribute information of the first image block indicates that the first image block is an I-frame image block, the acquired second image block is empty; that is, an I-frame image block does not need to refer to information of other image blocks, because an I-frame image carries all of its own information.
Optionally, the second image block may be obtained according to the attribute information of the first image block as follows: according to the attribute information of the first image block, it is determined whether the second image block is to be acquired from the filtering result cache unit or from the image buffer, where the second image block comprises an image block after filtering processing. Optionally, the filtering result cache unit may be a cache unit of the filter itself or a cache unit independent of the filter; the filter may be a loop filter based on a neural network or a post-processing filter based on a super-resolution neural network; and optionally, the neural network may be a depthwise separable convolutional neural network, a super-resolution neural network, or the like, which is not limited herein. The image buffer may refer to the encoded image buffer at the encoding end for storing reconstructed images, or the decoded image buffer at the decoding end for storing decoded images (see fig. 3 and fig. 4, respectively); the images stored in the image buffer are used for inter-frame prediction in the video encoding and decoding process. It should be noted that, no matter whether the second image block is obtained from the filtering result cache unit or from the image buffer, the second image block is an image block after filtering processing; the difference is that the types of filtering processing applied to the image blocks at the different storage addresses are different. For example, an image block in an image stored in the encoded image buffer is an image block after loop filtering processing (involving multiple filters), such as an image block in an image stored in the encoded image buffer after processing by the adaptive loop filter included in the encoder shown in fig. 3, whereas an image block cached in the filtering result cache unit may be an image block processed by a loop filter based on a neural network.
Optionally, determining to acquire the second image block from the filtering result caching unit or the image buffer according to the attribute information of the first image block may include at least one of:
if the attribute information of the first image block indicates that the first image block is a chrominance image block or a luminance image block, acquiring a second image block from a filtering result cache unit, wherein the second image block is a luminance image block after filtering processing or a chrominance image block after filtering processing correspondingly;
and if the attribute information of the first image block indicates that the first image block is a difference frame image block, acquiring a second image block from the filtering result cache unit or the image buffer, wherein the second image block is a key frame image block after filtering. That is, when the attribute information of the first image block indicates a different type for the first image block, the reference second image block is obtained from a different source. The key frame image blocks may be I-frame image blocks, and the difference frame image blocks may be P-frame or B-frame image blocks.
Assume that the first image block is a chrominance image block and the acquired second image block is a filtered luminance image block stored in the filtering result cache unit, or that the first image block is a luminance image block and the acquired second image block is a filtered chrominance image block stored in the filtering result cache unit. The first image block will subsequently be subjected to the same type of filtering as the second image block stored in the filtering result cache unit; for example, if the second image block stored in the filtering result cache unit was filtered by a neural-network-based loop filter, the first image block will also undergo neural-network-based loop filtering. Optionally, the filtering may be performed by different filtering units in the same filter, or by respective filtering units in two filters, as long as the filtering performed by the two filters on the image blocks is of the same type, for example both are loop filtering processes based on a super-resolution neural network.
Assuming that the first image block is a difference frame image block including an image block in a forward predictive coded image frame (P frame image block for short) and an image block in a bidirectional predictive coded image frame (B frame image block for short), the second image block to be referred to is an image block in an intra coded image frame (I frame image block for short) acquired from an image buffer. The corresponding second image block is obtained from the specific storage space according to the attribute information of the first image block, so that the second image block can be quickly and accurately obtained, and the filtering processing efficiency is improved.
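As a rough illustration of the selection logic above, the following Python sketch shows how the source of the second image block might be chosen from the attribute information of the first image block; the attribute labels, the filter_result_cache and image_buffer containers, and the position key are illustrative assumptions rather than part of this embodiment.

```python
# Hypothetical sketch: choose where the second (reference) image block comes from,
# based on the attribute information of the first image block.
def get_second_image_block(first_block_attr, filter_result_cache, image_buffer, position):
    if first_block_attr == "I":
        # An I-frame image block is self-contained; no reference block is needed.
        return None
    if first_block_attr in ("luma", "chroma"):
        # Reference the already-filtered block of the other component,
        # taken from the filtering result cache unit.
        other = "chroma" if first_block_attr == "luma" else "luma"
        return filter_result_cache.get((other, position))
    if first_block_attr in ("P", "B"):
        # Difference-frame blocks reference a filtered key-frame (I-frame) block,
        # taken from the filtering result cache unit or the image buffer.
        return (filter_result_cache.get(("I", position))
                or image_buffer.get(("I", position)))
    raise ValueError(f"unknown attribute: {first_block_attr}")
```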
S502, determining or generating at least one intermediate result according to at least one image block.
In an embodiment, the at least one intermediate result includes one or more intermediate results, and any intermediate result may be an image block processed on the at least one image block. Optionally, when at least one image block includes a first image block and a second image block, at least one intermediate result may be determined or generated according to the first image block and the second image block, for example, the first image block and the second image block may be directly fused to obtain an intermediate result, or one of the image blocks is scaled and then fused with the other image block to obtain an intermediate result. Optionally, a more detailed implementation of determining or generating the intermediate result may be found in the contents of the second embodiment, which is not described in detail herein.
S503, filtering the at least one intermediate result to obtain a target filtering result.
In an embodiment of the present application, the target filtering result is used to determine or generate a reconstructed image or a decoded image corresponding to the at least one image block. Here, the reconstructed image and the decoded image are the images processed and restored at the video encoding side and the decoding side, respectively, and the restored images are restorations of the original images input to the video encoding side. The final restored image may be somewhat distorted compared with the original image, usually because of certain processing performed during restoration. However, in the solution provided in the present application, some measures may be taken to reduce the distortion of the restored image, for example adding a post-processing filter based on a super-resolution neural network before outputting the decoded image, so as to increase the resolution of the decoded image and thereby generate a high-quality decoded image.
In an embodiment, the filtering processing of the at least one intermediate result may refer to the filtering processing of the neural-network-based loop filter in fig. 3 (or fig. 4), or to the loop filtering processing formed by one or more filters in fig. 3 (or fig. 4), including the adaptive loop filter, the sample adaptive offset filter, and the neural-network-based loop filter. Correspondingly, the target filtering result may be the result obtained after at least one intermediate result is input into the neural-network-based loop filter and filtered; in this case, when the target filtering result is used to determine or generate the reconstructed image or the decoded image, it still needs to be filtered by the sample adaptive offset filter and the adaptive loop filter in sequence. Alternatively, the target filtering result may be the result output after at least one intermediate result has passed through the whole loop filtering process, that is, filtered in sequence by the neural-network-based loop filter, the sample adaptive offset filter, and the adaptive loop filter; in this case, the target filtering result may be used directly as an image block of the reconstructed image or the decoded image, and the reconstructed image or the decoded image is assembled in sequence from the target filtering results corresponding to the image blocks of the input image.
It should be noted that, when reconstructing an image and decoding an image, the same encoder or decoder is used to perform encoding and decoding operations on different types of acquired image blocks, which can effectively save system software and hardware resources. The image processing scheme provided by the embodiment can be applied to a video encoding end and can also be applied to a video decoding end. If the scheme is applied to the encoding end, each image block corresponds to data in the encoding end, for example, an I-frame image block stored in an encoded image buffer, and the filtering process may be a loop filtering process in the encoding end; if the scheme is applied to the decoding side, the above-mentioned each image block corresponds to data in the decoding side, for example, an I-frame image block stored in a decoded image buffer, and the corresponding filtering process corresponds to a loop filtering process of the decoding side.
In summary, the embodiments of the present application have at least the following advantages:
determining or generating an intermediate result from the acquired at least one image block, where, owing to the correlation between image blocks (for example, between a chrominance image block and a luminance image block), the determined or generated intermediate result may include image information of multiple image blocks, so that using the intermediate result improves the accuracy of the target filtering result; and, when the at least one image block includes a first image block and a corresponding second image block, the information in the second image block can serve as a reference for restoring the image information of the first image block, that is, the first image block can be filtered with reference to the information of the second image block, and the filtered first image block can then be used to reduce the distortion of the reconstructed image.
Second embodiment
Referring to fig. 6, fig. 6 is a flowchart illustrating an image processing method according to a second embodiment, where an execution main body in this embodiment may be a computer device or a cluster formed by a plurality of computer devices, and the computer device may be a terminal device (such as the foregoing mobile terminal 100) or a server, and here, the execution main body in this embodiment is taken as the terminal device for example and is described.
S601, at least one image block is obtained.
In an embodiment, the at least one image block includes a first image block and a second image block; optionally, the first image block is an image block that has not been processed by neural-network-based filtering, and the second image block is an image block that has been processed by neural-network-based filtering. The first image block and the second image block may be obtained in the manner of obtaining at least one image block described in the embodiment corresponding to fig. 5, which is not repeated here. Both the second image block and the target filtering result described below may be image blocks subjected to neural-network-based filtering processing, and the above-mentioned neural network may be a super-resolution neural network or a depthwise separable convolutional neural network.
In another embodiment, the at least one image block comprises a first image block comprising an image block that has not been processed by the neural network-based filtering. Before step S602 is executed, the following steps are also executed: and acquiring a second image block corresponding to the first image block. Optionally, the second image block includes an image block after being filtered based on a neural network. The second image block may also be obtained according to the attribute information corresponding to the first image block, and the corresponding content may refer to the content in the corresponding embodiment of fig. 5, which is not described herein again.
Optionally, for the association or distinction between the first image block and the second image block, at least one of the following may be included:
1) the first image block and the second image block belong to different types of image blocks;
2) the image of the second image block and the image of the first image block are different images but belong to the same image group;
3) and the image of the second image block and the image of the first image block are sequentially encoded or decoded.
Optionally, "different types of image blocks" may mean that when the first image block is a B-frame or P-frame image block, the second image block is an I-frame image block, and when the first image block is a chrominance image block (or a luminance image block), the second image block is a luminance image block (or a chrominance image block). The condition that the image blocks belong to the same image group applies to the case where the first image block is a B-frame or P-frame image block and the obtained second image block is an I-frame image block, because in a video coding sequence a group of pictures (GOP) is the group of pictures formed between two I-frames. When the image in which the second image block is located is a reference frame for inter-frame prediction in the image buffer, that reference frame is an already encoded or decoded image, that is, it is encoded or decoded before the image in which the first image block is located.
S602, at least one intermediate result is determined or generated according to the first image block and the second image block.
In one embodiment, the implementation of this step may be: acquiring image characteristics of a first image block and image characteristics of a second image block; at least one intermediate result is determined or generated according to the image features of the first image block and the image features of the second image block. Alternatively, the image features may include edge features, luminance features, chrominance features, etc., where the intermediate results are derived using characteristics of the image itself.
Optionally, when the image feature includes an edge feature, the manner of determining or generating the at least one intermediate result may be: if the first image block is a luminance image block or a chrominance image block, acquiring the edge features of the first image block and the edge features of the second image block; fusing the edge features of the first image block and the edge features of the second image block to obtain an edge-fused image block, and determining or generating at least one intermediate result according to the edge-fused image block. That is, when the first image block is a luminance image block or a chrominance image block, it mainly represents the luminance or chrominance information of the image and is weaker at representing the edge structure of the image; therefore, the edge features of the luminance image block and the chrominance image block can be obtained and fused into an edge-fused image block, which enhances the contour edges and details of the image block and suppresses the loss of its detail parts. Optionally, the edge-fused image block may be used directly as an intermediate result, or it may be combined with other image blocks, for example further fused with a luminance image block or a chrominance image block, to determine or generate an intermediate result.
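For illustration only, the following Python sketch shows one way the edge-feature fusion described above could look, assuming a simple gradient-magnitude edge map and a weighted-average fusion rule (neither of which is mandated by this embodiment):

```python
import numpy as np

def edge_features(block):
    # Simple gradient-magnitude edge map, used here as a stand-in for the edge features.
    gy, gx = np.gradient(block.astype(np.float64))
    return np.hypot(gx, gy)

def edge_fused_block(luma_block, chroma_block, w=0.5):
    # Fuse the edge features of the luminance and chrominance blocks by a weighted average;
    # the fused block can serve directly as an intermediate result.
    assert luma_block.shape == chroma_block.shape
    return w * edge_features(luma_block) + (1.0 - w) * edge_features(chroma_block)
```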
Optionally, when the image features include luma features and/or chroma features, the manner of determining or generating the at least one intermediate result may be:
if the first image block is a difference frame image block or a key frame image block, acquiring the brightness characteristic and/or the chrominance characteristic of the first image block, and acquiring the brightness characteristic and/or the chrominance characteristic of the second image block;
fusing the brightness characteristic of the first image block and the brightness characteristic of the second image block to obtain a brightness fused image block, and/or fusing the chroma characteristic of the second image block and the chroma characteristic of the first image block to obtain a chroma fused image block;
at least one intermediate result is determined or generated from the luminance fused image block and/or the chrominance fused image block.
That is, when the first image block is a B-frame image block or a P-frame image block, the chrominance features or luminance features of the first image block and the second image block may be extracted, and the respective features of the first image block and the second image block may be fused to obtain one or both of a luminance-fused image block and a chrominance-fused image block. By acquiring the chrominance characteristics and the luminance characteristics of the B frame image blocks or the P frame image blocks, intermediate results, such as a large amount of detail information carried in the chrominance characteristics, can be acquired by using image information of different dimensions. Alternatively, the luminance fused image block and/or the chrominance fused image block may be directly used as an intermediate result. It is also possible to use only the luminance fused image block as the intermediate image block, from the viewpoint that the human eye is most sensitive to the luminance component (brightness and darkness) in the image and is relatively less sensitive to the chrominance (color) component in the image.
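As a minimal sketch of the luminance/chrominance feature fusion just described, assuming the blocks are held as dictionaries of co-located numpy planes and that a simple averaging rule is used:

```python
def fuse_luma_chroma(first_block, second_block, alpha=0.5):
    # first_block / second_block: dicts of co-located numpy planes, e.g. {"Y": ..., "U": ..., "V": ...}.
    fused = {"Y": alpha * first_block["Y"] + (1 - alpha) * second_block["Y"]}  # luminance-fused block
    for c in ("U", "V"):  # chrominance-fused block, only if both blocks carry chroma planes
        if c in first_block and c in second_block:
            fused[c] = alpha * first_block[c] + (1 - alpha) * second_block[c]
    return fused  # either or both fused planes may serve as the intermediate result
```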
In another embodiment, the correlation between the images can be used to obtain at least one intermediate result, and this step is implemented by: fusing the first image block and the second image block to obtain a target fused image block; and determining or generating at least one intermediate result according to the target fusion image block. The image blocks in different images (for example, I-frame image blocks and P-frame image blocks) may be fused together or the image blocks in the same image (for example, luminance image blocks and chrominance image blocks) may be fused together by using temporal correlation and spatial correlation, where the temporal correlation refers to that most elements in one image are also present in adjacent images (front and back), and the spatial correlation refers to that there is correlation between adjacent pixels in one image. The first image block and the second image block are fused by utilizing the correlation between the images or the image blocks, and the obtained target fused image block is used as an intermediate result, so that the information included in the images can be more accurately and comprehensively depicted, and thus, during subsequent filtering processing, the intermediate result can be regarded as the second image block to compensate the information lacked by the first image block, namely, the image can be reconstructed according to other image information, and further, the reduction degree of the reconstructed image or the decoded image related to the first image block can be effectively improved.
Optionally, the manner of fusing the first image block and the second image block may be: scaling the first image block and/or the second image block to obtain a size-adjusted first image block and/or second image block; and fusing any one of the first image block and the size-adjusted first image block with any one of the second image block and the size-adjusted second image block to obtain a target fused image block. Optionally, the scaling processing may be downsampling or upsampling, whichever is appropriate; for downsampling the size-adjusted image block is a reduced-size image block, and for upsampling it is an enlarged-size image block. The size here can be understood as the resolution and can be expressed in terms of the pixels of the image: the more pixels an image has, the greater the pixel density, the higher the image definition, and the greater the amount of information. For example, for an image I of size M × N, s-fold downsampling yields an image of size (M/s) × (N/s); in matrix form, downsampling turns each s × s window of the original image into one pixel whose value is the average of all pixels in the window.
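The s-fold downsampling by window averaging described above can be sketched as follows (assuming the block dimensions are divisible by s):

```python
import numpy as np

def downsample(block, s):
    # Turn each s x s window of an M x N block into one pixel whose value is the
    # average of the window, giving an (M/s) x (N/s) block.
    m, n = block.shape
    assert m % s == 0 and n % s == 0
    return block.reshape(m // s, s, n // s, s).mean(axis=(1, 3))
```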
The first image block and the second image block are processed in the above mode, and then fusion can be performed according to different combination modes to obtain a target fusion image block. Optionally, the combination includes the following four types, corresponding to the fusion processing under different encoder or decoder structures:
1) Fusing the first image block and the second image block to obtain a target fused image block. Optionally, the second image block may be obtained from the filtering result cache unit or from the image buffer. The types that the first image block and the second image block may take are as described above and are not limited here. For example, a luminance image block and a chrominance image block may be fused directly to obtain a target fused image block, or a reconstructed B/P-frame image block and an I-frame image block that has undergone loop filtering may be fused directly to obtain a target fused image block.
2) Fusing the first image block and the size-adjusted second image block to obtain a target fused image block. Optionally, when the second image block has been filtered by a super-resolution neural network, its size (or resolution) is larger than that of the first image block. To facilitate fusion, the second image block is brought to the same size as the first image block; here the size adjustment may be a downsampling process that reduces the second image block to the size of the first image block, after which the two are fused to obtain the fused image block. The filtering performed by the super-resolution neural network may be that of a loop filter based on a super-resolution neural network or of a post-processing filter based on a super-resolution neural network.
3) Fusing the size-adjusted first image block and the second image block to obtain a target fused image block. To further reduce the amount of transmitted data and the distortion of the image, the image in which the first image block is located may be downsampled at the input end (for example, before the video encoding processing or before the transform processing), so that the first image block is a size-reduced image block. The image in which the first image block is located may be a B frame or a P frame of the input video data, so that the first image block is a B-frame or P-frame image block, while the second image block may be an I-frame image block that has undergone loop filtering, that is, an image block of a reconstructed frame (I frame), to which no downsampling is applied. Because the reconstructed P-frame/B-frame image has been downsampled at the input end while the I-frame image has not been scaled at all, the sizes of the first image block and the second image block differ; to facilitate the fusion and the subsequent filtering, the first image block can be upsampled so that its size matches that of the second image block, and the two are then fused to obtain the target fused image block.
4) Fusing the size-adjusted first image block and the size-adjusted second image block to obtain a target fused image block. In the embodiment of the present application, in order to further reduce the transmission code rate, the first image block and the second image block are both downsampled to obtain size-reduced first and second image blocks, which are then fused to obtain the target fused image block.
It should be noted that the embodiments of the present application mainly describe the processing of the target fused image blocks obtained by the first three fusion methods; the target fused image block obtained by method 4) and its subsequent processing are not described in detail here. The scaling processing involved in the size adjustment may be implemented by adding a scaling module to the video encoder and the video decoder shown in fig. 3 and fig. 4, by adding a scaling module inside the filtering processor, or by adding a scaling module independent of the filtering processor; reference may be made to the various structural schematic diagrams of the video codec used in step S603, which are not detailed here.
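A rough sketch of the four combinations is given below; the nearest-neighbour resize and the averaging fusion are stand-ins for the scaling module and the fusion processing, not definitions of them.

```python
import numpy as np

def resize(block, out_shape):
    # Nearest-neighbour resize used as a stand-in for the scaling module
    # (down-sampling or up-sampling, whichever is appropriate).
    rows = np.arange(out_shape[0]) * block.shape[0] // out_shape[0]
    cols = np.arange(out_shape[1]) * block.shape[1] // out_shape[1]
    return block[rows][:, cols]

def target_fused_block(first_block, second_block, target_shape=None):
    # Combinations 1)-4): optionally bring one or both blocks to a common target
    # size before fusing (a simple average is used here as the fusion rule).
    if target_shape is None:
        target_shape = first_block.shape
    a = first_block if first_block.shape == target_shape else resize(first_block, target_shape)
    b = second_block if second_block.shape == target_shape else resize(second_block, target_shape)
    return 0.5 * (a + b)
```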
S603, filtering the at least one intermediate result to obtain a target filtering result.
In an embodiment, the at least one intermediate result obtained through the above steps includes at least one fused image block, and this step may be implemented as follows: filtering the at least one fused image block in the same filtering manner as was applied to the second image block, to obtain the target filtering result. That is, the first image block (whose filtered version is the target filtering result) and the second image block are subjected to the same type of filtering processing or the same filtering manner. The same type of filtering processing means that the basic principle of the filtering is the same, while the specific filtering processing model involved, for example the structure and parameters of the neural network, may differ. For example, the second image block and the target filtering result are both produced by a loop filter based on a super-resolution neural network, but the filtering units processing them are different, and the filtering processing models contained in those filtering units also differ.
Optionally, the target filtering result includes the first image block after filtering processing, and the filtering processing model structures and/or parameters corresponding to the filtering processing modes adopted by the first image block and the second image block after filtering processing are different. The filtering processing model is a software implementation adopted in the filtering unit, and specific description can be given in the following, which will not be described in detail herein. Optionally, when the filtering process is a super-resolution neural network-based filtering process, including a super-resolution neural network-based loop filtering process or a super-resolution neural network-based post-processing filtering process, the resolution may be increased after the filtering process is performed on the image block, that is, at least one of the following situations exists: the resolution of the first image block is smaller than that of the second image block, and/or the resolution of the first image block after filtering processing is larger than that of the first image block. When the first image block and the second image block are image blocks of different images of the same image group, an encoder is used for generating I frames, B frames and P frames with different sizes, so that the waste of system resources can be effectively reduced, and high-quality reconstructed frames can be generated by utilizing the correlation among different types of frames.
In an embodiment, the following structures and basic principles of the encoder, the decoder, and the filter are respectively corresponding to the intermediate result determined or generated according to the target fused image block:
1) and directly fusing the first image block and the second image block to obtain a fused image block, and further determining or generating an intermediate result. The corresponding encoder and decoder structures may adopt the structure diagrams shown in fig. 3 and fig. 4, and the structures of the neural network based loop filter used in the encoder or decoder may be the first structure diagram and the second structure diagram of the neural network based loop filter, respectively, as shown in fig. 7a or fig. 7b, and include the neural network based loop filtering unit Af1 and the neural network based loop filtering unit Bf1 (for convenience of description, hereinafter, referred to as filtering units for short). Alternatively, the filtering unit Af1 and the filtering unit Bf1 may be respectively disposed in two different neural network-based loop filters, or may be disposed in the same neural network-based loop filter, where the input reconstruction data a and the reconstruction data B are different reconstruction blocks, and may be a chrominance image block or a luminance image block, respectively, corresponding to the first image block and the second image block. As shown in fig. 7a, after receiving the reconstructed data a and the filtered reconstructed data B output from the neural network-based filtering unit Bf1, the neural network-based loop filtering unit Af1 performs the neural network-based loop filtering process according to an intermediate result generated by the reconstructed data a and the filtered reconstructed data B, and outputs the filtered reconstructed data a. After receiving the reconstructed data B and the filtered reconstructed data a output from the neural network-based filtering unit Af1, the neural network-based loop filtering unit Bf1 performs the neural network-based loop filtering process based on an intermediate result generated between the reconstructed data B and the filtered reconstructed data a, and outputs the filtered reconstructed data B.
In an embodiment, at time t0, the neural-network-based loop filtering unit Af1 receives the reconstructed data A, performs neural-network-based filtering on it, and outputs the filtered reconstructed data A_t0. At time t1, the neural-network-based loop filtering unit Bf1 receives the reconstructed data B and the filtered reconstructed data A_t0, fuses the reconstructed data B with the filtered data A_t0, performs neural-network-based filtering, and outputs the filtered reconstructed data B_t1. At time t2, the neural-network-based loop filtering unit Af1 receives the filtered reconstructed data B_t1, fuses the reconstructed data A with the filtered data B_t1, performs neural-network-based filtering, and outputs the filtered reconstructed data A_t2. Finally, the filtered reconstructed data A_t2 output by the loop filtering unit Af1 and the filtered reconstructed data B_t1 output by the loop filtering unit Bf1 are taken as the final result of this pass of the neural-network-based loop filter processor. In another embodiment, at time t0, the neural-network-based loop filtering unit Af1 receives the reconstructed data A, performs neural-network-based filtering on it, and outputs the filtered reconstructed data A_t0, while the neural-network-based loop filtering unit Bf1 receives the reconstructed data B, performs neural-network-based filtering on it, and outputs the filtered reconstructed data B_t0. At time t1, the loop filtering unit Af1 receives the filtered reconstructed data B_t0, fuses the reconstructed data A with the filtered data B_t0, performs neural-network-based filtering, and outputs the filtered reconstructed data A_t1; the loop filtering unit Bf1 receives the filtered reconstructed data A_t0, fuses the reconstructed data B with the filtered data A_t0, performs neural-network-based filtering, and outputs the filtered reconstructed data B_t1. Finally, the filtered reconstructed data A_t1 and the filtered reconstructed data B_t1 are output as the final result of this pass of the neural-network-based loop filter processor.
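The two schedules described above can be sketched as follows; filter_a, filter_b and fuse are placeholders standing in for the loop filtering units Af1/Bf1 and the fusion that produces the intermediate result.

```python
def sequential_loop_filtering(block_a, block_b, filter_a, filter_b, fuse):
    # First variant: the two filtering units take turns and reuse each other's output.
    a_t0 = filter_a(block_a)                # time t0: filter A alone
    b_t1 = filter_b(fuse(block_b, a_t0))    # time t1: filter B fused with filtered A
    a_t2 = filter_a(fuse(block_a, b_t1))    # time t2: filter A fused with filtered B
    return a_t2, b_t1                       # final filtered reconstructed data

def parallel_loop_filtering(block_a, block_b, filter_a, filter_b, fuse):
    # Second variant: both units run at t0, then exchange their results at t1.
    a_t0, b_t0 = filter_a(block_a), filter_b(block_b)
    a_t1 = filter_a(fuse(block_a, b_t0))
    b_t1 = filter_b(fuse(block_b, a_t0))
    return a_t1, b_t1
```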
The structure of the neural network based loop filter shown in fig. 7b is substantially the same as that of the neural network based loop filter shown in fig. 7 a. The difference is that a storage unit may be connected after each filtering unit, the storage unit is used to store the reconstructed data after filtering processing, and is a functional module independent of the filtering unit, and optionally, the storage unit may also be a sub-module in the filtering unit. The function of the storage unit is the same as that of the filtering result buffer unit described above. If the first tile is reconstructed data B (or reconstructed data a), the second tile may be a filtered result buffered from storage unit As1 (or storage unit Bs 1), and the filtered result is the tile of reconstructed data a (or reconstructed data B) processed by filtering unit Af1 (or filtering unit Bf 1).
Optionally, the filtering processing model corresponding to each filtering unit may be a model structure based on a neural network. A neural network typically consists of neurons organized in groups called layers, with an input layer, an output layer and hidden layers; a deep neural network typically has two or more hidden layers. Referring to fig. 7c, fig. 7c is a schematic structural diagram of a filtering processing model according to an embodiment of the present disclosure. In this model structure, the fusion module Ac1 may be used to fuse the reconstructed data A and the filtered reconstructed data B, the fused data being an intermediate result. The fusion module may be implemented with a concatenation (concat) operation. As shown in fig. 7c, the model also includes two convolutional layers and at least one residual unit (or residual network). In one embodiment, the basic principle of the filtering processing model adopted by the neural-network-based loop filter is as follows: the reconstructed data A, or the intermediate data obtained by fusing the reconstructed data A with the filtered reconstructed data B, is fed into convolutional layer C1, passes through N residual units, and after one more convolutional layer C2 the filtered reconstructed data A is output; for example, the residual data generated by the N residual units and the reconstructed data A are mapped and synthesized according to their correspondence at convolutional layer C2 to obtain the filtered reconstructed data A. Note that convolutional layer C1 and the residual units correspond to feature extraction and feature enhancement, while convolutional layer C2 corresponds to mapping and synthesis. Similarly, the reconstructed data B, or the intermediate data obtained by fusing the reconstructed data B with the filtered reconstructed data A through the fusion module Bc1, is fed into convolutional layer D1, passes through N residual units, and after one convolutional layer D2 the filtered reconstructed data B is output; this filtered reconstructed data is the target filtering result. For example, the residual data generated by the N residual units and the reconstructed data B are processed at convolutional layer D2 to obtain the filtered reconstructed data B. In one embodiment, the fusion module, the convolutional layers and the residual units may each include at least one branch, and the optimal branch combination is selected for different types of reconstructed data. Optionally, the reconstructed data A may be a luminance image block (e.g., a luminance CTB) and the reconstructed data B may be a chrominance image block (e.g., a chrominance CTB). From time t0 to t1, the reconstructed data A passes through the neural-network-based loop filtering unit Af1 to generate filtered reconstructed data A_t01 (stored in storage unit As1). From time t1 to t2, the reconstructed data B and the filtered reconstructed data A_t01 pass through the neural-network-based loop filtering unit Bf1 to generate filtered reconstructed data B_t12 (stored in storage unit Bs1). From time t2 to t3, the filtered reconstructed data B_t12 and the reconstructed data A pass through the neural-network-based loop filtering unit Af1 to generate filtered reconstructed data A_t23 (stored in storage unit As1). Finally, the filtered reconstructed data A_t23 and the filtered reconstructed data B_t12 are output as the final filtered data.
The filter process model can be expressed as:
Z_concat = g_z(W_z1 * X + B_z1, W_z2 * Y + B_z2)

F_1(Z_concat) = g_1(W_1 * Z_concat + B_1)

F_i(Z_concat) = g_i(W_i * F_{i-1}(Z_concat) + B_i), i = 2, ..., M_1 - 1

F_M(Z_concat) = g_M(W_M * F_{M-1}(Z_concat) + B_M) + X

Z'_concat = g_z(W'_z1 * X' + B'_z1, W'_z2 * Y' + B'_z2)

F'_1(Z'_concat) = g'_1(W'_1 * Z'_concat + B'_1)

F'_i(Z'_concat) = g'_i(W'_i * F'_{i-1}(Z'_concat) + B'_i), i = 2, ..., M_2 - 1

F'_M(Z'_concat) = g'_M(W'_M * F'_{M-1}(Z'_concat) + B'_M) + X'

where Z_concat and Z'_concat are the outputs of fusion module Ac1 and fusion module Bc1 respectively; X and Y are the inputs of fusion module Ac1, and X' and Y' are the inputs of fusion module Bc1; W_z1, W_z2, W'_z1, W'_z2, W_1 ... W_M and W'_1 ... W'_M are the weights of each layer, and B_z1, B_z2, B'_z1, B'_z2, B_1 ... B_M and B'_1 ... B'_M are the bias parameters of each layer; g_z(), g_1(), g_i(), g_M(), g'_z(), g'_1(), g'_i() and g'_M() are activation functions applied to the convolution operations, and M_1, M_2 are integers. F_1(), F_i(), F_M(), F'_1(), F'_i(), F'_M() may be network representations of convolutional layers, residual units, or shuffle units.

From time t0 to t1, X is the reconstructed data A and Y is 0, X' is 0 and Y' is 0, and the outputs are F_M1(Z_concat) and 0. From time t1 to t2, X is the reconstructed data A and Y is 0, X' is the reconstructed data B and Y' is F_M1(Z_concat), and the outputs are F_M1(Z_concat) and F'_M1(Z'_concat). From time t2 to t3, X is the reconstructed data A and Y is F'_M1(Z'_concat), X' is the reconstructed data B and Y' is F_M1(Z_concat), and the outputs are F_M2(Z_concat) and F'_M1(Z'_concat). Finally, F_M2(Z_concat) and F'_M1(Z'_concat) are output as the filtered reconstructed data A_t23 and the filtered reconstructed data B_t12, respectively; at this time, F_M2(Z_concat) and F'_M1(Z'_concat) constitute the final filtered data output.
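For readers more comfortable with code than with the formulas, the following PyTorch-style sketch shows one plausible realization of a single filtering unit (fusion by concatenation, convolutional layer C1, N residual units, convolutional layer C2, and the skip connection "+ X"); the channel widths, layer counts and the concatenation-based fusion are assumptions, not the patented model.

```python
import torch
import torch.nn as nn

class ResidualUnit(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

class LoopFilterUnit(nn.Module):
    # One filtering unit (e.g. Af1): fusion module, conv C1, N residual units,
    # conv C2, plus the skip connection corresponding to "... + X" in the formulas.
    def __init__(self, ch=64, n_res=4):
        super().__init__()
        self.fuse = nn.Conv2d(2, ch, 3, padding=1)   # produces Z_concat from (X, Y)
        self.c1 = nn.Conv2d(ch, ch, 3, padding=1)    # convolutional layer C1
        self.res = nn.Sequential(*[ResidualUnit(ch) for _ in range(n_res)])
        self.c2 = nn.Conv2d(ch, 1, 3, padding=1)     # convolutional layer C2

    def forward(self, x, y=None):
        if y is None:
            y = torch.zeros_like(x)                  # Y = 0 when no reference is available yet
        z = self.fuse(torch.cat([x, y], dim=1))      # fusion by concatenation
        return self.c2(self.res(torch.relu(self.c1(z)))) + x
```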
In another embodiment, from time t0 to t1, the reconstructed data A passes through the neural-network-based loop filtering unit Af1 to generate filtered reconstructed data A_t01 (stored in storage unit As1), and the reconstructed data B passes through the neural-network-based loop filtering unit Bf1 to generate filtered reconstructed data B_t01 (stored in storage unit Bs1). From time t1 to t2, the filtered reconstructed data B_t01 and the reconstructed data A pass through the loop filtering unit Af1 to generate filtered reconstructed data A_t12 (stored in storage unit As1); the filtered reconstructed data A_t01 and the reconstructed data B pass through the loop filtering unit Bf1 to generate filtered reconstructed data B_t12 (stored in storage unit Bs1). Finally, the filtered reconstructed data A_t12 and the filtered reconstructed data B_t12 are output as the final filtered data.
The filter process model can be expressed as:
Z_concat = g(W_z1 * X + B_z1, W_z2 * Y + B_z2)

F_1(Z_concat) = g(W_1 * Z_concat + B_1)

F_i(Z_concat) = g(W_i * F_{i-1}(Z_concat) + B_i), i = 2, ..., M_1 - 1

F_M(Z_concat) = g(W_M * F_{M-1}(Z_concat) + B_M) + X

Z'_concat = g(W'_z1 * X' + B'_z1, W'_z2 * Y' + B'_z2)

F'_1(Z'_concat) = g'(W'_1 * Z'_concat + B'_1)

F'_i(Z'_concat) = g'(W'_i * F'_{i-1}(Z'_concat) + B'_i), i = 2, ..., M_2 - 1

F'_M(Z'_concat) = g'(W'_M * F'_{M-1}(Z'_concat) + B'_M) + X'

where Z_concat and Z'_concat are the outputs of fusion module Ac1 and fusion module Bc1 respectively; X and Y are the inputs of fusion module Ac1, and X' and Y' are the inputs of fusion module Bc1; W_z1, W_z2, W'_z1, W'_z2, W_1 ... W_M and W'_1 ... W'_M are the weights of each layer, and B_z1, B_z2, B'_z1, B'_z2, B_1 ... B_M and B'_1 ... B'_M are the bias parameters of each layer; g() and g'() are activation functions applied to the convolution operations, and M_1, M_2 are integers. F_1(), F_i(), F_M(), F'_1(), F'_i(), F'_M() may be network representations of convolutional layers, residual units, or shuffle units.

From time t0 to t1, X is the reconstructed data A and Y is 0, X' is the reconstructed data B and Y' is 0, and the outputs are F_Mt1(Z_concat) and F'_Mt1(Z'_concat). From time t1 to t2, X is the reconstructed data A and Y is F'_Mt1(Z'_concat), X' is the reconstructed data B and Y' is F_Mt1(Z_concat), and the outputs are F_Mt2(Z_concat) and F'_Mt2(Z'_concat). After time t2, F_Mt2(Z_concat) and F'_Mt2(Z'_concat) are output as the filtered reconstructed data A_t12 and the filtered reconstructed data B_t12, respectively; at this time, F_Mt2(Z_concat) and F'_Mt2(Z'_concat) constitute the final filtered data output. Compared with the filtering processing model of the previous embodiment, this filtering processing model produces its output in a shorter operation time.
A parameter set θ, comprising W_z1, W_z2, W_i, B_i, W'_z1, W'_z2, W'_i, B'_i for i = 1, ..., M, can be trained from K groups of training samples {X_k, Y_k}, k = 1, ..., K, by minimizing a loss function. The loss function is based on the error between the image obtained after the neural-network-based loop filtering and the original image. In a possible embodiment, when the first image block is a P-frame or B-frame image block and the second image block is an I-frame image block after loop filtering, see fig. 8a, which is a structural schematic diagram of a neural-network-based loop filter; the neural-network-based loop filtering unit there uses a filtering processing model different from the one corresponding to the filtering unit shown in fig. 7a or fig. 7b, see fig. 8b described below. For an I-frame image block, since the key frame in which it is located is a complete retention of one frame of image, the image can be restored using intra-frame coding alone, and no other image blocks need to be referenced during loop filtering; for reconstructed P-frame or B-frame image blocks (P/B-frame image blocks for short), which do not carry complete picture data, I-frame image blocks need to be referenced to better restore the input image, and they can be obtained from the encoded image buffer. That is, fig. 8a illustrates a filtering approach that uses loop-filtered I-frame image blocks to generate filtered P/B-frame image blocks, i.e., the target filtering result.
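A minimal training sketch for such a filtering processing model is shown below, assuming an MSE loss against the original block and the Adam optimizer; the sample layout is illustrative, not prescribed by this embodiment.

```python
import torch
import torch.nn as nn

def train_filter_model(model, samples, epochs=10, lr=1e-4):
    # samples: iterable of (x, y, original) tensors, where x is the reconstructed block,
    # y the reference (e.g. filtered I-frame) block, and original the uncompressed block.
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()   # loss based on the error between the filtered and original image
    for _ in range(epochs):
        for x, y, original in samples:
            optimizer.zero_grad()
            loss = loss_fn(model(x, y), original)
            loss.backward()
            optimizer.step()
    return model
```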
Illustratively, after the I-frame image blocks are subjected to loop filtering processing (e.g., at least one of deblocking filtering, neural-network-based loop filtering, sample adaptive offset, and adaptive loop filtering), the loop-filtered I-frame image blocks are stored in the decoded picture buffer. When neural-network-based loop filtering is performed on a P/B-frame image block within one GOP (group of pictures), the I-frame image block at the corresponding position can be read from the decoded picture buffer and used as a reference block for the P/B-frame image block. The neural-network-based loop filtering unit then receives the I-frame image block read from the decoded picture buffer together with the reconstructed P/B-frame image block, and produces the filtered P/B-frame image block. For the structural schematic diagram of the filtering processing model corresponding to the filtering unit shown in fig. 8a, see fig. 8b; the structure is similar to that shown in fig. 7c and includes a fusion module, at least one residual unit, and two convolutional layers. After the reconstructed P/B-frame image block and the loop-filtered I-frame image block are fused by the fusion module, the fused result passes in sequence through a convolutional layer, the at least one residual unit, and another convolutional layer to generate the filtered P/B-frame image block; for example, the residual data generated by the N residual units and the reconstructed P/B-frame image block are mapped and synthesized according to their correspondence at the convolutional layer to obtain the filtered P/B-frame image block. However, the number of residual units included in this filtering processing model and the parameters of its convolutional layers may be completely different from those in fig. 7c.
In a possible embodiment, the filtering processing model corresponding to a filtering unit may be determined according to a quantization parameter, which is obtained from a quantization parameter map, i.e., a matrix filled with at least one quantization parameter value. Each neural-network-based loop filtering unit may have at least one candidate model, each candidate model being an optimal model obtained by training the neural network under a different quantization parameter; optionally, a suitable quantization parameter may be selected on the encoding side based on the rate-distortion cost, and a suitable candidate model selected accordingly from the at least one candidate model. Fig. 9 shows a schematic structural diagram of such a neural-network-based loop filter, in which the quantization parameters and the reconstructed data at the input end are fed to the filtering units: the optimal filtering processing model of filtering unit Af1 can be determined according to quantization parameter Aq1, and the optimal filtering processing model of filtering unit Bf1 according to quantization parameter Bq1, after which the reconstructed data are processed. This structure is suitable for the case where the reconstructed data are a chrominance image block and a luminance image block. Similarly, the quantization parameter may be provided together with the reconstructed P/B-frame image block and the filtered I-frame image block to determine the specific structure or parameters of the filtering processing model corresponding to the filtering unit.
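The quantization-parameter-driven model selection could look roughly as follows; the nearest-QP rule and the averaging of the quantization parameter map are assumptions made for illustration.

```python
def select_filter_model(candidate_models, qp_map):
    # candidate_models: dict mapping a quantization parameter value to a trained
    # filtering processing model; qp_map: matrix (list of lists) of QP values.
    qp = sum(map(sum, qp_map)) / float(len(qp_map) * len(qp_map[0]))  # average QP of the block
    best_qp = min(candidate_models, key=lambda k: abs(k - qp))        # closest trained QP
    return candidate_models[best_qp]
```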
In a possible embodiment, still using the encoder structure shown in fig. 3 and the basic decoder structure shown in fig. 4, the reconstructed data can be handled by the neural-network-based loop filter as follows: for P/B-frame image blocks, a portion of the high-frequency information may be discarded after the transform processing, while for I-frame image blocks slightly more high-frequency information may be retained than for P/B-frame image blocks. For example, a CU is partitioned into transform units (TUs) according to a quadtree structure, such as the coding tree of the CU. In the transform processing, a transform (e.g., DCT) is applied to each transform unit to obtain DCT coefficients. For P/B frames, a threshold may be set that indicates the number of DCT coefficients to be retained, i.e., part of the high-frequency coefficients (also called AC coefficients) are discarded according to the threshold, for example by setting the AC coefficients beyond the retained number to zero. For I frames, more high-frequency coefficients are retained than for P/B frames. For P/B frames, the I-frame image block at the corresponding position is read from the encoded image buffer (i.e., the decoded image buffer at the decoding end) during the neural-network-based loop filtering, and the I-frame image block is fused with the P/B-frame image block; because the I frame contains more high-frequency information, the P/B frame can recover part of its high-frequency information during the neural-network loop filtering. Optionally, the structural schematic of the neural-network-based loop filter may also be as shown in fig. 8a, and the filtering processing model shown in fig. 8b may be used as a more detailed filtering processing model.
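The coefficient-dropping idea can be sketched as follows, operating on an already-computed block of DCT coefficients; the low-frequency-first ordering used here is a simplification of a true zig-zag scan, and the retained counts are arbitrary examples.

```python
import numpy as np

def keep_low_frequency(dct_block, keep_count):
    # dct_block: 2-D array of DCT coefficients of a transform unit (DC coefficient at [0, 0]).
    # Zero every coefficient beyond the first `keep_count` in a low-frequency-first order.
    h, w = dct_block.shape
    order = np.argsort(np.add.outer(np.arange(h), np.arange(w)), axis=None, kind="stable")
    flat = dct_block.flatten()
    flat[order[keep_count:]] = 0.0
    return flat.reshape(h, w)

# I-frame transform units would keep more high-frequency coefficients than P/B-frame ones,
# e.g. keep_low_frequency(tu, 32) for I frames versus keep_low_frequency(tu, 16) for P/B frames.
```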
2) And fusing the first image block and the second image block with the adjusted size to obtain a fused image block, and further determining or generating an intermediate result. The structure of the encoder used may be the structure shown in fig. 3, and the decoder is added with a post-filter based on a super-resolution neural network, please refer to fig. 10, and fig. 10 shows a processing function schematic diagram of a decoder, in the decoder 500, an image block output after loop filtering processing passes through a post-processing filter of the super-resolution neural network, which can effectively improve the resolution of the image block and greatly reduce the distortion of the decoded image. Optionally, the loop filter based on the neural network may be included in the loop filtering process, and the loop filter based on the neural network may be used together with a post-processing filter of the super-resolution neural network to form a dual guarantee for reducing the image distortion degree, or may not be used, that is, the reconstructed image block is subjected to deblocking filtering, sampling adaptive offset, and adaptive loop filtering in sequence, and then directly output to the post-processing filter based on the super-resolution neural network for processing.
When the first image block output by loop filtering is processed, the first and second structural schematic diagrams of the post-processing filter based on the super-resolution neural network may be as shown in fig. 11a or 11b, where the reconstructed data A and B at the input end are data after loop filtering, that is, image blocks output by the adaptive loop filtering; the structures are similar to those of the aforementioned fig. 7a and fig. 7b and are not repeated here. Optionally, refer to fig. 11c, which is a third structural schematic diagram of a filtering processing model shown in this embodiment and is applicable to a filtering unit of the post-processing filter based on the super-resolution neural network. It differs from the filtering processing models corresponding to the filtering units in the neural-network-based loop filter in that the filtering unit of the super-resolution post-processing filter additionally contains shuffle layers, which can be used to increase the resolution of the image block. The basic principle of the super-resolution post-processing filtering is as follows: the reconstructed data A, or the intermediate data obtained by fusing the reconstructed data A with the filtered reconstructed data B, is fed into a convolutional layer, passes through N residual units and one more convolutional layer, and the filtered reconstructed data A is then generated from the output of that convolutional layer by shuffle layer M1; for example, the residual data generated by the N residual units and the reconstructed data A are mapped and synthesized according to their correspondence at the convolutional layer and then passed through shuffle layer M1 to obtain the filtered reconstructed data A. Likewise, the residual data generated by the N residual units and the reconstructed data B are mapped and synthesized at the convolutional layer and passed through shuffle layer M2 to obtain the filtered reconstructed data B. Optionally, the spatial resolution of the filtered reconstructed data A is higher than that of the reconstructed data A, and the same holds for the reconstructed data B, which is not repeated here. Optionally, the first image block may be the reconstructed data A, for example a luminance image block (or a chrominance image block), and the second image block, i.e. the filtered reconstructed data B, is then a filtered chrominance image block (or a filtered luminance image block, respectively).
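A PyTorch-style sketch of such a super-resolution post-processing filtering unit is given below; as before, the layer counts, channel widths and the use of PixelShuffle as the shuffle layer are assumptions rather than the patented design.

```python
import torch
import torch.nn as nn

class SRResidualUnit(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

class SuperResolutionPostFilter(nn.Module):
    # Convolution, N residual units, convolution, then a shuffle layer (M1 or M2)
    # that raises the spatial resolution of the reconstructed block.
    def __init__(self, ch=64, n_res=4, scale=2):
        super().__init__()
        self.head = nn.Conv2d(1, ch, 3, padding=1)
        self.res = nn.Sequential(*[SRResidualUnit(ch) for _ in range(n_res)])
        self.tail = nn.Conv2d(ch, scale * scale, 3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)

    def forward(self, x):
        # x: (N, 1, H, W) reconstructed block; output resolution is (scale*H, scale*W).
        return self.shuffle(self.tail(self.res(self.head(x))))
```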
The reconstructed data may also be an I-frame image block or a P/B-frame image block. An I-frame image block does not need to reference other image blocks during filtering; therefore, when the first image block is an I-frame image block, no information from other image blocks needs to be acquired to reconstruct the image. Referring to fig. 12a, fig. 12a is a processing schematic diagram of a post-processing filter based on a super-resolution neural network according to an embodiment of the present application, and fig. 12b is a more detailed schematic diagram of the filtering structure. The reconstructed data A in fig. 12b is an I-frame image block, the reconstructed data B is a P/B-frame image block, and the storage unit As2 has the same function as the filtering result cache unit.
3) For the manner of obtaining the fused image block, and hence the intermediate result, by fusing the size-adjusted first image block and the second image block, the corresponding encoder and decoder are obtained by adding a scaling module to the video codec structure shown in fig. 3 or fig. 4, as shown in fig. 13a and fig. 13b. In the encoder 600, the input video data is passed to a scaling module, which may scale the input first image block (for example, a P/B-frame image block), and another scaling module is added between the encoded image buffer and the inter-frame prediction to improve coding efficiency during inter-frame prediction. Alternatively, the scaling of the input first image block (e.g., a P/B-frame image block) may be performed before the transform and quantization processing. In the decoder 700, only a scaling module between the decoded image buffer and the inter-frame prediction is added. The scaling performed by the scaling modules in the above encoder or decoder is downsampling, used to reduce the size of the image block and the amount of transmitted data. The scaling module in fig. 13a or fig. 13b works as follows: if the current frame in the input bitstream is an I frame, the scaling module does not scale it (the scaling operation here being a downsampling operation); if the current frame is a B frame or a P frame, the scaling module scales (downsamples) it. In addition, a reference frame read from the encoded image buffer for inter-frame prediction needs to be scaled (downsampled) before being fed into the inter-frame prediction module. Thus, when the loop filter processes a P/B-frame image block, that block has been downsampled, whereas the I frame has not. Optionally, a scaling module may also be included in the filtering unit (e.g., the neural-network-based loop filter); the scaling performed by this module is upsampling, used to restore (enlarge) the size of the image block so that the image blocks can be fused and loop filtering can be performed. In this way the data volume of B frames and P frames can be saved, and their distortion can be reduced by referencing the I-frame image block, thereby achieving more efficient video encoding and decoding.
The structure of the neural-network-based loop filter may also be as shown in fig. 14: the scaled reconstructed P/B-frame image block is fused with the loop-filtered I-frame image block by the fusion module, and the fused result then passes in sequence through a convolutional layer, at least one residual unit, and another convolutional layer to output the filtered P/B-frame image block. For example, the residual data generated by the N residual units and the scaled reconstructed P/B-frame image block are mapped and synthesized according to their correspondence at the convolutional layer to obtain the filtered P/B-frame image block. The input and output data are similar to those in the aforementioned fig. 8b: the inputs are the reconstructed P/B-frame image block and the loop-filtered I-frame image block, and the output is the filtered P/B-frame image block. The difference is that, since the I frame is not scaled (downsampled) while the P/B frame is, the unscaled I frame can be used to recover, to some extent, the information lost when the B-frame or P-frame was scaled (downsampled). In the filtering unit of the neural-network-based loop filter, the reconstructed P/B-frame image block needs to be upsampled; the specific upsampling manner may include nearest-neighbour copying, filling the other positions with zeros, transposed convolution, and the like, which is not limited here. It should be noted that the scaling and the fusion may also be regarded as functional modules independent of the filtering unit, i.e., as pre-processing operations before the filtering processing is actually performed.
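The pre-processing performed inside (or ahead of) such a filtering unit, upsampling the downsampled P/B-frame block and fusing it with the unscaled I-frame block, might be sketched as follows; nearest-neighbour interpolation is only one of the upsampling options mentioned above.

```python
import torch
import torch.nn.functional as F

def preprocess_for_loop_filter(pb_block, i_block, scale=2):
    # pb_block: down-sampled reconstructed P/B-frame block, shape (N, 1, h, w);
    # i_block: loop-filtered, unscaled I-frame block, shape (N, 1, h*scale, w*scale).
    up = F.interpolate(pb_block, scale_factor=scale, mode="nearest")  # nearest-neighbour copy
    return torch.cat([up, i_block], dim=1)  # fused input handed to the filtering unit
```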
In a possible embodiment, the filtering process for the intermediate result may also be a filtering process based on a loop filter of a super-resolution neural network, and since the image block obtained by the processing based on the loop filter of the super-resolution neural network is a high-resolution image block, after the processing of sampling adaptive offset and adaptive loop filtering, the image stored in the encoded image buffer needs to be scaled when the image is predicted between frames, for example, the size (or resolution) of the image block is unified with other image blocks by the down-sampling process. A scaling module may be provided between the encoded image buffer and the inter prediction, as shown in fig. 15a and 15b, which are the encoder 800 and the decoder 900, respectively. Alternatively, the structure of the loop filter based on the super-resolution neural network may be as shown in fig. 16, or may adopt the structure schematic as shown in fig. 11c, which respectively correspond to the structure schematic of the filtering units under different reconstruction data.
With the configuration shown in fig. 11c, when the reconstruction data a (or the reconstruction data B) is processed, the reconstruction data B (or the reconstruction data a) processed by the other filtering unit is subjected to scaling processing and then fused, and is further processed by the super-resolution neural network (including the convolutional layer, the residual unit, and the shuffle layer), so that the filtered reconstruction data a (or the reconstruction data B) is obtained. The configuration shown in fig. 11c is suitable for the case where the reconstructed data is a luminance image block or a chrominance image block.
As shown in fig. 16, the reconstructed P/B frame image block is fused by the fusion module with the scaled I frame image block that has been subjected to loop filtering, and the result then passes sequentially through the convolutional layer, the residual unit, and the convolutional layer to output the filtered P/B frame image block. For example, the residual data generated by the N residual units and the reconstructed P/B frame image block are mapped and synthesized in the convolutional layer to obtain the filtered P/B frame image block. The at least one intermediate result may be a target fusion image block obtained by fusing the loop-filtered I frame image block and the reconstructed P/B frame image block, and the filtering unit that performs the filtering processing on the target fusion image block may be the same as the filtering unit in the post-processing filter based on the super-resolution neural network, as shown in fig. 12a and fig. 12b. The difference lies in the input reconstruction data: the reconstructed P/B frame image block processed by the super-resolution-neural-network-based loop filter in this embodiment has not been subjected to adaptive loop filtering processing and may be obtained from the filtering result buffer unit of the filter itself. The specific processing principle is not described in detail herein. Compared with the post-processor (namely the post-processing filter) based on the super-resolution neural network, the loop filtering based on the super-resolution neural network can directly output a high-resolution bitstream, and a high-resolution image can be obtained in the loop filtering stage without the super-resolution post-processor, so that the high-resolution bitstream can be generated and the video stream can be decoded efficiently.
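A minimal sketch of such a super-resolution filtering unit follows, assuming a sub-pixel shuffle layer (PixelShuffle) as the shuffle layer and two concatenated input planes; the channel width, the number of residual units, and the scale factor of 2 are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class _Res(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.c1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.c2 = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x):
        return x + self.c2(F.relu(self.c1(x)))  # residual unit with identity skip

class SRLoopFilter(nn.Module):
    """conv -> residual units -> conv -> sub-pixel shuffle layer that raises resolution."""
    def __init__(self, ch=64, n_res=4, scale=2):
        super().__init__()
        self.head = nn.Conv2d(2, ch, 3, padding=1)              # 2 planes: P/B block + scaled I block
        self.body = nn.Sequential(*[_Res(ch) for _ in range(n_res)])
        self.tail = nn.Conv2d(ch, scale * scale, 3, padding=1)  # expand channels for the shuffle
        self.shuffle = nn.PixelShuffle(scale)                   # shuffle layer -> high-resolution output

    def forward(self, fused_block):
        return self.shuffle(self.tail(self.body(self.head(fused_block))))
```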
In summary, the embodiments of the present application have at least the following advantages:
by acquiring different types of image blocks and fusing image information of each dimension of the image blocks, such as edge features, luminance features and chrominance features, a fused image block is obtained, and an intermediate result is then determined or generated from the fused image block. The image information contained in the intermediate result is therefore richer, and a more accurate filtering result can be obtained after the intermediate result is filtered, thereby improving the quality of the decoded image or reconstructed image related to the image block. In addition, scaling processing is performed on different types of image blocks under different conditions to adapt to different filtering processing, and the filtering result obtained by the filtering processing can reduce the distortion degree of the reconstructed image or decoded image related to the at least one image block, thereby improving the quality of video encoding and decoding.
Third embodiment
Referring to fig. 17, fig. 17 is a flowchart illustrating an image processing method according to a third embodiment. The execution subject in this embodiment may be a computer device or a cluster formed by a plurality of computer devices, and the computer device may be a terminal device (such as the aforementioned mobile terminal 100) or a server; here, the terminal device is taken as the execution subject for the purpose of description.
S1701, an image block to be processed is acquired.
In an embodiment, the to-be-processed image block may refer to reconstructed data that has not been subjected to filtering processing based on a neural network, for example, the to-be-processed image block may be a reconstructed image block output by a deblocking filter in the encoder shown in fig. 3. Alternatively, according to various codec structures shown in the second embodiment, the neural network-based filtering process includes any one of a neural network-based loop filtering process, a super-resolution neural network-based loop filtering process, and a super-resolution neural network-based post-processing filtering process.
And S1702, determining or generating a target filtering mode according to the attribute information of the image block to be processed.
In an embodiment, the target filtering manner includes a first target filtering manner and/or a second target filtering manner; optionally, the first target filtering manner and the second target filtering manner are the same type of filtering manner, and the structures and/or parameters of the filtering processing models corresponding to them are different. For example, both the first target filtering manner and the second target filtering manner are filtering manners based on a neural-network loop filter, and the filtering processing model refers to the neural network model used in that loop filter; for different processing objects (such as the first image block and the second image block), the filtering processing models of the target filtering manners differ, for example, when the filtering processing model is a neural network model, its structure and/or parameters may be adaptively adjusted.
The implementation of this step may be: if the attribute information of the image block to be processed indicates that the image block to be processed is a first image block, determining or generating a first target filtering manner; and/or, if the attribute information of the image block to be processed indicates that the image block to be processed is a second image block, determining or generating a second target filtering manner. The attribute information may be an attribute identifier used to indicate the type to which the image block to be processed belongs, i.e., to indicate a mapping relationship with the type of the image block to be processed. The first image block and the second image block are different types of image blocks in the same dimension; for example, the first image block may be a luminance image block and the second image block a chrominance image block, in which case the two may belong to different component information of the same frame image; for another example, the first image block is an I frame image block and the second image block is a P frame image block or a B frame image block, in which case the images to which the two belong may belong to the same image group. For different types of image blocks, the determined or generated target filtering manner is also different. Note that the second image block in this embodiment is an image block that has not been subjected to filtering processing, which is different in meaning from the second image block (an image block after filtering processing) in the first and second embodiments.
Optionally, the corresponding condition for the target filtering mode further includes at least one of the following:
the first target filtering mode and the second target filtering mode correspond to different filtering units, and a mapping relation exists between a filtering processing model corresponding to the filtering unit and a quantization parameter;
and the filtering processing model corresponding to the filtering unit is determined or generated according to the quantization parameter corresponding to the target coding cost.
The filtering units may serve as basic units for performing filtering processing on the image blocks, and each filtering unit has a corresponding filtering processing model, which may be any of various types of neural networks, such as a depthwise separable convolutional neural network, a super-resolution neural network, and the like. For each filtering unit, there may be at least one candidate model, each candidate model corresponding to a different quantization parameter, and each candidate model is an optimal model obtained by training the neural network with a specific quantization parameter, so that a mapping relationship exists between the candidate model and the quantization parameter. The filtering processing model corresponding to the filtering unit may be determined or generated from the at least one candidate model according to the quantization parameter corresponding to the target coding cost; for example, a suitable quantization parameter is selected based on the rate-distortion cost on the encoding side, and a matching candidate model is determined from the at least one candidate model and used as the filtering processing model.
Alternatively, the step of obtaining the filtering processing model by using the target coding cost may be: obtaining a quantization parameter matrix; determining or generating a target coding cost according to the quantization parameters included in the quantization parameter matrix; and determining or generating the filtering processing model according to the quantization parameter corresponding to the target coding cost. The quantization parameter matrix, that is, the quantization parameter map (QP map), is a matrix filled with at least one quantization parameter value. Each quantization parameter included therein may reflect the compression of spatial detail, and each quantization parameter has a corresponding candidate model. Different candidate models lead to different coding costs under different quantization parameters; optionally, the coding cost may be represented by a rate-distortion cost, so the coding cost corresponding to each quantization parameter may differ, and the minimum coding cost may be determined as the target coding cost. Owing to the mapping relationships between the coding cost and the quantization parameter, and between the quantization parameter and the candidate model, the quantization parameter can be determined from the target coding cost, the best matching candidate model can then be determined, and this candidate model is used as the filtering processing model.
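The model selection described above can be sketched as follows; the dictionary keyed by quantization parameter and the rd_cost callable (assumed to return a rate-distortion cost such as D + λ·R for a given QP) are hypothetical interfaces used only for illustration.

```python
from typing import Any, Callable, Dict

def select_filter_model(qp_map, candidate_models: Dict[int, Any],
                        rd_cost: Callable[[int], float]):
    """Pick the candidate model whose quantization parameter gives the minimum
    (target) coding cost; candidate_models maps each QP to its trained model."""
    qps = {int(qp) for row in qp_map for qp in row}   # distinct QPs present in the QP map
    best_qp = min(qps, key=rd_cost)                   # QP corresponding to the target coding cost
    return best_qp, candidate_models[best_qp]         # QP -> filtering processing model
```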
And S1703, filtering the image block to be processed according to the target filtering mode to obtain a target filtering result.
In an embodiment, according to the target filtering manners determined for the different types of the image blocks to be processed in step S1702, the first image block may be processed by using the first target filtering manner, and the second image block may be processed by using the second target filtering manner. The target filtering result obtained by the filtering processing may be a first filtered image block or a second filtered image block, and the target filtering result may be used to determine or generate a reconstructed image where the image block to be processed is located at the encoding end, or determine or generate a decoded image where the image block to be processed is located at the decoding end.
In summary, the embodiments of the present application have at least the following advantages:
for different types of image blocks, different target filtering manners are adopted for processing; specifically, different filtering units corresponding to the same type of filtering manner, for example the neural network models included in the different filtering units, are used to process each image block adaptively, so that the filtering processing of the image blocks is more reasonable and efficient. In addition, the filtering processing model corresponding to a filtering unit can be determined or generated based on the quantization parameter corresponding to the minimum coding cost. In this way, the accuracy with which filtering processing models are matched to different filtering units can be effectively improved, computer resources are used efficiently, and the video encoding and decoding performance is improved.
Fourth embodiment
Referring to fig. 18, fig. 18 is a flowchart illustrating an image processing method according to a fourth embodiment. The execution subject in this embodiment may be a computer device or a cluster formed by a plurality of computer devices, and the computer device may be a terminal device (such as the aforementioned mobile terminal 100) or a server; here, the terminal device is taken as the execution subject for the purpose of description.
And S1801, acquiring an image block to be processed.
And S1802, determining or generating a target filtering mode according to the attribute information of the image block to be processed.
Steps S1801 to S1802 may refer to steps S1701 to S1702 in the third embodiment, which are not described herein again.
And S1803, if the image block to be processed is the first image block, acquiring a reference image block, and performing filtering processing on the reference image block and the first image block according to a target filtering mode to obtain a target filtering result.
In an embodiment, the first image block includes any one of a difference frame image block, a chrominance image block, and a luminance image block, and the reference image block is an image block after filtering processing. Depending on the type of the first image block, the reference image block may correspondingly include any one of a filtered key frame image block, a filtered luminance image block, and a filtered chrominance image block, where the difference frame image blocks include P frame image blocks and B frame image blocks, and the key frame image block is an I frame image block. Optionally, the reference image block and the target filtering result are subjected to the same type of filtering processing. For example, if the first image block is a difference frame image block, the acquired reference image block is a key frame image block that has been subjected to neural-network-based loop filtering, and the corresponding target filtering result is also obtained by neural-network-based loop filtering. In the processing of the first image block and the reference image block, the target filtering manner is the first target filtering manner.
In an embodiment, performing filtering processing on the reference image block and the first image block according to the target filtering manner to obtain the target filtering result may include: preprocessing the reference image block and the first image block to obtain a filtering processing object; and performing filtering processing on the filtering processing object according to the target filtering manner to obtain the target filtering result. The preprocessing here may include scaling processing, fusion processing, or no processing. The filtering processing object may be obtained by fusing the preprocessed reference image block and the preprocessed first image block; the filtering processing object is also an image block, and it may be determined specifically as follows.

Optionally, the reference image block and the first image block are preprocessed to obtain the filtering processing object in at least one of the following manners:
firstly, fusing a reference image block and a first image block to obtain a first fused image block, and determining the first fused image block as a filtering processing object;
scaling the reference image block and/or the first image block to obtain a size-adjusted reference image block and/or a size-adjusted first image block, performing fusion processing on any one of the reference image block and the size-adjusted reference image block and any one of the first image block and the size-adjusted first image block to obtain a second fusion image block, and determining the second fusion image block as a filtering processing object;
and zeroing partial transform coefficients included in the first image block according to a transform coefficient threshold to obtain a processed first image block, performing fusion processing on the processed first image block and the reference image block to obtain a third fused image block, and determining the third fused image block as a filtering processing object.
The filtering processing objects obtained by the above three preprocessing manners are described in detail below.
1) The acquired first image block and the reference image block are directly fused, and the obtained first fused image block can be used as the filtering processing object. When the filtering processing object is filtered, what was originally filtering of the first image block alone becomes filtering together with the reference image block, so that the image information of the reference image block can assist the filtering of the first image block. By compensating the information loss of the first image block, the distortion degree of the filtered first image block is reduced, and the quality of the reconstructed image and of the finally decoded and output image is further ensured.
2) This manner corresponds to multiple possible combinations, and different second fused image blocks can be generated or determined according to the different combination manners, thereby obtaining the filtering processing object. The scaling processing of the reference image block and the first image block differs under different conditions: only one of the reference image block and the first image block may be scaled, and the other image block may be fused with it without scaling to obtain the second fused image block, for example, the size-adjusted reference image block is fused with the first image block as the second fused image block; or both image blocks may be scaled, and the two scaled image blocks are fused to obtain the second fused image block. The scaling processing includes down-sampling processing or up-sampling processing, and the resizing may be a size reduction or a size enlargement, as the case may be. For a further description of the manner of obtaining the filtering processing object, reference may be made to the implementation in the second embodiment in which the intermediate result is determined or generated according to the target fusion image block; replacing the second image block in that embodiment with the reference image block of this embodiment yields the same processing, which is not repeated here. It should be noted that the second image block in the second embodiment is different from the second image block in this embodiment: the former is an image block after filtering processing, while the latter is an image block that has not been subjected to filtering processing; the reference image block in this embodiment may correspond to the second image block in the first embodiment or the second embodiment.
3) In this manner, the high-frequency information of the first image block is discarded by setting a transform coefficient threshold, and the first image block is then restored by using the high-frequency information of the reference image block; the third fused image block is the image block including the compensated high-frequency information. Alternatively, the transform coefficient threshold may be set according to an empirical value, or may be set in another manner. The transform coefficients included in the first image block refer to the high-frequency coefficients (also referred to as AC coefficients) and are a part of the DCT coefficients of the first image block after the transform processing (the DCT coefficients include low-frequency coefficients and high-frequency coefficients). The transform coefficients in the first image block that are larger than the transform coefficient threshold may be set to zero, thereby discarding the high-frequency information, so that the first image block retains more low-frequency information while the reference image block retains more high-frequency information; fusing the two then restores the high-frequency information. Alternatively, the first image block refers to a P frame image block or a B frame image block, and the reference image block is an I frame image block read from the corresponding position of an image buffer (e.g., the encoded image buffer or the decoded image buffer). For a P frame image block or a B frame image block, the neural-network-based loop filtering processing may retrieve an I frame containing more high-frequency information from the image buffer, thereby restoring a part of the high-frequency information of the P frame image block or the B frame image block. In this way, the first image block can be processed rapidly in the stages after the transform processing, including quantization, inverse transform and the like, and the quality of video encoding and decoding can be ensured in the subsequent processing.
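The third preprocessing manner can be sketched as below; the magnitude-based comparison against the threshold and the averaging used as the fusion step are illustrative assumptions, since this embodiment only specifies that coefficients exceeding the threshold are zeroed and that the processed block is fused with the reference block.

```python
import numpy as np
from scipy.fft import dctn, idctn

def zero_high_freq_then_fuse(first_block: np.ndarray, ref_block: np.ndarray,
                             coeff_threshold: float) -> np.ndarray:
    """Zero the AC coefficients of the P/B block whose magnitude exceeds the
    threshold, then fuse with the I-frame reference block so its high-frequency
    content compensates the discarded part."""
    coeffs = dctn(first_block.astype(np.float64), norm="ortho")
    ac_mask = np.ones_like(coeffs, dtype=bool)
    ac_mask[0, 0] = False                                   # keep the DC (low-frequency) coefficient
    coeffs[ac_mask & (np.abs(coeffs) > coeff_threshold)] = 0.0
    low_freq_block = idctn(coeffs, norm="ortho")            # processed first image block
    return 0.5 * (low_freq_block + ref_block)               # simple fusion placeholder
```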
In a possible embodiment, performing filtering processing on the filtering processing object according to the target filtering manner to obtain a target filtering result may include: if the filtering processing object is the first fusion image block or the third fusion image block, filtering the filtering processing object according to a first type of target filtering mode to obtain a target filtering result; and/or if the filtering processing object is a second fusion image block or a second image block, performing filtering processing on the filtering processing object according to a first type of target filtering mode or a second type of target filtering mode to obtain a target filtering result.
In other words, when the filtering processing objects are different, the types of target filtering manners used are also different; in this case, the filtering units corresponding to the two types of target filtering manners, and the filtering processing models they contain, are completely different. Optionally, the first type of target filtering manner adopted for the first fused image block and the third fused image block may be loop filtering processing based on a neural network. The third fused image block is an image block in which the high-frequency information of the first image block is restored by using the high-frequency information of the reference image block and involves no scaling processing; the first fused image block is obtained by directly fusing the two image blocks and likewise involves no scaling processing. Scaling processing is a preprocessing manner required by the second type of target filtering manner, and can unify the sizes of the image blocks or reduce the amount of computation or data transmission.
The filtering processing of the second fused image block and the second image block corresponds to two types of filtering manners: the aforementioned first type of target filtering manner, for example, neural-network-based loop filtering processing, may be adopted, or the second type of target filtering manner, for example, loop filtering processing based on a super-resolution neural network or post-processing filtering based on a super-resolution neural network, may be adopted. In some cases, the target filtering manners adopted for the second fused image block and the second image block may both be the first type of target filtering manner; in that case, the scaling processing involved in the second fused image block mainly serves to reduce the amount of data transmitted for the image block, thereby saving resources. The target filtering manners adopted for the second fused image block and the second image block may also both be the second type of target filtering manner; in this manner, the size adjustment in obtaining the second fused image block serves to unify the sizes, so that the fused image block is obtained quickly. Optionally, the target filtering manners adopted for the second fused image block and the second image block may also be the first type and the second type of target filtering manner, respectively. The target filtering results obtained under the different types of filtering manners can be used to determine or generate the reconstructed image or decoded image corresponding to the image block to be processed.
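The routing rule stated in the two preceding paragraphs can be summarized in the following sketch; the enumeration and the filter-type names are purely illustrative labels, not identifiers defined by this embodiment.

```python
from enum import Enum, auto

class FusedKind(Enum):
    FIRST = auto()    # direct fusion, no scaling
    SECOND = auto()   # fusion after scaling
    THIRD = auto()    # high-frequency coefficients zeroed, then fused

def allowed_filter_types(kind: FusedKind) -> set:
    """Which target filtering types each preprocessing result may be routed to."""
    if kind in (FusedKind.FIRST, FusedKind.THIRD):
        return {"neural_network_loop_filter"}                # first type only
    return {"neural_network_loop_filter",
            "super_resolution_loop_filter"}                  # either type for the second fused block
```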
And S1804, if the image block to be processed is a second image block, filtering the second image block according to a target filtering mode to obtain a target filtering result.
In an embodiment, the second image block comprises a key frame image block, i.e., an I frame image block; the target filtering manner is the second target filtering manner, that is, a filtering manner of the same type as the filtering processing performed on the first image block but using a different filtering unit (or a different filtering processing model). The image information contained in the second image block is complete, so no other image block needs to be fused for filtering; the obtained target filtering result is the filtered second image block, and this target filtering result can serve as the reference image block when the first image block is a difference frame image block.
In summary, the embodiments of the present application have at least the following advantages:
for different types of image blocks to be processed, different filtering manners are used to perform targeted filtering processing. When the image block to be processed is the first image block or the second image block, either direct filtering processing can be selected, or a corresponding reference image block is acquired and filtering is performed after preprocessing. The preprocessing of the reference image block and the image block to be processed includes scaling processing and fusion processing: the scaling processing can reduce the amount of image data to be processed and save the data resources required for transmission, and the fusion processing ensures the fusion of information between correlated image blocks, improving the filtering effect on the image block and thus reducing the distortion of the image.
Fifth embodiment
Referring to fig. 19, fig. 19 is a schematic diagram illustrating the structure of an image processing apparatus according to a fifth embodiment. The image processing apparatus may be a computer program (including program code) running in a server, for example, application software; the apparatus may be used to perform the corresponding steps in the methods provided by the embodiments of the present application. The image processing apparatus includes: an obtaining module 1901, a determining module 1902, and a processing module 1903.
An obtaining module 1901, configured to obtain at least one image block;
a determining module 1902, configured to determine or generate at least one intermediate result according to the at least one image block;
a processing module 1903, configured to perform filtering processing on the at least one intermediate result to obtain a target filtering result, where the target filtering result is used to determine or generate a reconstructed image or a decoded image corresponding to the at least one image block.
In an embodiment, the at least one image block comprises a first image block and a second image block; the first image block comprises an image block that has not been subjected to neural-network-based filtering processing, and the second image block comprises an image block that has been subjected to neural-network-based filtering processing, or the second image block and the target filtering result are subjected to the same type of filtering processing; the determining module 1902 is specifically configured to: determine or generate at least one intermediate result according to the first image block and the second image block.
In an embodiment, the at least one image block comprises a first image block, and the first image block comprises an image block that has not been subjected to neural-network-based filtering processing; the determining module 1902 is specifically configured to: acquire a second image block corresponding to the first image block, wherein the second image block comprises an image block that has been subjected to neural-network-based filtering processing, or the second image block and the target filtering result are subjected to the same type of filtering processing; and determine or generate at least one intermediate result according to the first image block and the second image block.
Optionally, at least one of the following is included: the first image block and the second image block belong to different types of image blocks; the image of the second image block and the image of the first image block belong to the same image group; and the image of the second image block and the image of the first image block are sequentially encoded or decoded.
In an embodiment, the obtaining module 1901 is specifically configured to: acquiring a first image block; and acquiring a second image block according to the attribute information of the first image block.
In an embodiment, the obtaining module 1901 is specifically configured to: and acquiring a second image block from a filtering result cache unit or an image buffer according to the attribute information of the first image block, wherein the second image block comprises the image block after filtering.
In an embodiment, the determining module 1902 is further specifically configured to: if the attribute information of the first image block indicates that the first image block is a chrominance image block or a luminance image block, acquiring a second image block from a filtering result cache unit, wherein the second image block is a luminance image block after filtering processing or a chrominance image block after filtering processing correspondingly; and if the attribute information of the first image block indicates that the first image block is a difference frame image block, acquiring a second image block from a filtering result cache unit or an image buffer, wherein the second image block is a key frame image block after filtering.
In an embodiment, the determining module 1902 is specifically configured to: acquiring the image characteristics of the first image block and the image characteristics of the second image block; and determining or generating at least one intermediate result according to the image characteristics of the first image block and the image characteristics of the second image block.
In an embodiment, the image feature includes an edge feature, and the determining module 1902 is specifically configured to: if the first image block is a luminance image block or a chrominance image block, acquiring edge features of the first image block and edge features of the second image block; fusing the edge characteristics of the first image block and the edge characteristics of the second image block to obtain an edge fused image block; and determining or generating at least one intermediate result according to the edge fusion image block.
In an embodiment, the image feature includes a luminance feature and/or a chrominance feature, and the determining module 1902 is specifically configured to: if the first image block is a difference frame image block or a key frame image block, acquiring the brightness characteristic and/or the chrominance characteristic of the first image block, and acquiring the brightness characteristic and/or the chrominance characteristic of the second image block; fusing the brightness characteristic of the first image block and the brightness characteristic of the second image block to obtain a brightness fused image block, and/or fusing the chrominance characteristic of the second image block and the chrominance characteristic of the first image block to obtain a chrominance fused image block; and determining or generating at least one intermediate result according to the brightness fused image block and/or the chroma fused image block.
In an embodiment, the determining module 1902 is specifically configured to: fusing the first image block and the second image block to obtain a target fused image block; and determining or generating at least one intermediate result according to the target fusion image block.
In an embodiment, the determining module 1902 is specifically configured to: scaling the first image block and/or the second image block to obtain a first image block and/or a second image block with adjusted size; and performing fusion processing on any one of the first image block and the first image block with the adjusted size and any one of the second image block and the second image block with the adjusted size to obtain a target fusion image block.
In one embodiment, the at least one intermediate result includes at least one fused image block; the processing module 1903 is specifically configured to: and performing filtering processing on the at least one fused image block by using a filtering processing mode adopted by the second image block to obtain a target filtering result.
Optionally, the target filtering result includes a first image block after filtering processing, and filtering processing model structures and/or parameters corresponding to filtering processing modes adopted by the first image block and the second image block after filtering processing are different.
Optionally, the resolution of the first image block is smaller than the resolution of the second image block, and/or the resolution of the first image block after the filtering processing is larger than the resolution of the first image block.
In another embodiment, the image processing apparatus as shown in fig. 19 described above may also be applied to an image processing method in which:
an obtaining module 1901, configured to obtain an image block to be processed;
a determining module 1902, configured to determine or generate a target filtering manner according to the attribute information of the to-be-processed image block;
the processing module 1903 is configured to perform filtering processing on the to-be-processed image block according to the target filtering manner to obtain a target filtering result.
In an embodiment, the target filtering manner includes a first target filtering manner and/or a second target filtering manner; the determining module 1902 is specifically configured to: if the attribute information of the image block to be processed indicates that the image block to be processed is a first image block, determining or generating the first target filtering mode; and/or determining or generating the second target filtering mode if the attribute information of the image block to be processed indicates that the image block to be processed is a second image block.
Optionally, the first target filtering manner and the second target filtering manner are the same type of filtering manner, and structures and/or parameters of filtering processing models corresponding to the first target filtering manner and the second target filtering manner are different.
Optionally, at least one of the following is included: the first target filtering mode and the second target filtering mode correspond to different filtering units, and a mapping relation exists between a filtering processing model corresponding to the filtering unit and a quantization parameter; and the filtering processing model corresponding to the filtering unit is determined or generated according to the quantization parameter corresponding to the target coding cost.
In an embodiment, the processing module 1903 is further specifically configured to at least one of: if the image block to be processed is a first image block, acquiring a reference image block, and performing filtering processing on the reference image block and the first image block according to the target filtering mode to obtain a target filtering result; and/or if the image block to be processed is a second image block, filtering the second image block according to the target filtering mode to obtain a target filtering result.
Optionally, at least one of the following is included: the reference image block is an image block after filtering processing; optionally, the reference image block and the target filtering result are subjected to the same type of filtering processing; the first image block comprises any one of a difference frame image block, a chrominance image block and a luminance image block; the second image block comprises a key frame image block.
In an embodiment, the processing module 1903 is specifically configured to: preprocessing the reference image block and the first image block to obtain a filtering object; and carrying out filtering processing on the filtering processing object according to the target filtering mode to obtain a target filtering result.
In an embodiment, the processing module 1903 is specifically configured to at least one of: performing fusion processing on the reference image block and the first image block to obtain a first fusion image block, and determining the first fusion image block as a filtering processing object; scaling the reference image block and/or the first image block to obtain a size-adjusted reference image block and/or a size-adjusted first image block, performing fusion processing on any one of the reference image block and the size-adjusted reference image block and any one of the first image block and the size-adjusted first image block to obtain a second fused image block, and determining the second fused image block as the filtering processing object; and carrying out zero setting processing on partial transform coefficients included in the first image block according to a transform coefficient threshold to obtain a processed first image block, carrying out fusion processing on the processed first image block and the reference image block to obtain a third fused image block, and determining the third fused image block as the filtering processing object.
In an embodiment, the processing module 1903 is specifically configured to at least one of: if the filtering processing object is the first fusion image block or the third fusion image block, filtering the filtering processing object according to a first type of target filtering mode to obtain a target filtering result; and if the filtering processing object is the second fusion image block or the second image block, filtering the filtering processing object according to the first type of target filtering mode or the second type of target filtering mode to obtain a target filtering result.
It can be understood that the functions of the functional modules of the image processing apparatus described in the embodiment of the present application can be specifically implemented according to the method in the foregoing method embodiment, and the specific implementation process of the method can refer to the description related to the foregoing method embodiment, which is not described herein again. In addition, the beneficial effects of the same method are not described in detail.
The embodiment of the application further provides an intelligent terminal, which comprises a memory and a processor, wherein the memory stores an image processing program, and the image processing program is executed by the processor to realize the steps of the image processing method in any embodiment. The smart terminal may be a mobile terminal 100 as shown in fig. 1.
Alternatively, the processor 110 of the mobile terminal 100 as shown in fig. 1 may be used to call up an image processing program stored in the memory 109 to perform the following operations:
acquiring at least one image block;
determining or generating at least one intermediate result according to the at least one image block;
and performing filtering processing on the at least one intermediate result to obtain a target filtering result, wherein the target filtering result is used for determining or generating a reconstructed image or a decoded image corresponding to the at least one image block.
In an embodiment, the at least one image block comprises a first image block and a second image block; the first image block comprises an image block that has not been subjected to neural-network-based filtering processing, and the second image block comprises an image block that has been subjected to neural-network-based filtering processing, or the second image block and the target filtering result are subjected to the same type of filtering processing; the processor 110 is specifically configured to: determine or generate at least one intermediate result according to the first image block and the second image block.
In an embodiment, the at least one image block comprises a first image block, and the first image block comprises an image block that has not been subjected to neural-network-based filtering processing; the processor 110 is specifically configured to: acquire a second image block corresponding to the first image block, wherein the second image block comprises an image block that has been subjected to neural-network-based filtering processing, or the second image block and the target filtering result are subjected to the same type of filtering processing; and determine or generate at least one intermediate result according to the first image block and the second image block.
Optionally, at least one of the following is included: the first image block and the second image block belong to different types of image blocks; the image of the second image block and the image of the first image block belong to the same image group; and the image of the second image block and the image of the first image block are sequentially encoded or decoded.
In one embodiment, the processor 110 is specifically configured to: acquiring a first image block; and acquiring a second image block according to the attribute information of the first image block.
In one embodiment, the processor 110 is specifically configured to: and acquiring a second image block from a filtering result cache unit or an image buffer according to the attribute information of the first image block, wherein the second image block comprises the image block after filtering.
In one embodiment, the processor 110 is specifically configured to at least one of: if the attribute information of the first image block indicates that the first image block is a chrominance image block or a luminance image block, acquiring a second image block from a filtering result cache unit, wherein the second image block is a luminance image block after filtering processing or a chrominance image block after filtering processing correspondingly; and if the attribute information of the first image block indicates that the first image block is a difference frame image block, acquiring a second image block from a filtering result cache unit or an image buffer, wherein the second image block is a key frame image block after filtering.
In one embodiment, the processor 110 is specifically configured to: acquiring the image characteristics of the first image block and the image characteristics of the second image block; and determining or generating at least one intermediate result according to the image characteristics of the first image block and the image characteristics of the second image block.
In an embodiment, the image features include edge features, and the processor 110 is specifically configured to: if the first image block is a luminance image block or a chrominance image block, acquiring edge features of the first image block and edge features of the second image block; fusing the edge characteristics of the first image block and the edge characteristics of the second image block to obtain an edge fused image block; and determining or generating at least one intermediate result according to the edge fusion image block.
In an embodiment, the image features include luminance features and/or chrominance features, and the processor 110 is specifically configured to: if the first image block is a difference frame image block or a key frame image block, acquiring the brightness characteristic and/or the chrominance characteristic of the first image block, and acquiring the brightness characteristic and/or the chrominance characteristic of the second image block; fusing the brightness characteristic of the first image block and the brightness characteristic of the second image block to obtain a brightness fused image block, and/or fusing the chrominance characteristic of the second image block and the chrominance characteristic of the first image block to obtain a chrominance fused image block; and determining or generating at least one intermediate result according to the brightness fused image block and/or the chroma fused image block.
In one embodiment, the processor 110 is specifically configured to: fusing the first image block and the second image block to obtain a target fused image block; and determining or generating at least one intermediate result according to the target fusion image block.
In one embodiment, the processor 110 is specifically configured to: scaling the first image block and/or the second image block to obtain a first image block and/or a second image block with adjusted size; and performing fusion processing on any one of the first image block and the first image block with the adjusted size and any one of the second image block and the second image block with the adjusted size to obtain a target fusion image block.
In one embodiment, the at least one intermediate result includes at least one fused image block; the processor 110 is specifically configured to: and performing filtering processing on the at least one fused image block by using a filtering processing mode adopted by the second image block to obtain a target filtering result.
Optionally, the target filtering result includes a first image block after filtering processing, and filtering processing model structures and/or parameters corresponding to filtering processing modes adopted by the first image block and the second image block after filtering processing are different.
Optionally, the resolution of the first image block is smaller than the resolution of the second image block, and/or the resolution of the first image block after the filtering processing is larger than the resolution of the first image block.
In one possible embodiment, the processor 110 of the mobile terminal 100 shown in FIG. 1 may be configured to invoke an image processing program stored in the memory 109 to perform the following operations:
acquiring an image block to be processed;
determining or generating a target filtering mode according to the attribute information of the image block to be processed;
and filtering the image blocks to be processed according to the target filtering mode to obtain a target filtering result.
In an embodiment, the target filtering manner includes a first target filtering manner and/or a second target filtering manner; the processor 110 is specifically configured to: if the attribute information of the image block to be processed indicates that the image block to be processed is a first image block, determining or generating the first target filtering mode; and/or determining or generating the second target filtering mode if the attribute information of the image block to be processed indicates that the image block to be processed is a second image block.
Optionally, the first target filtering manner and the second target filtering manner are the same type of filtering manner, and structures and/or parameters of filtering processing models corresponding to the first target filtering manner and the second target filtering manner are different.
Optionally, at least one of the following is included: the first target filtering mode and the second target filtering mode correspond to different filtering units, and a mapping relation exists between a filtering processing model corresponding to the filtering unit and a quantization parameter; and the filtering processing model corresponding to the filtering unit is determined or generated according to the quantization parameter corresponding to the target coding cost.
In one embodiment, the processor 110 is specifically configured to at least one of: if the image block to be processed is a first image block, acquiring a reference image block, and performing filtering processing on the reference image block and the first image block according to the target filtering mode to obtain a target filtering result; and/or if the image block to be processed is a second image block, filtering the second image block according to the target filtering mode to obtain a target filtering result.
Optionally, at least one of the following is included: the reference image block is an image block after filtering processing; optionally, both the reference image block and the target filtering result have been subjected to the same type of filtering processing; the first image block comprises any one of a difference frame image block, a chrominance image block and a luminance image block; the second image block comprises a key frame image block.
In one embodiment, the processor 110 is specifically configured to: preprocessing the reference image block and the first image block to obtain a filtering object; and carrying out filtering processing on the filtering processing object according to the target filtering mode to obtain a target filtering result.
In one embodiment, the processor 110 is specifically configured to at least one of: performing fusion processing on the reference image block and the first image block to obtain a first fusion image block, and determining the first fusion image block as a filtering processing object; scaling the reference image block and/or the first image block to obtain a size-adjusted reference image block and/or a size-adjusted first image block, performing fusion processing on any one of the reference image block and the size-adjusted reference image block and any one of the first image block and the size-adjusted first image block to obtain a second fused image block, and determining the second fused image block as the filtering processing object; and carrying out zero setting processing on partial transform coefficients included in the first image block according to a transform coefficient threshold to obtain a processed first image block, carrying out fusion processing on the processed first image block and the reference image block to obtain a third fused image block, and determining the third fused image block as the filtering processing object.
In one embodiment, the processor 110 is specifically configured to at least one of: if the filtering processing object is the first fusion image block or the third fusion image block, filtering the filtering processing object according to a first type of target filtering mode to obtain a target filtering result; and if the filtering processing object is the second fusion image block or the second image block, filtering the filtering processing object according to the first type of target filtering mode or the second type of target filtering mode to obtain a target filtering result.
It should be understood that the mobile terminal described in the embodiment of the present application may perform the method description of any one of the foregoing embodiments, and may also perform the description of the image processing apparatus in the foregoing corresponding embodiment, which is not described herein again. In addition, the beneficial effects of the same method are not described in detail.
The present application further provides a computer-readable storage medium, on which an image processing program is stored, and the image processing program, when executed by a processor, implements the steps of the image processing method in any of the above embodiments.
In the embodiments of the intelligent terminal and the computer-readable storage medium provided in the present application, all technical features of any one of the embodiments of the image processing method may be included, and the expanding and explaining contents of the specification are basically the same as those of the embodiments of the method, and are not described herein again.
Embodiments of the present application also provide a computer program product, which includes computer program code, when the computer program code runs on a computer, the computer is caused to execute the method in the above various possible embodiments.
Embodiments of the present application further provide a chip, which includes a memory and a processor, where the memory is used to store a computer program, and the processor is used to call and run the computer program from the memory, so that a device in which the chip is installed executes the method in the above various possible embodiments.
It is to be understood that the foregoing scenarios are only examples, and do not constitute a limitation on application scenarios of the technical solutions provided in the embodiments of the present application, and the technical solutions of the present application may also be applied to other scenarios. For example, as can be known by those skilled in the art, with the evolution of system architecture and the emergence of new service scenarios, the technical solution provided in the embodiments of the present application is also applicable to similar technical problems.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
The steps in the method of the embodiment of the application can be sequentially adjusted, combined and deleted according to actual needs.
The units in the device in the embodiment of the application can be merged, divided and deleted according to actual needs.
In the present application, the same or similar term concepts, technical solutions and/or application scenario descriptions will be generally described only in detail at the first occurrence, and when the description is repeated later, the detailed description will not be repeated in general for brevity, and when understanding the technical solutions and the like of the present application, reference may be made to the related detailed description before the description for the same or similar term concepts, technical solutions and/or application scenario descriptions and the like which are not described in detail later.
In the present application, each embodiment is described with emphasis, and reference may be made to the description of other embodiments for parts that are not described or illustrated in any embodiment.
The technical features of the technical solution of the present application may be arbitrarily combined, and for brevity of description, all possible combinations of the technical features in the embodiments are not described, however, as long as there is no contradiction between the combinations of the technical features, the scope of the present application should be considered as being described in the present application.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, a controlled terminal, or a network device) to execute the method of each embodiment of the present application.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions according to the embodiments of the present application are all or partially generated when the computer program instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored on a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, fiber optic, digital subscriber line) or wirelessly (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy Disk, memory Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
The above description is only a preferred embodiment of the present application and is not intended to limit its scope. Any equivalent structural or procedural modification made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of protection of the present application.

Claims (16)

1. An image processing method, characterized in that the method comprises:
obtaining at least one image block, wherein the at least one image block comprises a first image block and a second image block, and the image in which the first image block is located and the image in which the second image block is located are different images of the same image group;
determining or generating at least one intermediate result according to the at least one image block;
and performing filtering processing on the at least one intermediate result to obtain a target filtering result, wherein the target filtering result is used for determining or generating a reconstructed image or a decoded image corresponding to the first image block.
2. The method of claim 1, wherein the first image block comprises an image block that has not been subjected to neural-network-based filtering processing; the second image block comprises an image block that has been subjected to neural-network-based filtering processing, or the second image block and the target filtering result are subjected to the same type of filtering processing;
wherein the determining or generating at least one intermediate result according to the at least one image block comprises:
determining or generating at least one intermediate result according to the first image block and the second image block.
3. The method of claim 2, comprising at least one of:
the first image block and the second image block belong to different types of image blocks;
and the image in which the second image block is located and the image in which the first image block is located are encoded or decoded sequentially.
4. The method of claim 1 or 2, wherein said obtaining at least one image block comprises:
acquiring a first image block;
and acquiring a second image block according to the attribute information of the first image block.
5. The method of claim 4, wherein said obtaining a second image block according to attribute information of the first image block comprises:
acquiring a second image block from a filtering result cache unit or an image buffer according to the attribute information of the first image block, wherein the second image block comprises an image block that has been subjected to filtering processing.
6. The method as claimed in claim 5, wherein said acquiring the second image block from the filtering result cache unit or the image buffer according to the attribute information of the first image block comprises:
if the attribute information of the first image block indicates that the first image block is a difference frame image block, acquiring a second image block from a filtering result cache unit or an image buffer, wherein the second image block is a key frame image block that has been subjected to filtering processing.
7. The method of claim 2, wherein the determining or generating at least one intermediate result according to the first image block and the second image block comprises:
acquiring the image features of the first image block and the image features of the second image block;
and determining or generating at least one intermediate result according to the image features of the first image block and the image features of the second image block.
8. The method of claim 7, wherein the image features comprise luminance features and/or chrominance features, and the determining or generating at least one intermediate result according to the image features of the first image block and the image features of the second image block comprises:
if the first image block is a difference frame image block, acquiring the luminance feature and/or the chrominance feature of the first image block, and acquiring the luminance feature and/or the chrominance feature of the second image block;
fusing the luminance feature of the first image block and the luminance feature of the second image block to obtain a luminance-fused image block, and/or fusing the chrominance feature of the second image block and the chrominance feature of the first image block to obtain a chrominance-fused image block;
and determining or generating at least one intermediate result according to the luminance-fused image block and/or the chrominance-fused image block.
9. The method of claim 2, wherein the determining or generating at least one intermediate result according to the first image block and the second image block comprises:
fusing the first image block and the second image block to obtain a target fused image block;
and determining or generating at least one intermediate result according to the target fused image block.
10. The method of claim 9, wherein the fusing the first image block and the second image block to obtain a target fused image block comprises:
scaling the first image block and/or the second image block to obtain a size-adjusted first image block and/or a size-adjusted second image block;
and performing fusion processing on either the first image block or the size-adjusted first image block, and either the second image block or the size-adjusted second image block, to obtain a target fused image block.
11. The method of claim 10, wherein the at least one intermediate result comprises at least one fused image block; the filtering the at least one intermediate result to obtain a target filtering result includes:
performing filtering processing on the at least one fused image block by using the filtering processing mode adopted for the second image block, to obtain a target filtering result.
12. The method according to claim 11, wherein the target filtering result comprises a filtered first image block, and the filtering processing model structures and/or parameters corresponding to the filtering processing modes adopted for the filtered first image block and for the second image block are different.
13. The method as claimed in claim 12, wherein the resolution of the first image block is smaller than the resolution of the second image block, and/or the resolution of the filtered first image block is larger than the resolution of the first image block.
14. The method of claim 1, wherein, before the step of determining or generating at least one intermediate result according to the at least one image block, the method further comprises:
determining or generating a target filtering mode according to the attribute information of the at least one image block; and
the filtering the at least one intermediate result to obtain a target filtering result includes:
performing filtering processing on the at least one intermediate result according to the target filtering mode to obtain a target filtering result.
15. An intelligent terminal, characterized in that the intelligent terminal comprises: a memory and a processor, wherein the memory stores an image processing program which, when executed by the processor, implements the steps of the image processing method of any one of claims 1 to 14.
16. A computer-readable storage medium, characterized in that a computer program is stored thereon which, when executed by a processor, implements the steps of the image processing method according to any one of claims 1 to 14.
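To make the claimed flow easier to follow, the Python sketch below gives one possible, non-authoritative reading of claims 1, 2, 6 and 9 to 11: a not-yet-filtered difference-frame block is fused with an already-filtered key-frame block fetched from a filtering result cache, and the fused intermediate result is then filtered once to produce the target filtering result. All function names, the weighted-average fusion, and the 3x3 mean filter standing in for the neural-network-based filtering are illustrative assumptions, not details fixed by the patent.

```python
import numpy as np

def resize_to(block: np.ndarray, shape: tuple) -> np.ndarray:
    """Nearest-neighbour scaling so both blocks share a size (cf. claim 10)."""
    rows = np.arange(shape[0]) * block.shape[0] // shape[0]
    cols = np.arange(shape[1]) * block.shape[1] // shape[1]
    return block[np.ix_(rows, cols)]

def fuse_blocks(first: np.ndarray, second: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Weighted fusion standing in for the target fused image block (cf. claim 9)."""
    second = resize_to(second, first.shape)
    return alpha * first + (1.0 - alpha) * second

def loop_filter(block: np.ndarray) -> np.ndarray:
    """Placeholder 3x3 mean filter standing in for the filtering processing mode
    that was already applied to the second image block (cf. claim 11)."""
    padded = np.pad(block, 1, mode="edge")
    h, w = block.shape
    out = np.zeros((h, w), dtype=float)
    for dr in range(3):
        for dc in range(3):
            out += padded[dr:dr + h, dc:dc + w]
    return out / 9.0

def process_block(first_block: np.ndarray, attributes: dict, cache: dict) -> np.ndarray:
    """Hypothetical end-to-end flow for one first image block (cf. claims 1, 2, 6)."""
    if attributes.get("is_difference_frame"):
        # Claim 6: a difference-frame block pulls an already-filtered
        # key-frame block out of the filtering result cache / image buffer.
        second_block = cache[attributes["key_frame_block_id"]]
        intermediate = fuse_blocks(first_block, second_block)
    else:
        intermediate = first_block.astype(float)
    # Claim 11: filter the fused intermediate result with the same type of
    # filtering that produced the second image block.
    return loop_filter(intermediate)
```

A call such as `process_block(diff_block, {"is_difference_frame": True, "key_frame_block_id": "k0"}, {"k0": filtered_key_block})` would then return the filtered first image block that, per claim 1, is used to determine or generate the reconstructed or decoded image.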
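Claim 8's separate handling of luminance and chrominance features could be sketched in the same spirit; the YUV plane layout, the dict representation, and the equal-weight averaging below are assumptions, not requirements of the claim.

```python
import numpy as np

def fuse_luma_chroma(first: dict, second: dict) -> dict:
    """Hypothetical feature-level fusion for a difference-frame block (cf. claim 8)."""
    fused = {"Y": 0.5 * (first["Y"] + second["Y"])}       # luminance-fused image block
    for plane in ("U", "V"):                              # chrominance-fused image block
        fused[plane] = 0.5 * (first[plane] + second[plane])
    return fused

# Example with random 8x8 YUV planes (illustrative only).
make_block = lambda: {p: np.random.rand(8, 8) for p in ("Y", "U", "V")}
intermediate = fuse_luma_chroma(make_block(), make_block())
```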
CN202111253333.1A 2021-10-27 2021-10-27 Image processing method, intelligent terminal and readable storage medium Active CN113709504B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111253333.1A CN113709504B (en) 2021-10-27 2021-10-27 Image processing method, intelligent terminal and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111253333.1A CN113709504B (en) 2021-10-27 2021-10-27 Image processing method, intelligent terminal and readable storage medium

Publications (2)

Publication Number Publication Date
CN113709504A CN113709504A (en) 2021-11-26
CN113709504B (en) 2022-02-15

Family

ID=78647056

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111253333.1A Active CN113709504B (en) 2021-10-27 2021-10-27 Image processing method, intelligent terminal and readable storage medium

Country Status (1)

Country Link
CN (1) CN113709504B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114079779B (en) * 2022-01-12 2022-05-17 深圳传音控股股份有限公司 Image processing method, intelligent terminal and storage medium
CN117793355A (en) * 2022-09-19 2024-03-29 腾讯科技(深圳)有限公司 Multimedia data processing method, device, equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106941608A (en) * 2011-06-30 2017-07-11 三菱电机株式会社 Picture coding device and method, picture decoding apparatus and method
WO2019105120A1 (en) * 2017-11-29 2019-06-06 北京大学深圳研究生院 Filtering method for intra-frame and inter-frame prediction
CN109862374A (en) * 2019-01-07 2019-06-07 北京大学 A kind of adaptive loop filter method and device
CN110740319A (en) * 2019-10-30 2020-01-31 腾讯科技(深圳)有限公司 Video encoding and decoding method and device, electronic equipment and storage medium
WO2020177133A1 (en) * 2019-03-07 2020-09-10 Oppo广东移动通信有限公司 Loop filter implementation method and apparatus, and computer storage medium
CN111652818A (en) * 2020-05-29 2020-09-11 浙江大华技术股份有限公司 Image filtering method and device based on pyramid and storage medium
WO2021196234A1 (en) * 2020-04-03 2021-10-07 北京大学 Video encoding and decoding method and device, and storage medium
CN113542739A (en) * 2021-07-15 2021-10-22 Oppo广东移动通信有限公司 Image encoding method and apparatus, image decoding method and apparatus, medium, and device

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6931063B2 (en) * 2001-03-26 2005-08-16 Sharp Laboratories Of America, Inc. Method and apparatus for controlling loop filtering or post filtering in block based motion compensationed video coding
US7450641B2 (en) * 2001-09-14 2008-11-11 Sharp Laboratories Of America, Inc. Adaptive filtering based upon boundary strength
KR100597402B1 (en) * 2003-12-01 2006-07-06 삼성전자주식회사 Method for scalable video coding and decoding, and apparatus for the same
CA2584215A1 (en) * 2004-10-18 2006-04-27 Samsung Electronics Co., Ltd. Video coding and decoding methods using interlayer filtering and video encoder and decoder using the same
EP1944974A1 (en) * 2007-01-09 2008-07-16 Matsushita Electric Industrial Co., Ltd. Position dependent post-filter hints
CN111405283B (en) * 2020-02-20 2022-09-02 北京大学 End-to-end video compression method, system and storage medium based on deep learning
CN112070158B (en) * 2020-09-08 2022-11-15 哈尔滨工业大学(威海) Facial flaw detection method based on convolutional neural network and bilateral filtering


Also Published As

Publication number Publication date
CN113709504A (en) 2021-11-26

Similar Documents

Publication Publication Date Title
WO2018001207A1 (en) Coding and decoding method and apparatus
WO2019134557A1 (en) Method and device for processing video image
US9414086B2 (en) Partial frame utilization in video codecs
CN113709504B (en) Image processing method, intelligent terminal and readable storage medium
CN114079779B (en) Image processing method, intelligent terminal and storage medium
KR20210114055A (en) Method and apparatus for cross-component filtering
US9129409B2 (en) System and method of compressing video content
JP2008533850A (en) Hierarchical video coding with two-layer coding and single-layer decoding
US11659193B2 (en) Framework for video conferencing based on face restoration
US11528508B2 (en) System and method for video coding
CN115834897B (en) Processing method, processing apparatus, and storage medium
CN114339260A (en) Image processing method and device
WO2019128716A1 (en) Image prediction method, apparatus, and codec
KR20200125698A (en) Method and apparatus for sub-block motion vector prediction
US11546591B2 (en) System and method for video coding
CN116456102B (en) Image processing method, processing apparatus, and storage medium
WO2019062476A1 (en) Method for performing motion estimation, apparatus, device and storage medium
US11259022B2 (en) Encoder, decoder, encoding method, and decoding method
CN116847088B (en) Image processing method, processing apparatus, and storage medium
CN115955565B (en) Processing method, processing apparatus, and storage medium
WO2023019567A1 (en) Image processing method, mobile terminal and storage medium
RU2810126C2 (en) Method and device for image prediction and computer readable data medium
US12034925B2 (en) System and method for video coding
WO2020181540A1 (en) Video processing method and device, encoding apparatus, and decoding apparatus
TW202404364A (en) Video coding using optical flow and residual predictors

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant