CN114079779A - Image processing method, intelligent terminal and storage medium - Google Patents

Image processing method, intelligent terminal and storage medium

Info

Publication number
CN114079779A
Authority
CN
China
Prior art keywords
image block
feature map
processing
target
reconstructed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210029380.6A
Other languages
Chinese (zh)
Other versions
CN114079779B (en)
Inventor
刘雨田
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Transsion Holdings Co Ltd
Original Assignee
Shenzhen Transsion Holdings Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Transsion Holdings Co Ltd filed Critical Shenzhen Transsion Holdings Co Ltd
Priority to CN202210029380.6A priority Critical patent/CN114079779B/en
Publication of CN114079779A publication Critical patent/CN114079779A/en
Application granted granted Critical
Publication of CN114079779B publication Critical patent/CN114079779B/en
Priority to PCT/CN2022/144217 priority patent/WO2023134482A1/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146Data rate or code amount at the encoder output
    • H04N19/147Data rate or code amount at the encoder output according to rate distortion criteria
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/154Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/80Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The application provides an image processing method, an intelligent terminal and a storage medium. The image processing method includes: acquiring first auxiliary information; and processing a first image block corresponding to a first viewpoint according to a reference image corresponding to a second viewpoint and the first auxiliary information. With this scheme, the information of image blocks of different viewpoints can be fully utilized, the distortion of a reconstructed or decoded image is reduced, and the coding quality of multi-viewpoint video is effectively improved.

Description

Image processing method, intelligent terminal and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an image processing method, an intelligent terminal, and a storage medium.
Background
Unlike single-viewpoint video, multi-viewpoint video captures the same scene from different viewpoints with multiple cameras and can provide viewers with rich dynamic scenes and a realistic sensory experience. With the development of video compression technology, research on video coding for multi-view video has gradually deepened. At present, the 3D-HEVC coding technology, proposed on the basis of the video coding standard HEVC (High Efficiency Video Coding), can efficiently compress multi-view video and its corresponding depth data.
However, in the course of conceiving and implementing the present application, the inventors found at least the following problem: in existing multi-view video coding technology, in order to reduce the distortion of a reconstructed frame at the loop filtering stage (for example, a neural-network-based loop filtering stage), the reconstructed frame is usually enhanced with reference frames of other views at the same time instant, and the resulting enhanced frame is used in the subsequent coding process.
The foregoing description is provided for general background information and is not admitted to be prior art.
Disclosure of Invention
In view of the above technical problems, the present application provides an image processing method, an intelligent terminal, and a storage medium, which make full use of the information of image blocks of different viewpoints, reduce the distortion of a reconstructed or decoded image, and thereby effectively improve the coding quality of multi-viewpoint video.
In order to solve the above technical problem, the present application provides an image processing method, including:
acquiring first auxiliary information;
and processing the first image block corresponding to the first view according to the reference image corresponding to the second view and the first auxiliary information.
The present application provides another image processing method, comprising:
acquiring a second reconstruction image block;
and filtering the first reconstructed image block according to at least one of the second reconstructed image block, the attribute information of the first reconstructed image block and the attribute information of the second reconstructed image block to obtain a filtered first reconstructed image block.
Optionally, the filtering the first reconstructed image block according to at least one of the second reconstructed image block, the attribute information of the first reconstructed image block, and the attribute information of the second reconstructed image block to obtain a filtered first reconstructed image block includes at least one of:
and filtering the first reconstructed image block according to the second reconstructed image block to obtain a filtered first reconstructed image block.
And filtering the first reconstructed image block according to the attribute information of the first reconstructed image block to obtain the filtered first reconstructed image block.
And filtering the first reconstructed image block according to the attribute information of the second reconstructed image block to obtain the filtered first reconstructed image block.
And filtering the first reconstructed image block according to the second reconstructed image block and the attribute information of the first reconstructed image block to obtain the filtered first reconstructed image block.
And filtering the first reconstructed image block according to the second reconstructed image block and the attribute information of the second reconstructed image block to obtain the filtered first reconstructed image block.
And filtering the first reconstructed image block according to the attribute information of the first reconstructed image block and the attribute information of the second reconstructed image block to obtain a filtered first reconstructed image block.
And filtering the first reconstructed image block according to the second reconstructed image block, the attribute information of the first reconstructed image block and the attribute information of the second reconstructed image block to obtain the filtered first reconstructed image block.
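For illustration only, the sketch below shows how such a filter might accept any non-empty subset of the optional inputs listed above. All names are hypothetical, the attribute information is assumed to be array-like and broadcastable, and the blending used here is only a trivial stand-in for a real (neural-network) loop filter; this is not the patented implementation.

```python
# Illustrative dispatch only (hypothetical names): accept any non-empty subset
# of the optional inputs and blend them with the first reconstructed block as a
# trivial placeholder for real loop filtering.
from typing import Optional
import numpy as np

def filter_first_block(first_block: np.ndarray,
                       second_block: Optional[np.ndarray] = None,
                       first_attrs: Optional[np.ndarray] = None,
                       second_attrs: Optional[np.ndarray] = None) -> np.ndarray:
    extras = [x for x in (second_block, first_attrs, second_attrs) if x is not None]
    if not extras:
        raise ValueError("at least one auxiliary input is required")
    stacked = np.stack([first_block] +
                       [np.broadcast_to(e, first_block.shape) for e in extras])
    return stacked.mean(axis=0)   # placeholder for a (neural-network) loop filter
```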
The application provides an image processing apparatus, including:
the acquisition module is used for acquiring first auxiliary information;
and the processing module is used for processing the first image block corresponding to the first view point according to the reference image corresponding to the second view point and the first auxiliary information.
The present application provides another image processing apparatus including:
the acquisition module is used for acquiring a second reconstruction image block;
and the processing module is used for filtering the first reconstructed image block according to at least one of the second reconstructed image block, the attribute information of the first reconstructed image block and the attribute information of the second reconstructed image block to obtain a filtered first reconstructed image block.
The application also provides an intelligent terminal, including a memory and a processor, wherein the memory stores an image processing program which, when executed by the processor, implements the steps of any of the methods described above.
The present application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the methods described above.
As described above, the image processing method of the present application includes: acquiring first auxiliary information; and processing a first image block corresponding to a first viewpoint according to a reference image corresponding to a second viewpoint and the first auxiliary information. With this technical scheme, the image block of the viewpoint currently being coded can be processed using both the auxiliary information and an image block from a viewpoint different from the one currently being coded. The resulting processing result helps determine the reconstructed or decoded image of the image block of the viewpoint currently being coded, reduces video coding distortion, improves video coding quality, and thereby improves user experience.
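For orientation only, the following is a minimal sketch of the two steps summarized above, under assumed 2-D grayscale arrays and with hypothetical names; block matching and the side-information-driven filtering are reduced to trivial placeholders and do not represent the patented implementation.

```python
# Minimal, non-authoritative sketch of the two claimed steps (assumed data
# layout, hypothetical names).
import numpy as np

def get_first_side_info(depth_block: np.ndarray) -> np.ndarray:
    # e.g. depth information taken from the depth image of the first image block
    return depth_block.astype(np.float32)

def process_first_block(first_block: np.ndarray,
                        reference_image: np.ndarray,
                        side_info: np.ndarray) -> np.ndarray:
    h, w = first_block.shape
    matched = reference_image[:h, :w]                       # stand-in for locating the second image block
    weight = float(np.clip(side_info.mean() / 255.0, 0.0, 1.0))  # placeholder use of the side info
    return weight * first_block + (1.0 - weight) * matched  # stand-in for loop filtering
```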
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain the principles of the application. In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments are briefly described below; those skilled in the art can obtain other drawings based on these drawings without inventive effort.
Fig. 1 is a schematic diagram of a hardware structure of an intelligent terminal implementing various embodiments of the present application;
fig. 2 is a communication network system architecture diagram according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a multi-view video encoder according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a multi-view video decoder according to an embodiment of the present application;
fig. 5 is a flowchart illustrating an image processing method according to the first embodiment;
fig. 6 is a schematic structural diagram showing a neural network-based loop filter according to the first embodiment;
FIG. 7 is a flowchart illustrating an image processing method according to a second embodiment;
FIG. 8a is a schematic diagram of a feature extraction network according to a second embodiment;
FIG. 8b is a schematic structural diagram of another feature extraction network according to the second embodiment;
FIG. 9a is a schematic structural diagram of a first pre-processing module according to a second embodiment;
FIG. 9b is a schematic structural diagram of another first pre-processing module according to the second embodiment;
fig. 10 is a schematic structural diagram showing a combination of a feature extraction network and a first preset processing module according to a second embodiment;
fig. 11 is a schematic structural diagram including a feature fusion network according to a second embodiment;
FIG. 12a is a schematic structural diagram of a third pre-processing module according to the second embodiment;
FIG. 12b is a schematic structural diagram of another third pre-processing module according to the second embodiment;
fig. 13 is a schematic structural diagram of a neural network-based filtering processing module according to a second embodiment;
fig. 14 is a flowchart illustrating an image processing method according to a third embodiment;
FIG. 15 is a schematic diagram of a structure of a neural network-based loop filter shown in accordance with a third embodiment;
fig. 16 is a schematic structural diagram showing another neural network-based loop filter according to the third embodiment;
fig. 17 is a schematic structural diagram showing still another neural network-based loop filter according to the third embodiment;
fig. 18 is a schematic configuration diagram of an image processing apparatus according to the fourth embodiment.
The implementation, functional features and advantages of the objectives of the present application will be further explained with reference to the accompanying drawings. With the above figures, there are shown specific embodiments of the present application, which will be described in more detail below. These drawings and written description are not intended to limit the scope of the inventive concepts in any manner, but rather to illustrate the inventive concepts to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises that element. Further, similarly named elements, features, or components in different embodiments of the disclosure may have the same meaning or may have different meanings; the particular meaning is to be determined by its explanation in that embodiment or by the context of that embodiment.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information and, similarly, second information may also be referred to as first information, without departing from the scope herein. Depending on the context, the word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining". Also, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms "comprises," "comprising," "includes" and/or "including," when used in this specification, specify the presence of stated features, steps, operations, elements, components, items, species, and/or groups, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, items, species, and/or groups thereof. The terms "or," "and/or," "including at least one of the following," and the like, as used herein, are to be construed as inclusive, meaning any one or any combination. For example, "includes at least one of A, B, C" means "any of the following: A; B; C; A and B; A and C; B and C; A and B and C"; likewise, "A, B or C" or "A, B and/or C" means "any of the following: A; B; C; A and B; A and C; B and C; A and B and C". An exception to this definition occurs only when a combination of elements, functions, steps or operations is inherently mutually exclusive in some way.
It should be understood that, although the steps in the flowcharts in the embodiments of the present application are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited to the order shown, and they may be performed in other orders. Moreover, at least some of the steps in the figures may include multiple sub-steps or stages that are not necessarily performed at the same time but may be performed at different times and in a different order, and they may be performed alternately or in turns with at least a portion of other steps or of the sub-steps or stages of other steps.
The words "if", as used herein, may be interpreted as "at … …" or "at … …" or "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrases "if determined" or "if detected (a stated condition or event)" may be interpreted as "when determined" or "in response to a determination" or "when detected (a stated condition or event)" or "in response to a detection (a stated condition or event)", depending on the context.
It should be noted that step numbers such as S501 and S502 are used herein to describe the corresponding contents more clearly and concisely and do not constitute a substantive limitation on their order; those skilled in the art may perform S502 before S501 in a specific implementation, and such variations remain within the scope of protection of the present application.
It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only for convenience of description of the present application and have no specific meaning in themselves. Thus, "module", "component" and "unit" may be used interchangeably.
The smart terminal may be implemented in various forms. For example, the smart terminal described in the present application may include smart terminals such as a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a Personal Digital Assistant (PDA), a Portable Media Player (PMP), a navigation device, a wearable device, a smart band, a pedometer, and the like, and fixed terminals such as a Digital TV, a desktop computer, and the like.
The following description will be given taking a mobile terminal as an example, and it will be understood by those skilled in the art that the configuration according to the embodiment of the present application can be applied to a fixed type terminal in addition to elements particularly used for mobile purposes.
Referring to fig. 1, which is a schematic diagram of a hardware structure of a mobile terminal for implementing various embodiments of the present application, the mobile terminal 100 may include: RF (Radio Frequency) unit 101, WiFi module 102, audio output unit 103, a/V (audio/video) input unit 104, sensor 105, display unit 106, user input unit 107, interface unit 108, memory 109, processor 110, and power supply 111. Those skilled in the art will appreciate that the mobile terminal architecture shown in fig. 1 is not intended to be limiting of mobile terminals, which may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
The following describes each component of the mobile terminal in detail with reference to fig. 1:
the radio frequency unit 101 may be configured to receive and transmit signals during information transmission and reception or during a call, and specifically, receive downlink information of a base station and then process the downlink information to the processor 110; in addition, the uplink data is transmitted to the base station. Typically, radio frequency unit 101 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like. In addition, the radio frequency unit 101 can also communicate with a network and other devices through wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System for Mobile communications), GPRS (General Packet Radio Service), CDMA2000 (Code Division Multiple Access 2000 ), WCDMA (Wideband Code Division Multiple Access), TD-SCDMA (Time Division-Synchronous Code Division Multiple Access), FDD-LTE (Frequency Division duplex-Long Term Evolution), TDD-LTE (Time Division duplex-Long Term Evolution, Time Division Long Term Evolution), 5G, and so on.
WiFi is a short-range wireless transmission technology. Through the WiFi module 102, the mobile terminal can help the user receive and send e-mails, browse web pages, access streaming media, and the like, providing the user with wireless broadband Internet access. Although fig. 1 shows the WiFi module 102, it is understood that it is not an essential component of the mobile terminal and may be omitted as needed without changing the essence of the invention.
The audio output unit 103 may convert audio data received by the radio frequency unit 101 or the WiFi module 102 or stored in the memory 109 into an audio signal and output as sound when the mobile terminal 100 is in a call signal reception mode, a call mode, a recording mode, a voice recognition mode, a broadcast reception mode, or the like. Also, the audio output unit 103 may also provide audio output related to a specific function performed by the mobile terminal 100 (e.g., a call signal reception sound, a message reception sound, etc.). The audio output unit 103 may include a speaker, a buzzer, and the like.
The A/V input unit 104 is used to receive audio or video signals. The A/V input unit 104 may include a Graphics Processing Unit (GPU) 1041 and a microphone 1042. The graphics processor 1041 processes image data of still pictures or video obtained by an image capture device (e.g., a camera) in a video capture mode or an image capture mode. The processed image frames may be displayed on the display unit 106, and the image frames processed by the graphics processor 1041 may be stored in the memory 109 (or another storage medium) or transmitted via the radio frequency unit 101 or the WiFi module 102. The microphone 1042 may receive sound (audio data) in a phone call mode, a recording mode, a voice recognition mode, or the like, and can process such sound into audio data. In the phone call mode, the processed audio (voice) data may be converted into a format that can be transmitted to a mobile communication base station via the radio frequency unit 101 and then output. The microphone 1042 may implement various types of noise cancellation (or suppression) algorithms to cancel (or suppress) noise or interference generated in the course of receiving and transmitting audio signals.
The mobile terminal 100 also includes at least one sensor 105, such as a light sensor, a motion sensor, and other sensors. Optionally, the light sensor includes an ambient light sensor that may adjust the brightness of the display panel 1061 according to the brightness of ambient light, and a proximity sensor that may turn off the display panel 1061 and/or the backlight when the mobile terminal 100 is moved to the ear. As one of the motion sensors, the accelerometer sensor can detect the magnitude of acceleration in each direction (generally, three axes), can detect the magnitude and direction of gravity when stationary, and can be used for applications of recognizing the posture of a mobile phone (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), vibration recognition related functions (such as pedometer and tapping), and the like; as for other sensors such as a fingerprint sensor, a pressure sensor, an iris sensor, a molecular sensor, a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which can be configured on the mobile phone, further description is omitted here.
The display unit 106 is used to display information input by a user or information provided to the user. The Display unit 106 may include a Display panel 1061, and the Display panel 1061 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like.
The user input unit 107 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the mobile terminal. Alternatively, the user input unit 107 may include a touch panel 1071 and other input devices 1072. The touch panel 1071, also referred to as a touch screen, may collect a touch operation performed by a user on or near the touch panel 1071 (e.g., an operation performed by the user on or near the touch panel 1071 using a finger, a stylus, or any other suitable object or accessory), and drive a corresponding connection device according to a predetermined program. The touch panel 1071 may include two parts of a touch detection device and a touch controller. Optionally, the touch detection device detects a touch orientation of a user, detects a signal caused by a touch operation, and transmits the signal to the touch controller; the touch controller receives touch information from the touch sensing device, converts the touch information into touch point coordinates, sends the touch point coordinates to the processor 110, and can receive and execute commands sent by the processor 110. In addition, the touch panel 1071 may be implemented in various types, such as a resistive type, a capacitive type, an infrared ray, and a surface acoustic wave. In addition to the touch panel 1071, the user input unit 107 may include other input devices 1072. Optionally, other input devices 1072 may include, but are not limited to, one or more of a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like, and are not limited thereto.
Alternatively, the touch panel 1071 may cover the display panel 1061, and when the touch panel 1071 detects a touch operation thereon or nearby, the touch panel 1071 transmits the touch operation to the processor 110 to determine the type of the touch event, and then the processor 110 provides a corresponding visual output on the display panel 1061 according to the type of the touch event. Although the touch panel 1071 and the display panel 1061 are shown in fig. 1 as two separate components to implement the input and output functions of the mobile terminal, in some embodiments, the touch panel 1071 and the display panel 1061 may be integrated to implement the input and output functions of the mobile terminal, and is not limited herein.
The interface unit 108 serves as an interface through which at least one external device is connected to the mobile terminal 100. For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 108 may be used to receive input (e.g., data information, power, etc.) from external devices and transmit the received input to one or more elements within the mobile terminal 100 or may be used to transmit data between the mobile terminal 100 and external devices.
The memory 109 may be used to store software programs as well as various data. The memory 109 may mainly include a program storage area and a data storage area, and optionally, the program storage area may store an operating system, an application program (such as a sound playing function, an image playing function, and the like) required by at least one function, and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the cellular phone, and the like. Further, the memory 109 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.
The processor 110 is a control center of the mobile terminal, connects various parts of the entire mobile terminal using various interfaces and lines, and performs various functions of the mobile terminal and processes data by operating or executing software programs and/or modules stored in the memory 109 and calling data stored in the memory 109, thereby performing overall monitoring of the mobile terminal. Processor 110 may include one or more processing units; preferably, the processor 110 may integrate an application processor and a modem processor, optionally, the application processor mainly handles operating systems, user interfaces, application programs, etc., and the modem processor mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 110.
The mobile terminal 100 may further include a power supply 111 (e.g., a battery) for supplying power to various components, and preferably, the power supply 111 may be logically connected to the processor 110 via a power management system, so as to manage charging, discharging, and power consumption management functions via the power management system.
Although not shown in fig. 1, the mobile terminal 100 may further include a bluetooth module or the like, which is not described in detail herein.
In order to facilitate understanding of the embodiments of the present application, a communication network system on which the mobile terminal of the present application is based is described below.
Referring to fig. 2, fig. 2 is an architecture diagram of a communication Network system according to an embodiment of the present disclosure, where the communication Network system is an LTE system of a universal mobile telecommunications technology, and the LTE system includes a UE (User Equipment) 201, an E-UTRAN (Evolved UMTS Terrestrial Radio Access Network) 202, an EPC (Evolved Packet Core) 203, and an IP service 204 of an operator, which are in communication connection in sequence.
Optionally, the UE201 may be the terminal 100 described above, and is not described herein again.
The E-UTRAN202 includes eNodeB2021 and other eNodeBs 2022, among others. Alternatively, the eNodeB2021 may be connected with other enodebs 2022 through a backhaul (e.g., X2 interface), the eNodeB2021 is connected to the EPC203, and the eNodeB2021 may provide the UE201 access to the EPC 203.
The EPC203 may include an MME (Mobility Management Entity) 2031, an HSS (Home Subscriber Server) 2032, other MMEs 2033, an SGW (Serving Gateway) 2034, a PGW (PDN Gateway) 2035, a PCRF (Policy and Charging Rules Function) 2036, and the like. Optionally, the MME2031 is a control node that handles signaling between the UE201 and the EPC203 and provides bearer and connection management. The HSS2032 provides registers for management functions such as the home location register (not shown) and holds user-specific information about service characteristics, data rates, and the like. All user data may be sent through the SGW2034; the PGW2035 may provide IP address assignment and other functions for the UE201; and the PCRF2036 is the policy and charging control decision point for service data flows and IP bearer resources, which selects and provides available policy and charging control decisions for a policy and charging enforcement function (not shown).
The IP services 204 may include the internet, intranets, IMS (IP Multimedia Subsystem), or other IP services, among others.
Although the LTE system is described as an example, it should be understood by those skilled in the art that the present application is not limited to the LTE system, but may also be applied to other wireless communication systems, such as GSM, CDMA2000, WCDMA, TD-SCDMA, and future new network systems (e.g. 5G), and the like.
Based on the above mobile terminal hardware structure and communication network system, various embodiments of the present application are provided.
For the sake of understanding, the following first explains the terms of art to which the embodiments of the present application may be related.
1) Multi-view video
An array of multiple cameras (multiple cameras of the same device, or one or more cameras of different devices) captures the same scene from different viewing angles at the same time. Multi-view video is an effective three-dimensional (3D) video representation: it can reproduce a scene more vividly and provides a stereoscopic impression and interactive functionality. The processes of compression encoding and decompression decoding of multi-view video are called multi-view video encoding and multi-view video decoding. In 3D-HEVC, views can be divided into two categories: independent viewpoints (see 2) below) and non-independent viewpoints (see 3) below).
2) Independent viewpoint
An independent view may also be referred to as a base view, and the encoding of this view is independent of other views. That is, the video image of the independent view may be encoded by using a conventional video encoder (e.g., HEVC video encoder) without depending on other views, and the corresponding bit streams may be separately extracted to form a two-dimensional bit stream, thereby recovering the two-dimensional video.
3) Non-independent viewpoint
A non-independent viewpoint may also be referred to as a dependent viewpoint. Its coding usually predicts the information of the current coding view using the coded information of the independent view, thereby reducing inter-view redundancy and improving coding efficiency.
4) View Synthesis Prediction (VSP)
A predictive coding technique for three-dimensional video sequences, used to predict the picture of the current view from other views. The main difference from inter prediction is that the prediction image generated by view synthesis prediction is a view-synthesized image generated from the reconstructed image and reconstructed depth of an encoded (or decoded) view different from the current encoding (or decoding) view, whereas the prediction image generated by inter prediction is a reconstructed image of the current encoding (or decoding) view at another time.
5) Depth map (depth image)
A depth image, also called a range image, is an image in which the distance (depth) from the image capture device to each point in the scene is taken as the pixel value; it directly reflects the geometry of the visible surfaces of objects in the scene. The depth map records the distance between objects in the scene and the camera and can be used for measurement, three-dimensional reconstruction, virtual viewpoint synthesis, and the like. A depth map may be obtained by capturing left and right viewpoint images of the same scene with a binocular camera and computing a disparity map with a (binocular) stereo matching algorithm, from which the depth map is derived.
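For reference, the standard relation behind the last step (converting a disparity map to a depth map) for a rectified binocular camera is shown below; the focal length f and baseline B are not named in this application and appear here only for illustration.

```latex
% Depth Z of a scene point from its disparity d, for a rectified stereo pair
% with focal length f and baseline B (symbols introduced here for illustration):
\[
  Z = \frac{f \, B}{d},
\]
% so depth is inversely proportional to disparity, consistent with the
% disparity/depth relationship used elsewhere in this application.
```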
6) Coding Tree Unit (Coding Tree Unit, CTU)
The coding logic unit that is sequentially encoded into an HEVC bitstream. A CTU typically contains three blocks, namely two chroma blocks and one luma block; each such block is called a Coding Tree Block (CTB). In addition, the CTU includes the related syntax elements.
Alternatively, the terms "reconstruction" and "decoding" may be used interchangeably, and the terms "image," "picture," and "frame" may be used interchangeably. Typically, but not necessarily, the term "reconstruction" is used at the encoder side, while "decoding" is used at the decoder side.
Based on the above, a multi-view encoder for encoding multi-view video is described below. Fig. 3 is a schematic structural diagram of a multi-view encoder according to an embodiment of the present application, exemplified by a multi-view video with viewpoints V0 and V1, where optionally V0 is an independent viewpoint and V1 is a dependent viewpoint, and the texture image of each viewpoint is associated with a corresponding depth image. As will be appreciated by those skilled in the art, the reconstructed texture block of the texture image of the independent viewpoint and the reconstructed depth block of the corresponding depth image may be used to generate a predicted texture block of the texture image of the dependent viewpoint. In addition, a predicted depth block of the dependent viewpoint may be generated using the reconstructed depth block of the independent viewpoint. The encoding and decoding processes for the independent viewpoint and the dependent viewpoint using the multi-view encoder are as follows.
(1.1) the encoding process of the independent view V0 by the multi-view encoder 300a is explained as follows:
after receiving the input video data of the independent view V0, a prediction block (including a texture prediction block of the texture image and a depth prediction block of the depth image) obtained by intra prediction and/or inter prediction is subtracted from an original image block (including a texture image block of the texture image and a depth image block of the depth image) to obtain a residual block (including a texture residual block of the texture image and a depth residual block of the depth image). And then, carrying out transformation and quantization processing on the residual block, and then carrying out coding by an entropy coder to form a coded bit stream. In addition, the residual block is subjected to inverse quantization and inverse transform processing, and is added to a prediction block obtained by intra prediction and/or inter prediction to obtain a reconstructed block. Due to the transformation and quantization, there is distortion between the reconstructed block and the image block in the input frame (the image of the input video data). Therefore, loop filtering processing is required for the reconstructed block. Such as a neural network based loop filtering process. In addition, the Loop Filter process may also include at least one of DBF (Deblocking Filter), SAO (Sample-Adaptive Offset), ALF (Adaptive Loop Filter) (not shown in fig. 3); the loop filtering process based on the neural network may further add a filter based on the neural network, and the neural network may be a super-resolution neural network, a convolution neural network based on dense residuals, a general convolution neural network, or the like, which is not limited herein. The Neural Network based Loop Filter process is configured by DBF, DRNLF (Dense Residual Convolutional Neural Network based In-Loop Filter), SAO, and ALF, for example (not shown In fig. 3). The reconstructed block after the loop filtering process is further synthesized into a reconstructed image and stored in an image buffer for the prediction process of the subsequent image block.
(1.2) the encoding process of the dependent view V1 by the multi-view encoder 300a is explained as follows:
After the input video data of the dependent viewpoint V1 is received, a prediction block (including a texture prediction block of the texture image and a depth prediction block of the depth image) obtained by intra prediction and/or inter prediction is subtracted from the original image block (including a texture image block of the texture image and a depth image block of the depth image) to obtain a residual block (including a texture residual block and a depth residual block). The residual block is then transformed and quantized, and encoded by an entropy coder to form the coded bitstream. In addition, the residual block is inverse-quantized and inverse-transformed and added to the prediction block obtained by intra prediction and/or inter prediction to obtain a reconstructed block. Because of the transform and quantization, there is distortion between the reconstructed block and the corresponding image block of the input frame; therefore, loop filtering is applied to the reconstructed block, for example a neural-network-based loop filtering process, which may further include at least one of DBF, SAO, and ALF (not shown in fig. 3), and to which a neural-network-based filter such as DRNLF may be added to further improve the filtered image quality. The reconstructed blocks after loop filtering are further assembled into a reconstructed image and stored in the image buffer for the prediction of subsequent image blocks.
Furthermore, the image of the dependent viewpoint V1 can also be coded using view synthesis prediction. Specifically, the image blocks of the independent viewpoint V0 corresponding to the dependent viewpoint V1, including texture image blocks of the texture image and depth image blocks of the depth image, may be read from the image buffer. The predicted depth image block of the depth image of the dependent viewpoint V1 may then be generated from the depth image block of the corresponding independent viewpoint V0, and the predicted texture block of the texture image of the dependent viewpoint V1 may be generated from the texture image block of the texture image and the depth image block of the depth image of the corresponding independent viewpoint V0. Next, the control data related to view synthesis prediction (i.e., the control data included in the prediction data in fig. 3, which instructs the decoding side and the encoding side to maintain the same prediction method) and other related data (e.g., filter control data) are entropy-encoded and transmitted in the coded bitstream.
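The sketch below is a much-simplified illustration of such view synthesis prediction: the independent-view texture is warped horizontally by a per-pixel disparity derived from depth. The function name, the use of the independent-view depth as the warping depth, and the camera parameters (focal length, baseline) are assumptions for illustration only; the patent does not specify this implementation.

```python
# Much-simplified view synthesis prediction sketch (hypothetical names and
# assumed camera parameters); disocclusion handling is omitted.
import numpy as np

def synthesize_prediction(texture_v0: np.ndarray, depth_v0: np.ndarray,
                          focal: float, baseline: float) -> np.ndarray:
    h, w = texture_v0.shape
    predicted = np.zeros_like(texture_v0)
    disparity = np.zeros((h, w), dtype=np.float64)
    valid = depth_v0 > 0
    disparity[valid] = focal * baseline / depth_v0[valid]
    for y in range(h):
        for x in range(w):
            xs = int(round(x - disparity[y, x]))   # horizontal shift between views
            if 0 <= xs < w:
                predicted[y, x] = texture_v0[y, xs]
    return predicted  # holes would normally be filled from neighbouring pixels
```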
(2.1) The decoding process of the multi-view video can be regarded as the reverse of its encoding process. The decoding of the independent viewpoint V0 and the dependent viewpoint V1 by the multi-view video decoder 300b shown in fig. 4 is explained as follows:
the decoding process for the independent view V0 and the dependent view V1 may go through the following processes: the video decoder performs entropy decoding on the received coded bit stream (such as the bit stream of independent view V0 or the bit stream of dependent view V1) to obtain prediction data, filter control data indicated by the encoding end, and quantized transform coefficients; then, the quantized transform coefficient is inversely quantized and inversely transformed to obtain a residual block, the residual block and a prediction block output after the prediction data is processed by one of a plurality of prediction modes (for example, intra-frame prediction, inter-frame prediction and view synthesis prediction) are summed, then according to the indication of filter control data to loop filter processing, the decoded image block is filtered by adopting the same filtering mode as the multi-video encoder, the filtered decoded image block can further synthesize a decoded image, the decoded image is cached in a decoded image buffer for the prediction processing of the subsequent image block, and the decoded video data is output at the same time.
It should be noted here that, when the bitstream of the dependent viewpoint V1 is decoded in the multi-view video decoder, the prediction parameters decoded for the dependent viewpoint V1 may include control data instructing the decoder to use view synthesis prediction. According to this control data, the multi-view video decoder obtains the prediction block by means of view synthesis prediction: for example, the predicted depth image block of the depth image of the dependent viewpoint V1 may be generated from the depth image block of the corresponding independent viewpoint V0, and the predicted texture block of the texture image of the dependent viewpoint V1 may be generated from the texture image block of the texture image and the depth image block of the depth image of the corresponding independent viewpoint V0. The predicted depth image block and the predicted texture block are then summed with their corresponding residual blocks and subjected to further processing to obtain the respective decoded images.
Based on the above descriptions of the multi-view video encoder and the multi-view video decoder, the image processing method provided by the embodiments of the present application is explained below with reference to the drawings.
First embodiment
Referring to fig. 5, fig. 5 is a flowchart illustrating an image processing method according to the first embodiment. The execution subject in this embodiment may be a computer device or a cluster formed by multiple computer devices; the computer device may be an intelligent terminal (such as the aforementioned mobile terminal 100) or a server. Here, the description takes an intelligent terminal as the execution subject as an example.
S501, first auxiliary information is obtained.
In one embodiment, the first side information comprises depth information or disparity information, the depth information comprising at least one of: depth feature information, statistical information based on depth values, depth slices, pre-processed depth slices, combined information of depth feature information and statistical information based on depth values.
The following relationship exists between disparity information and depth information: since disparity is inversely proportional to the distance from a point in three-dimensional space to the plane of the projection centers, the depth information of a point in the scene can be obtained as long as its disparity information is known. Depth information or disparity information can be determined from the corresponding depth image. The depth feature information may be any one or more of point features, line features, surface features, and depth contour information of a region of interest with respect to depth. The statistical information based on depth values may be statistical information of the depth values of the corresponding depth slice, and may be used to calculate the similarity between a depth slice of the first viewpoint and a depth slice of the second viewpoint. A depth slice refers to the slice region in the depth map corresponding to a texture slice; a preprocessed depth slice is, for example, a depth slice that has undergone quantization processing. Optionally, the depth information may be represented by a matrix whose size is associated with the corresponding texture slice; illustratively, the depth region of interest, or a particular surface feature with respect to depth, is labeled 1 and other regions are labeled 0. This helps extract the features of the texture region corresponding to the depth region of interest or to the particular depth-related surface feature, so that loop filtering can then be performed on those features to improve the quality of the reconstructed or decoded image.
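The sketch below illustrates, under assumed threshold values and hypothetical names, how such a binary matrix of the same size as the texture slice might be built and applied; it is only one possible reading of the 1/0 labeling described above.

```python
# Illustrative sketch (assumed thresholds and names): a binary depth-ROI mask
# with 1 for the depth region of interest and 0 elsewhere, applied to the
# corresponding texture slice before feature extraction.
import numpy as np

def depth_roi_mask(depth_slice: np.ndarray, near: float, far: float) -> np.ndarray:
    return ((depth_slice >= near) & (depth_slice <= far)).astype(np.uint8)

def masked_texture(texture_slice: np.ndarray, mask: np.ndarray) -> np.ndarray:
    # Only the texture pixels inside the region of interest are kept for
    # subsequent feature extraction and loop filtering.
    return texture_slice * mask
```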
The first auxiliary information may be auxiliary information from a first view and/or a second view, the first view may be a dependent view, and the second view may be an independent view.
S502, processing the first image block corresponding to the first view according to the reference image corresponding to the second view and the first auxiliary information.
In one embodiment, the method further includes: acquiring the first image block corresponding to the first viewpoint, and/or acquiring the reference image corresponding to the second viewpoint. Optionally, the second viewpoint is different from the first viewpoint. Optionally, the reference image and the image in which the first image block is located belong to images of different viewpoints at the same time.
In one embodiment, corresponding to the multi-view video encoder and multi-view video decoder introduced above, the first viewpoint here may be a dependent viewpoint and the second viewpoint may be an independent viewpoint. The first image block is a reconstructed block before being input into the neural-network-based loop filtering process; optionally, the reconstructed block is a reconstructed texture image block (also referred to as a reconstructed texture block for short), and the reconstructed texture block may be any one of a CTU, a slice, a tile, and a sub-image. The image in which the first image block is located may be referred to as the current frame of the dependent viewpoint; the current frame may be the current texture frame F1, and the current texture frame F1 is a reconstructed image. Optionally, the reference image is a reference frame obtained from the image buffer (or the decoded picture buffer); the reference frame is the reconstructed image corresponding to the second viewpoint (or the decoded image corresponding to the second viewpoint), and the reference image is encoded before the image in which the first image block is located. For example, the reference image is a reference frame of an independent viewpoint.
For example, the first image block is the currently reconstructed texture slice S1 of the current texture frame F1 of the dependent viewpoint, and the currently reconstructed texture slice S1 may be an intra-predicted slice (I slice) or an inter-predicted slice (P slice). Optionally, the current texture frame F1 is not yet completely reconstructed, while the reference frame FR1 of the independent viewpoint (corresponding to the reference image) has already been reconstructed; the currently reconstructed texture slice S1 may subsequently be processed to match a texture slice in the reference frame. It should be noted that, when the obtained currently reconstructed texture slice S1 is a reconstructed texture slice obtained through intra prediction (i.e., an intra-predicted I slice) or through inter prediction (i.e., an inter-predicted P slice), rather than a texture slice obtained through inter-view prediction (e.g., view synthesis prediction), the currently reconstructed texture slice does not refer to the texture information of the independent viewpoint during reconstruction; therefore, at the loop filtering stage, the quality of the loop-filtered reconstructed texture slice can be enhanced by fusing texture information from the reference frame of the independent viewpoint. However, this is not limiting: the currently reconstructed texture slice S1 may also be a reconstructed texture slice obtained through inter-view prediction. When the obtained currently reconstructed texture slice is a reconstructed texture slice obtained through inter-view prediction, subsequently referring to the texture information of the independent viewpoint can further improve the quality of the filtered reconstructed texture slice.
The image area of the acquired first image block of the first viewpoint is smaller than the image area of the reference image of the second viewpoint, so that the second image block matching the first image block of the first viewpoint can be determined from the larger image area of the reference image of the second viewpoint, improving the degree of matching. The first viewpoint and the second viewpoint may correspond to a dependent viewpoint and an independent viewpoint, and both the first image block and the second image block may be reconstructed texture blocks. The specific matching manner is described in the embodiment corresponding to fig. 7 and is not detailed here.
Optionally, a processing result corresponding to the first image block may be determined or generated according to the second image block of the reference image and the first auxiliary information.
In one embodiment, the second image block may be acquired from the reference image. The acquisition of the second image block may follow the following preset rule: a second image block matching the first image block of the first viewpoint is determined within a larger image region of the reference image of the second viewpoint. Assuming that the first viewpoint is a dependent viewpoint, the second viewpoint is an independent viewpoint, and the first image block is a reconstructed texture block, the second image block may be determined according to the obtained reconstructed texture block and the reference frame, or according to the obtained reconstructed texture block and an image region in the reference frame. Further, since the reconstructed texture block is any one of a CTU, a slice, a tile, and a sub-image, according to the above preset rule the second image block is determined as follows: when the first image block is a reconstructed texture block of the current frame of the dependent viewpoint, the second image block is determined from a reconstructed slice in the reference frame of the independent viewpoint (optionally, the size of the reconstructed texture block is smaller than that of the reconstructed slice); when the first image block is a reconstructed texture block of the current frame of the dependent viewpoint and the reconstructed image block is a CTU, any one of a slice, a tile, and a sub-image of the reference image of the independent viewpoint can be acquired to determine the second image block; and when the first image block is a reconstructed texture block of the current frame of the dependent viewpoint and the reconstructed image block is a slice or a tile, a sub-image of the reference image of the independent viewpoint is acquired, and the second image block is determined from the sub-image.
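A hypothetical sketch of this preset rule is given below: the matching second image block is searched for in a reference-image region of larger granularity than the first image block. The function and type names, and the table form, are assumptions for illustration only.

```python
# Hypothetical mapping from the granularity of the first image block to the
# kinds of reference-image regions in which the matching second image block
# may be searched for (names are illustrative, not from the patent).
SEARCH_REGION = {
    "ctu": ("slice", "tile", "sub_image"),   # a CTU is matched within a slice, tile or sub-image
    "slice": ("sub_image",),                 # a slice or tile is matched within a sub-image
    "tile": ("sub_image",),
}

def candidate_regions(first_block_type: str) -> tuple:
    """Return the kinds of reference-image regions to search for the second block."""
    return SEARCH_REGION.get(first_block_type, ("frame",))
```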
In an embodiment, the relationship between the first image block and the second image block comprises at least one of the following: the second image block and the first image block have the same size; the first image block and the second image block have the same type; and when the second image block is a slice, the second image block is composed of a plurality of coding tree units. The same size here means that the image areas of the two image blocks have the same size, for example, both the first image block and the second image block are 8 × 8; the same type means, for example, that when the first image block is a slice, the second image block is also a slice, and when the first image block is a coding tree unit (CTU), the second image block is also a coding tree unit (CTU).
The slice is specifically a texture slice (or reconstructed texture slice). Optionally, the texture slice of the second viewpoint (e.g., the texture slice S_R of the independent viewpoint) is not necessarily a texture slice in the usual sense contained in a NAL unit (Network Abstraction Layer, which in the video coding standard H.264 is responsible for packetizing and transmitting encoded data in the format required by the network); it may instead be an image region composed of a plurality of CTUs that has the same size and shape as the texture slice of the first view (e.g., the view-dependent texture slice).
Alternatively, the second image block may be determined from the reference image according to the first auxiliary information of the first image block. Illustratively, the first image block is a currently reconstructed texture block, specifically the currently reconstructed texture slice S1 (or texture slice S1), the first view is a dependent view, and the first auxiliary information is depth information or disparity information. For example, the depth information corresponding to the texture slice S1 in the view-dependent current texture frame F1 may be the depth slice itself, a preprocessed depth slice, or statistical information of the depth values of the depth slice corresponding to the currently reconstructed texture slice S1. The depth information Ds1 or disparity information Ds2 corresponding to the currently reconstructed texture slice S1 is determined from the depth image corresponding to the texture slice S1. The reference texture slice S_R1 in the reference texture frame F_R1 of the independent viewpoint that corresponds to the currently reconstructed texture slice S1 can then be determined according to the depth information Ds1 or the disparity information Ds2; see the description below.
Optionally, step S502 includes:
acquiring first auxiliary information of the first image block, wherein optionally, the first auxiliary information includes depth information, and optionally, the depth information is determined according to a depth image corresponding to the first image block;
calculating or acquiring similarity between first auxiliary information of each image block in the reference image and the first auxiliary information of the first image block;
and determining the image block with the maximum similarity in the reference image as a second image block matched with the first image block.
Optionally, the first auxiliary information of the first image block includes depth information or disparity information, and the depth information or the disparity information is determined from a depth image corresponding to the first image block. For example, when the first image block is the current reconstructed texture slice S1 of the current texture frame F1 depending on the view point, the first side information may be determined from the depth slice corresponding to the current reconstructed texture slice S1.
Optionally, each image block in the reference image of the second view is of the same type as the first image block, for example, all are texture slices; similar to the first auxiliary information of the first image block, the first auxiliary information of each image block in the reference image also includes depth information or disparity information, and the depth information or disparity information is determined from a depth image corresponding to the reference image. Regarding the similarity between the second image block and the first image block, the similarity is measured by the similarity between the respective first auxiliary information. For example, the similarity between the texture slice of the independent viewpoint and the texture slice of the dependent viewpoint is calculated from depth information including at least one of: depth feature information, statistical information based on depth values, depth slices, pre-processed depth slices, combined information of depth feature information and statistical information based on depth values.
Optionally, by determining the image block in the reference image of the second view whose first auxiliary information has the largest similarity with the first auxiliary information corresponding to the first image block of the first view, a second image block matching the first image block of the first view can be found among the image blocks of the reference image of the second view; that is, the first auxiliary information of the second image block is the most similar to the first auxiliary information of the first image block.
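A minimal sketch of this coarse matching, assuming the first auxiliary information is statistical information based on depth values and using the Euclidean distance between the statistics as an (inverse) similarity; the metric and function names are assumptions, not the normative similarity measure.

```python
import numpy as np

def coarse_match(first_block_depth: np.ndarray,
                 reference_blocks_depth: list[np.ndarray]) -> int:
    """Return the index of the reference-image block whose depth-based first auxiliary
    information is most similar to that of the first image block."""
    def depth_stats(d: np.ndarray) -> np.ndarray:
        # statistical information based on depth values (mean / std as a simple proxy)
        return np.array([d.mean(), d.std()])

    target = depth_stats(first_block_depth)
    similarities = [
        -np.linalg.norm(depth_stats(d) - target)   # smaller distance = larger similarity
        for d in reference_blocks_depth
    ]
    return int(np.argmax(similarities))            # block with the maximum similarity
```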
And the processing result is used for obtaining a reconstructed image or a decoded image corresponding to the first image block. It should be noted that, when the scheme is applied to a multi-view video encoding end, the processing result is used to obtain a reconstructed image corresponding to the first image block; when the scheme is applied to a multi-view video decoding end, the processing result is used for acquiring a decoding image corresponding to the first image block.
In one embodiment, the processing result includes a filtered first image block, and the neural network-based loop filter processor shown in fig. 6 may be utilized to process the first image block according to the first auxiliary information and the second image block of the reference image, and determine or generate the processing result.
As shown in fig. 6, the neural-network-based loop filter includes a fusion module and a neural-network-based filtering processing module. Optionally, the fusion module receives the first auxiliary information (e.g., depth information or disparity information), the first image block (e.g., a currently reconstructed texture block of the current frame of the dependent view), and the reference image (e.g., an independent-view reference frame) for processing; the fusion module may determine a second image block (e.g., a matched texture block) from the reference image, and after the first image block and the second image block undergo a series of processing in the fusion module, the result of that processing is input to the neural-network-based filtering processing module, which outputs the filtered first image block. It should be noted that the neural-network-based loop filter may be disposed in the multi-view encoder shown in fig. 3 or the multi-view decoder shown in fig. 4, and the structure of the neural-network-based filtering processing module is schematically shown in fig. 13. In one embodiment, the fusion module may exist independently of the neural-network-based filtering processing module, i.e., as a separate functional module. In another embodiment, the fusion module and the neural-network-based filtering processing module are both included in the neural-network-based loop filter processor of fig. 3 or fig. 4, which is used to determine or generate the processing result of the loop filtering; a reconstructed image or a decoded image can then be further synthesized from the processing result. In an embodiment, the processing result may also be referred to as a view-dependent loop-filtered texture block or a view-dependent loop-filtered reconstructed texture block.
In one embodiment, the fusion module receives a view-dependent texture block, a reference frame of the independent view, and depth information or disparity information. In another embodiment, the fusion module may instead receive a currently reconstructed texture block of the current frame of the dependent view, a reconstructed texture slice in the independent-view reference frame, and depth information or disparity information. Optionally, the reconstructed texture block of the current frame of the dependent view may be one of a CTU, a slice, a tile and a sub-picture; when the received reconstructed texture block of the current frame of the dependent view is a CTU, the fusion module may receive one of a slice, a tile and a sub-picture of the reference frame of the independent view from the picture buffer, and when the received reconstructed texture block of the current frame of the dependent view is a slice or a tile, the fusion module may receive a sub-picture of the reference frame of the independent view from the picture buffer.
The more detailed processing manner of the fusion module and the filtering processing module based on the neural network can be seen from the content described in the corresponding embodiment of fig. 7, and will not be described in detail herein.
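To make the overall data flow concrete, the following is a hedged PyTorch-style sketch of the two-stage structure (fusion module followed by a neural-network-based filtering module); the internal layers are placeholders chosen for brevity and do not reproduce the actual modules of the loop filter.

```python
import torch
import torch.nn as nn

class NeuralLoopFilter(nn.Module):
    """Minimal sketch: a fusion stage combining the dependent-view block, its matched
    independent-view block and the auxiliary (depth) information, followed by a
    filtering stage that outputs the filtered first image block."""
    def __init__(self, channels: int = 64):
        super().__init__()
        # 3 input channels: first block + matched second block + depth information
        self.fusion = nn.Conv2d(3, channels, kernel_size=3, padding=1)
        self.filter = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, 3, padding=1),
        )

    def forward(self, first_block, matched_block, depth_info):
        # fusion module: combine current reconstructed block, matched reference block, depth
        fused = self.fusion(torch.cat([first_block, matched_block, depth_info], dim=1))
        # neural-network-based filtering module: produce the filtered first image block
        return self.filter(fused)
```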
To sum up, the image processing scheme provided by the embodiment of the present application may be applied to a multi-view video coding and decoding scene, and by referring to first auxiliary information (including depth information or disparity information), and utilizing information of image blocks of different views, the disparity influence between frames of different views at the same time may be reduced, so as to achieve a better matching effect of the image blocks of different views, and determine or generate a corresponding processing result according to a second image block with a higher matching degree, so as to reduce a distortion degree of the processing result, thereby obtaining a high-quality reconstructed image or decoded image.
Second embodiment
Referring to fig. 7, fig. 7 is a flowchart illustrating an image processing method according to the second embodiment. The execution subject in this embodiment may be a computer device or a cluster formed by a plurality of computer devices, and the computer device may be an intelligent terminal (such as the aforementioned mobile terminal 100) or a server; here, the execution subject of this embodiment is taken to be an intelligent terminal as an example.
S701, acquiring first auxiliary information, a first image block corresponding to a first view point, and a reference image corresponding to a second view point. This step can be referred to the related description in the first embodiment, and is not described herein.
S702, determining a second image block from the reference image according to the first auxiliary information.
When the first image block of the first view is a CTU, one of a slice, a tile, or a sub-picture of the reference picture of the second view may be obtained from the picture buffer, and the second image block is determined from it; when the first image block of the first view is a slice or a tile, a sub-picture of the reference picture of the second view may be obtained from the picture buffer, and the second image block is determined from the sub-picture. That is, the processing rule for determining the second image block from the reference image of the second viewpoint is: the size of the image area obtained from the second view is larger than the size of the image area of the first image block of the first view.
In an embodiment, the second image block and the first image block are of the same type; for example, if the first image block is a reconstructed texture slice, the second image block determined from a sub-image of the reference image is also a reconstructed texture slice. The second image block matches the first image block and may also be referred to as a matched image block. For example, when the first image block is a reconstructed texture block, the second image block is a matched texture block; specifically, when the reconstructed texture block is a reconstructed texture slice, the reconstructed texture slices of the reference image of the second view may be roughly matched against the reconstructed texture slice of the first view (slice-to-slice registration), and the obtained second image block is the reconstructed texture slice in the reference image of the second view that satisfies the matching condition (i.e., the matched texture slice).
In one embodiment, step S702 performs rough matching between a first image block of the first view and a second image block of the second view, which may optionally be implemented by the following steps (i)–(iii):
(i) acquiring the first auxiliary information of the first image block; (ii) acquiring the similarity between the first auxiliary information of each image block in the reference image and the first auxiliary information of the first image block; and (iii) determining the image block with the maximum similarity in the reference image as the second image block matching the first image block. For specific content, reference may be made to the description of the same steps in the first embodiment, which is not repeated here.
Illustratively, the first image block is the currently reconstructed texture slice S1 of the view-dependent current texture frame F1, and the reference picture is a reference frame of the independent view. The texture slice S_R1 in the reference texture frame F_R1 of the independent view is matched with the currently reconstructed texture slice S1 of the current frame of the dependent view as follows: first, the depth information Ds1 or the disparity information Ds2 corresponding to the currently reconstructed texture slice S1 is obtained; then, in the corresponding reference frame F_R1 of the independent viewpoint, the reference texture slice S_R1 whose depth or disparity information is most similar to the depth information Ds1 or disparity information Ds2 corresponding to the currently reconstructed texture slice S1 is searched for. For example, the texture slice of the independent view whose depth information has the greatest similarity with that corresponding to the view-dependent texture slice S1 is determined as the reference texture slice S_R1, i.e., the reference texture slice S_R1 matches the view-dependent texture slice S1. Note that the texture slice S_R of the independent viewpoint is not necessarily a texture slice in the usual sense contained in a NAL unit; it may be an image region composed of a plurality of CTUs with the same size and shape as the texture slice of the dependent view.
And S703, determining a first feature map corresponding to the first image block and a second feature map corresponding to the second image block.
The feature maps corresponding to the first image block and the second image block may be obtained by directly extracting features of the image blocks, or may be obtained by extracting features of image sub-blocks output by fine matching after the first image block and the second image block are subjected to fine matching. For these two different ways, reference may be made to the following description.
Mode 1: and performing feature extraction processing on the first image block and the second image block based on a feature extraction network and the first auxiliary information to obtain a first feature map corresponding to the first image block and a second feature map corresponding to the second image block.
The first auxiliary information used here is the same as the first auxiliary information used in the rough matching, and may come from the first image block or the second image block. The first auxiliary information includes depth information or disparity information, and the depth information or disparity information may be represented by a matrix, by a depth map or a disparity map, or by a preprocessed depth map or disparity map (e.g., after quantization or normalization). The first auxiliary information may be used as reference information for the feature extraction network: when feature extraction is performed on the first image block and the second image block, a mapping relationship is established between the extracted features of the first image block and the second image block and the depth information and/or disparity information, so that the first preset processing model and/or first preset processing parameter can be determined more accurately when the first preset processing is performed subsequently. In an embodiment, the first preset processing is warping processing, the first preset processing model is a warping model, and the first preset processing parameter is a warping parameter.
In another embodiment, the depth information corresponding to the texture image region of a specific depth that needs to be supervised may be set to 1, and the depth information corresponding to the texture image regions of other depths may be set to 0. Further, the depth information or the disparity information obtained in this way is used as the first auxiliary information. Alternatively, the first auxiliary information may be supervision information of a feature extraction network, which enables only features of texture image areas corresponding to specific depths to be supervised in the first image block and the second image block to be extracted to generate respective corresponding feature maps. In this way, it is possible to preferentially process or only process texture image regions of a specific depth that need to be supervised, in case of limited computational resources or transmission bandwidth.
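For illustration, assuming the "specific depth that needs to be supervised" is given as a depth range, the binary depth information described above could be produced as in the following sketch; the range-based selection and the function name are assumptions.

```python
import numpy as np

def depth_supervision_mask(depth_map: np.ndarray, d_min: float, d_max: float) -> np.ndarray:
    """Set the depth information to 1 for the texture region whose depth lies in the
    supervised range [d_min, d_max], and to 0 elsewhere."""
    return ((depth_map >= d_min) & (depth_map <= d_max)).astype(np.float32)
```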
The feature extraction network includes a neural network, for example, any one or a combination of a convolutional neural network, a residual convolutional neural network, a deep learning neural network, and the like, and a feature map corresponding to the image block can be extracted through processing of the trained neural network, where the feature map is a multi-dimensional (e.g., two-dimensional) matrix. The corresponding feature extraction unit may be a convolutional layer, and the downsampling unit may be a pooling layer.
Mode 2: acquiring a first image sub-block of the first image block and a second image sub-block of the second image block; the second auxiliary information of the second image sub-block matches the second auxiliary information of the first image sub-block; performing feature extraction processing on the first image sub-block and the second image sub-block based on a feature extraction network and the second auxiliary information to obtain a first sub-feature map of the first image sub-block and a second sub-feature map of the second image sub-block; determining or generating a first feature map corresponding to the first image block through the first sub-feature map, and determining or generating a second feature map corresponding to the second image block through the second sub-feature map; optionally, the second auxiliary information is different from the first auxiliary information. In an embodiment, the first sub-feature maps corresponding to all the first image sub-blocks of the first image block are combined/spliced into a first feature map, and the second sub-feature maps corresponding to all the second image sub-blocks of the second image block are spliced into a second feature map.
In this way, the second image sub-block corresponds to a result obtained by performing fine matching on the first image sub-block and the image sub-block in the second image block after the coarse matching, the second image sub-block may also be referred to as a matched image sub-block, and the first image sub-block and the second image sub-block have the same type. Illustratively, the first image block of the first view point is a reconstructed texture block, the second image block of the second view point is a reconstructed texture block (or called a matching texture block), the first image sub-block is a reconstructed texture sub-block in the reconstructed texture block of the first view point, and the second image sub-block is a reconstructed texture sub-block (or called a matching texture sub-block) in the reconstructed texture block of the second view point.
The matching of the second auxiliary information of the second image sub-block and that of the first image sub-block means that the similarity between the second auxiliary information of the two image sub-blocks is the largest. Similar to the coarse matching, the fine matching is performed through a similarity calculation, but using the second auxiliary information. The second auxiliary information may include depth information or disparity information; when the second auxiliary information is depth information, the depth information may include, but is not limited to, at least one of the following:
① depth feature information, for example point features, line features, surface features, boundary features, or depth profile information of a region of interest with respect to depth;
② statistical information based on depth values, for example statistical information of the depth values of the reconstructed depth block corresponding to the reconstructed texture coding tree block CTB_d, where this depth information may be used to calculate the similarity between the reconstructed depth block of the second view and the reconstructed depth block of the first view;
③ a combination of depth feature information and statistical information based on depth values;
④ the reconstructed depth block or a preprocessed reconstructed depth block.
It should be noted that the second auxiliary information used for the fine matching is different from the first auxiliary information used for the coarse matching: either the content of the second auxiliary information is different, or its precision is higher. In an embodiment, the first auxiliary information and the second auxiliary information are both depth information but with different contents; for example, the depth information corresponding to the texture slice in the coarse matching is depth feature information while the depth information corresponding to the reconstructed texture sub-block in the fine matching is statistical information based on depth values, or the depth information corresponding to the texture slice in the coarse matching is depth feature information while the depth information corresponding to the reconstructed texture sub-block in the fine matching is a combination of depth feature information and statistical information based on depth values. In another embodiment, the first auxiliary information and the second auxiliary information have different precisions; for example, the depth information corresponding to the texture slice in the coarse matching consists of n items of depth feature information, and the depth information corresponding to the reconstructed texture sub-block in the fine matching consists of m items of depth feature information, where m and n are integers greater than or equal to 1 and optionally m is greater than n. Thus, at coarse matching the texture slice may be matched using one type of depth information (e.g., low-precision depth information), and at fine matching the reconstructed texture coding tree block may be matched using another type of depth information (e.g., high-precision depth information); matching in these two different dimensions maintains a good balance between computational complexity and matching quality.
In a possible embodiment, the first image sub-block and the second image sub-block are image sub-blocks of the same type; for example, both are coding tree blocks or extended coding tree blocks. The first image block is a reconstructed texture block, and the reconstructed texture block may be a reconstructed texture slice, i.e., the first image block is a reconstructed texture slice; the first image sub-block is a reconstructed texture sub-block, which may be a reconstructed texture coding tree block CTB, i.e., the first image sub-block is a reconstructed texture coding tree block CTB, and the types of the second image block and the second image sub-block correspond to those of the first image block and the first image sub-block. The second image sub-block may be obtained as follows: according to a predetermined processing order, fine matching (block-to-block registration) is performed between the reconstructed texture coding tree block CTB_d in the reconstructed texture slice S1 of the first view and the reconstructed texture coding tree blocks CTB_i in the reconstructed texture slice S_R1 of the second view. For example, the first view is a dependent view and the second view is an independent view; according to the raster scanning order, the reconstructed texture coding tree block CTB_i in the corresponding reference texture slice of the reference texture frame F_R1 of the independent view can be determined from the depth information or disparity information corresponding to the reconstructed texture coding tree block in the reconstructed texture slice of the dependent view.
Alternatively, the similarity between the reconstructed texture coding tree block of the first view and the reconstructed texture coding tree blocks of the second view may be calculated from the depth information; that is, the reconstructed texture coding tree block CTB_i of the second view whose depth information has the maximum similarity with that of the reconstructed texture coding tree block CTB_d of the first view is determined and taken as the reconstructed texture coding tree block matching CTB_d of the first view, where the degree of similarity of the depth information can be represented by a function of the similarity probability.
The reconstructed texture sub-block may also be an extended reconstructed texture coding tree block CTB_ex, i.e., both the first image sub-block and the second image sub-block are extended reconstructed texture coding tree blocks. An extended reconstructed texture coding tree block CTB_ex is a coding tree block obtained by extending the block boundary of a reconstructed texture coding tree block CTB, so that the extended reconstructed coding tree block CTB_ex contains the reconstructed texture coding tree block CTB. Optionally, the extension region is filled with pixels of the other reconstructed texture coding tree blocks adjacent to the reconstructed texture coding tree block, so the extended reconstructed texture coding tree block CTB_ex is larger than the reconstructed texture coding tree block. The partitioning of a coded picture is based on coding tree blocks, and the reconstructed coded picture or reconstructed decoded picture can therefore exhibit blocking artifacts; when extended reconstructed texture coding tree blocks are used for loop filtering, however, the extension region is filled with pixels of the adjacent reconstructed texture coding tree blocks, which effectively reduces the blocking artifacts caused by the partitioning. Therefore, using extended reconstructed texture coding tree blocks as the first image sub-block and the second image sub-block reduces blocking artifacts at the level of the picture partitioning itself, further improving the filtering effect and the coding and decoding quality.
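A minimal sketch of cutting such an extended reconstructed texture coding tree block from a reconstructed frame, assuming the extension is realised by enlarging the crop window into the neighbouring CTBs and clipping at the frame border; the CTB size and margin values are illustrative assumptions.

```python
import numpy as np

def extended_ctb(reconstructed_frame: np.ndarray, x: int, y: int,
                 ctb_size: int = 64, margin: int = 4) -> np.ndarray:
    """Cut an extended reconstructed texture CTB: the CTB at (x, y) plus a margin filled
    with pixels of the neighbouring reconstructed CTBs (clipped at frame borders)."""
    h, w = reconstructed_frame.shape[:2]
    top    = max(0, y - margin)
    left   = max(0, x - margin)
    bottom = min(h, y + ctb_size + margin)
    right  = min(w, x + ctb_size + margin)
    return reconstructed_frame[top:bottom, left:right]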
In the mode 2, similar to the first auxiliary information, the second auxiliary information may be used as supervision information in addition to being used as reference information for fine matching, so that a mapping relationship is established between the extracted features of the first image sub-block and the second image sub-block and the depth information and/or the disparity information, so that the feature extraction network can extract only the features of the texture image region corresponding to the specific depth needing supervision in the first image sub-block and the second image sub-block to generate the corresponding feature maps.
Both of the above two modes can extract corresponding feature maps (including a first feature map and a second feature map) through the feature extraction network, and the difference is only the difference between the processing objects received by the feature extraction network: under fine matching, the feature extraction network receives a first image sub-block (e.g., reconstructed texture sub-block), a second image sub-block (e.g., matching texture sub-block), and second side information. In case of only a coarse match, the feature extraction network receives a first image block (e.g. the current reconstructed texture block), a second image block (e.g. the matching texture block), first side information. The details and processing principles involved in the feature extraction network are described next.
In one embodiment, the feature extraction network comprises N cascaded feature extraction modules, wherein N is an integer greater than or equal to 1, each feature extraction module in the first N-1 feature extraction modules comprises a feature extraction unit and a downsampling unit which are connected in series, and the Nth feature extraction module comprises a feature extraction unit; a first feature extraction module of the N cascaded feature extraction modules, configured to process the first image block and the second image block, or the first image sub-block and the second image sub-block; each feature extraction module of the N cascaded feature extraction modules except the first feature extraction module is used for processing the output of the previous feature extraction module; for each feature extraction module, the input of the down-sampling unit is connected with the output of the feature extraction unit, and the output of the down-sampling unit is connected with the input of the feature extraction unit in the next feature extraction module; optionally, the first auxiliary information or the second auxiliary information is used as reference information and/or supervision information of at least one feature extraction module of the N cascaded feature extraction modules.
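A hedged PyTorch-style sketch of such a cascade of N feature extraction modules, assuming the feature extraction unit is a convolutional layer and the down-sampling unit is a pooling layer as stated above; treating the auxiliary information as an extra input channel is an assumption, and the channel counts are illustrative.

```python
import torch
import torch.nn as nn

class PyramidFeatureExtractor(nn.Module):
    """N cascaded feature extraction modules: the first N-1 modules are a feature
    extraction unit (convolution) followed by a down-sampling unit (pooling);
    the N-th module has only a feature extraction unit."""
    def __init__(self, n_levels: int = 3, in_ch: int = 2, ch: int = 32):
        super().__init__()
        self.extract = nn.ModuleList()
        self.downsample = nn.ModuleList()
        for level in range(n_levels):
            self.extract.append(nn.Conv2d(in_ch if level == 0 else ch, ch, 3, padding=1))
            if level < n_levels - 1:            # the N-th module has no down-sampling unit
                self.downsample.append(nn.MaxPool2d(2))

    def forward(self, x):
        feature_maps = []                        # e.g. Fd1 ... Fdn, one scale per level
        for level, conv in enumerate(self.extract):
            x = torch.relu(conv(x))
            feature_maps.append(x)
            if level < len(self.downsample):
                x = self.downsample[level](x)    # feeds the next feature extraction module
        return feature_maps
```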
Based on the above description of the feature extraction network, please refer to fig. 8a, which is a schematic structural diagram of a feature extraction network according to an embodiment of the present application. The following description will be made with reference to fig. 8a for extracting the features of the first image block by using the pyramid hierarchical processing for the feature extraction network, where the feature extraction network receives the first auxiliary information to process the first image block and the second image block as an example.
First, a first image block (e.g., reconstructed texture block), a second image block (e.g., matching texture block), and first auxiliary information are input into the feature extraction module 1, and a feature map Fd1 and a feature map Fi1 are output through a convolution operation in the feature extraction unit 1. Then, the down-sampling unit 1 obtains the down-sampled feature map Fdd1 and the down-sampled feature map Fid1 for the output feature map Fd1 and the feature map Fi 1. The processing of the feature extraction unit 1 and the down-sampling unit 1 included in the feature extraction module 1 is processing of a first level.
Next, the down-sampled feature map Fdd1 and the down-sampled feature map Fid1 are input to the feature extraction module 2, and the feature map Fd2 and the feature map Fi2 are output by the convolution operation in the feature extraction unit 2. Then, the down-sampling unit 2 performs down-sampling operation on the output feature maps Fd2 and Fi2 to obtain down-sampled feature maps Fdd2 and Fid 2. The processing of the feature extraction unit 2 and the down-sampling unit 2 included in the feature extraction module 2 is processing of a second level.
Similarly, the operation of the feature extraction module for each subsequent level is similar up to the processing of the (n-1) th level. For the processing of the nth level, the down-sampled feature map Fdd (N-1) and the down-sampled feature map Fid (N-1) are input to the feature extraction module N, and the feature map Fdn and the feature map Fin are output by a convolution operation.
The feature maps Fd 1-Fdn are collectively referred to as a first feature map, and the feature maps Fi 1-Fin are collectively referred to as a second feature map. In the above embodiment, after each feature extraction unit processes the feature map, the down-sampling unit performs down-sampling, so that the size of the feature map can be reduced, and the size of the feature map can be gradually reduced by the serially connected feature extraction modules, so that the semantic meaning expressed by the feature map is more abstract, that is, the pyramid feature extraction is performed in the process. In the pyramid feature extraction process, the feature maps generated by the feature extraction modules of each level are feature maps of different scales, and the feature maps of different scales are subsequently used for distortion and fusion, so that the expression of the feature maps can be enriched, and the finally fused feature maps can more comprehensively and accurately describe the information of the first image block of the first viewpoint, thereby better restoring the original image corresponding to the first image block.
Part or all of the feature extraction modules in the feature extraction network can receive first auxiliary information/second auxiliary information, the first auxiliary information/the second auxiliary information can be used as reference information or supervision information, the reference information can enable mapping relations to be established between extracted features and the auxiliary information, and the supervision information can be used for carrying out supervision training on the feature extraction unit, so that the trained feature extraction network can accurately extract image blocks of information of different depths, and accurate feature maps can be obtained. In the above example, the feature extraction module (specifically, the feature extraction unit 1) which may be only the first hierarchy receives the first auxiliary information/the second auxiliary information (e.g., the depth information or the disparity information), so that the calculation is relatively simple and is suitable for a scene with simpler depth information. In other examples, all modules or some modules of the feature extraction module at each level may also receive the first auxiliary information/the second auxiliary information (e.g., depth information or disparity information), so as to be suitable for scenes with a high-precision depth variation range, which is beneficial to obtaining a high-quality reconstructed texture block after loop filtering processing. Furthermore, it is also possible to enable or disable the reception of the first/second auxiliary information (e.g. depth information or disparity information) at each level for different scenes at different times, so that the complexity of the computation can be adaptively controlled to meet the requirements of different applications. Referring to fig. 8b, another schematic diagram of a feature extraction network is shown, as shown in fig. 8b, when the feature extraction network extracts feature maps corresponding to a first image block and a second image block, each feature extraction unit receives first auxiliary information to better extract the feature map of the image block of high-precision depth information.
It should be noted that the feature extraction network shown in fig. 8a or fig. 8b may further receive second auxiliary information to process the first image sub-block and the second image sub-block, and the specific processing flow is the same as the content of the foregoing example of receiving the first auxiliary information to process the first image block and the second image block, which is not described in detail herein. The feature maps output by the feature extraction units can be sub-feature maps, for example, the feature maps Fd 1-Fdn are collectively called as first sub-feature maps, and the feature maps Fi 1-Fin are collectively called as second sub-feature maps. In an embodiment, the first sub-feature maps corresponding to all the first image sub-blocks of the first image block may be combined/spliced into a first feature map, the second sub-feature maps corresponding to all the second image sub-blocks of the second image block may be spliced into a second feature map, and then the first preset processing and the second preset processing are performed based on the first feature map and the second feature map; in another embodiment, the first preset processing and the second preset processing may also be directly performed on the first sub-feature map and the second sub-feature map.
In an embodiment, taking the first preset processing as warping processing and the second preset processing as feature fusion processing as an example, the processing of the sub-feature map is described as follows:
The first sub-feature maps corresponding to all first image sub-blocks of the first image block are combined/spliced into a first feature map, and the second sub-feature maps corresponding to all second image sub-blocks of the second image block are spliced into a second feature map; the second feature map is then warped, and finally the warped second feature map and the first feature map are fused to obtain the fused feature map. Alternatively, the second sub-feature maps may be warped directly without first determining the first feature map and the second feature map: after the warped second sub-feature maps are obtained, all of them may be combined/spliced to determine or generate the warped second feature map, the first sub-feature maps may be combined/spliced to determine or generate the first feature map, and the warped second feature map and the first feature map are then fused to obtain the fused feature map. As a further alternative, after each second sub-feature map is warped, the warped second sub-feature map may be fused directly with the corresponding first sub-feature map to obtain a fused sub-feature map, and finally the fused sub-feature maps are combined/spliced to obtain the fused feature map.
S704, determining or generating a processing result corresponding to the first image block according to the first feature map and the second feature map. And the processing result is used for generating a reconstructed image or a decoded image corresponding to the first image block.
In one possible embodiment, the specific implementation steps of S704 include (1) - (3):
(1) and carrying out first preset treatment on the second characteristic diagram according to the first characteristic diagram to obtain a target second characteristic diagram.
In one embodiment, a first preset processing parameter is determined based on the first feature map and the second feature map; or, determining a first preset processing parameter based on the first feature map, the second feature map and the first auxiliary information; performing first preset processing on the second characteristic diagram based on a first preset processing model to obtain a target second characteristic diagram; the first preset processing model comprises a first processing model determined according to the first preset processing parameter.
In another embodiment, the first preset processing model includes the first processing model and a second processing model, and the performing the first preset processing on the second feature map based on the first preset processing model to obtain the target second feature map includes:
determining coordinates of sampling points in the second characteristic diagram according to the first processing model and the second processing model;
determining a target pixel value corresponding to the coordinate of the sampling point according to the second characteristic diagram and the sampling kernel function;
and generating a target second characteristic diagram according to the target pixel value corresponding to the sampling point coordinate.
Optionally, the first preset processing is warping processing, the target second feature map is a warped second feature map, the first preset processing parameter is a warping parameter, and the first preset processing model is a warping model.
By performing warping processing on the second feature map corresponding to the second image block of the second viewpoint, the feature maps of different viewpoints can be mapped to each other, and here, the second feature map of the second viewpoint is mapped to the first viewpoint, so that the attributes such as the shape and size of the object in the warped second feature map and the first feature map of the first viewpoint are similar. By fusing the distorted second feature map and the distorted first feature map, the quality of the reconstructed image corresponding to the first image block can be improved in the filtering stage, and the distortion of the reconstructed image is reduced, so that the original image corresponding to the first image block can be better restored.
In one embodiment, the first preset process in step (1) is a warping process, which may include the following steps: determining a warping parameter based on the first feature map and the second feature map; or determining a warping parameter based on the first feature map, the second feature map and the first auxiliary information; or determining a warping parameter based on the first feature map, the second feature map and the second auxiliary information; carrying out warping processing on the second feature map based on a warping model to obtain a warped second feature map; the warping model includes a first process model determined from the warping parameters.
Optionally, the feature maps output by the feature extraction units in the feature extraction modules of each hierarchy in the feature extraction network are a first feature map and a second feature map, such as the feature maps Fd 1-Fdn (corresponding to the first feature map) and Fi 1-Fin (corresponding to the second feature map) shown in the aforementioned fig. 8 a. Optionally, for the second feature maps obtained by pyramid hierarchical extraction, the second feature map of each layer may be subjected to a first preset process (e.g., a warping process) to obtain a target second feature map (e.g., a warped second feature map).
The warping principle of the second feature map Fix is explained below by taking the first feature map Fdx and the second feature map Fix output by the feature extraction unit x in the feature extraction module of the x-th (x ∈ [1 ∈ n ]) layer as an example:
the warping parameter is output by the warping parameter determination module receiving the first feature map Fdx from the first viewpoint and the second feature map Fix from the second viewpoint. Optionally, the first feature map Fdx of the first viewpoint has a width Wdx, a height Hdx, and a channel number Cdx; the second feature map Fix at the second viewpoint has a width of Wix, a height of Hix, and a number of channels of Cix; the warping parameter determination module is constructed based on a neural network. For example, the warping parameter determining module may be implemented by using a fully connected layer or a convolutional layer, and it should be noted that the warping parameter determining module further includes a regression layer for generating the warping parameter.
Optionally, a neural network-based warping parameter determination module may be constructed through a neural network learning algorithm, and the warping processing module is capable of establishing a mapping relation of the input variables (including the first feature map and the second feature map) to the warping parameters. For example, in training the neural network-based warping parameter determination module, first, a training sample is created, the training sample comprising an input and an output, optionally the input comprising a first feature map, a second feature map, and the output comprising warping parameters, where the warping parameters comprise warping parameters labeled for the second feature map of different warping types (e.g., cropping, translation, rotation, scaling, and tilting). Then, forward propagation calculation is carried out by utilizing the training samples, and input and output of each layer of neurons are obtained. Then, the error between the estimated warping parameter output by the neural network and the marked warping parameter is calculated, and the sum of squares of the errors of the networks is minimized by adjusting the weights and offset values of the networks of the respective layers. And finally, when the error reaches the preset precision, the obtained weight and deviation value of each layer of network are used as final values to finish the training of the neural network. In the prediction stage, when the neural network-based warping parameter determination module receives the first feature map and the second feature map, the trained neural network included in the warping parameter determination module can accurately determine the warping parameter. It should be noted that, the warping parameter determining module may further receive a first sub-feature map of the first viewpoint and a second sub-feature map of the second viewpoint, and determine the warping parameter.
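A hedged sketch of such a neural-network-based warping parameter determination module, assuming the warping parameters are the six parameters of a 2×3 affine matrix and using a convolutional feature stage followed by a regression (fully connected) layer; layer sizes and the identity initialisation are assumptions.

```python
import torch
import torch.nn as nn

class WarpingParameterNet(nn.Module):
    """Regress warping parameters from the first and second feature maps
    (optionally plus auxiliary information concatenated as extra channels)."""
    def __init__(self, in_ch: int):
        super().__init__()
        # in_ch must equal the total channel count of the concatenated inputs
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(4),
        )
        self.regression = nn.Linear(32 * 4 * 4, 6)        # regression layer -> theta (2x3)
        self.regression.weight.data.zero_()
        self.regression.bias.data.copy_(
            torch.tensor([1., 0., 0., 0., 1., 0.]))       # start from the identity warp

    def forward(self, fd, fi, aux=None):
        inputs = [fd, fi] if aux is None else [fd, fi, aux]
        h = self.features(torch.cat(inputs, dim=1))
        return self.regression(h.flatten(1)).view(-1, 2, 3)  # warping parameters theta_ij
```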
Alternatively, through the similarity between the features in the first feature map and the second feature map, the correspondence between the target pixel coordinates in the grid (i.e., the pixel coordinates of the warped feature map) and the corresponding pixel coordinates in the second feature map may be determined. Optionally, the correspondence is used to determine a warping parameter.
In one embodiment, the distortion parameters include at least one of: parameters relating to affine transformation, parameters relating to projective transformation. It should be noted that, by adopting the pyramid hierarchical extraction process, the warping parameters may be different for the first feature map and the second feature map output by different layers.
After the warping parameter is determined, a warping model can be obtained according to the determined warping parameter, and the warping model can reflect the mapping relationship between the corresponding pixel coordinates of the second feature map Fix and the warped second feature map Fiwx.
In a possible implementation manner, the warping model includes a first processing model and a second processing model, the first processing model is a warping model determined according to the warping parameter, the second processing model includes target pixel coordinates, and the warping processing on the second feature map using the warping model may include: determining coordinates of sampling points in the second characteristic diagram according to the first processing model and the second processing model; determining a target pixel value corresponding to the coordinate of the sampling point according to the second characteristic diagram and the sampling kernel function; and generating a distorted second characteristic diagram according to the target pixel value corresponding to the sampling point coordinate.
The first processing model determined by the warping parameters comprises any one of an affine transformation matrix, a projective transformation matrix, and a combination of the affine transformation matrix and the projective transformation matrix; the second processing model is a pixel grid model G. Assume G = {Gi}, where Gi are the target pixel coordinates (x_i^t, y_i^t) of the grid in the output feature map; optionally, these are the pixels of the warped output. In one embodiment, the pixel grid model G is a predefined grid model. In other embodiments, the pixel grid model G may be determined from the features and/or auxiliary information in the first feature map, which makes the construction of the pixel grid model more flexible.
The sampling point coordinates in the second feature map can be obtained from the first processing model and the second processing model. Assuming that the first processing model is an affine transformation matrix and the sampling point coordinates defined in the second feature map Fix are (x_i^s, y_i^s), the pixel-wise affine transformation formula is:

\[
\begin{pmatrix} x_i^s \\ y_i^s \end{pmatrix}
= \begin{bmatrix} \theta_{11} & \theta_{12} & \theta_{13} \\ \theta_{21} & \theta_{22} & \theta_{23} \end{bmatrix}
\begin{pmatrix} x_i^t \\ y_i^t \\ 1 \end{pmatrix}
\]

Optionally, \(\theta_{ij}\) are the warping parameters that form the affine transformation matrix, and the second processing model appends a constant 1 to the target pixel coordinates contained in the pixel grid to form homogeneous coordinates. Some common warping transformations can be represented with homogeneous coordinates.
In a preferred embodiment, a normalized coordinate system may also be used. For example, the target pixel coordinates (x_i^t, y_i^t) of the grid in the output feature map (i.e., the warped second feature map) can be limited to the range −1 to 1 by the normalized coordinate system, and the values of the sampling point coordinates (x_i^s, y_i^s) defined in the second feature map are likewise limited to the range −1 to 1. In this way, subsequent sampling and transformation can operate in a normalized coordinate system.
The input second feature map may be cropped, translated, rotated, scaled, and tilted by the warping model to form an output feature map (i.e., a warped second feature map).
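Under the normalized [−1, 1] coordinate convention, an affine warping of the second feature map can be sketched with standard grid generation and bilinear sampling; PyTorch is used here purely to illustrate the mechanism, not as the implementation used by the codec.

```python
import torch
import torch.nn.functional as F

def warp_second_feature_map(fi: torch.Tensor, theta: torch.Tensor) -> torch.Tensor:
    """fi: second feature map of shape (B, C, H, W); theta: (B, 2, 3) affine warping parameters.
    Returns the warped second feature map."""
    # target-pixel grid in normalized [-1, 1] coordinates mapped to sampling coordinates
    grid = F.affine_grid(theta, fi.shape, align_corners=False)
    # bilinear interpolation as the sampling kernel
    return F.grid_sample(fi, grid, mode="bilinear", align_corners=False)
```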
In another possible embodiment, the warping model comprises a plurality of warping submodels, each assigned a corresponding weight. By distributing different weights to each distortion submodel, a better distortion model combination can be obtained, and the distortion model combination can emphasize different aspects to perform distortion processing on the second characteristic diagram, so that the distortion effect is better.
The above principle of warping the second feature map based on the warping model to obtain the sampling point coordinates can be regarded as part of the sampling process performed on the second feature map, after which the warped second feature map Fiwx can be generated from the sampling result (i.e., the sampling point coordinates). Optionally, after the sampling point coordinates of the second feature map are determined, a sampling kernel is applied at the sampling point coordinates (x_i^s, y_i^s) defined in the second feature map to obtain the pixel value of the corresponding pixel in the output feature map. Specifically:

\[
V_i^c = \sum_{n}^{H} \sum_{m}^{W} U_{nm}^{c}\, k(x_i^s - m;\, \Phi_x)\, k(y_i^s - n;\, \Phi_y)
\]

Optionally, \(\Phi_x\) and \(\Phi_y\) are parameters of the sampling kernel k(·), which is an image interpolation function; in one embodiment it may be a bilinear interpolation function. \(U_{nm}^{c}\) is the value at position (n, m) in channel c of the input second feature map, and \(V_i^c\) is the pixel value of pixel i corresponding to the sampling point coordinates (x_i^s, y_i^s) in channel c. H′ and W′ are the width and height of the grid, H and W are the width and height of the second feature map, and C is the number of channels. The final warped second feature map consists of the target pixel values corresponding to the sampling point coordinates.
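A minimal sketch of this sampling step for a single sampling point, assuming the bilinear kernel k(d) = max(0, 1 − |d|); variable names follow the expression above, and coordinates are taken in pixel units rather than normalized units.

```python
import numpy as np

def sample_bilinear(U: np.ndarray, xs: float, ys: float) -> np.ndarray:
    """Evaluate V_i^c = sum_n sum_m U[c, n, m] * k(xs - m) * k(ys - n) for one sampling
    point (xs, ys) over all channels. U has shape (C, H, W)."""
    C, H, W = U.shape
    m0, n0 = int(np.floor(xs)), int(np.floor(ys))
    value = np.zeros(C, dtype=U.dtype)
    for n in (n0, n0 + 1):                     # only the 4 neighbouring pixels contribute
        for m in (m0, m0 + 1):
            if 0 <= n < H and 0 <= m < W:
                weight = max(0.0, 1 - abs(xs - m)) * max(0.0, 1 - abs(ys - n))
                value += weight * U[:, n, m]
    return value
```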
The logic of the warping processing in step (1) above can be implemented by the following warping processing module. Please refer to fig. 9a and 9b, which are schematic structural diagrams of the first preset processing module provided in the embodiments of the present application; the first preset processing module is specifically a warping processing module and includes a warping parameter determining module, a warping model determining module, and a sampling module. The functions of the individual modules are as follows. The warping parameter determining module is described above and is not described again here. The warping model determining module receives the warping parameters output by the warping parameter determining module and outputs a warping model; for example, if the warping parameters are parameters related to an affine transformation, the warping model correspondingly includes an affine transformation matrix. The sampling module is configured to perform warping and sampling processing on the second feature map in combination with the warping model, including obtaining the sampling point coordinates and calculating the target pixel values at the sampling point coordinates of the second feature map using a sampling kernel function, and then outputs the warped second feature map Fiwx.
The warp processing modules as shown in fig. 9a and 9b differ in that: the inputs received are different. Fig. 9a shows that the warp processing module receives the first feature map and the second feature map to determine the warp model. And the warping processing module shown in fig. 9b may receive side information (including first/second side information, such as depth information or disparity information) in addition to the first feature map and the second feature map to determine a warping model. In one embodiment, the warping parameters are output by the warping parameter determination module receiving the first feature map Fdx from the first viewpoint, the second feature map Fix from the second viewpoint, and the side information. Optionally, the first feature map Fdx of the first viewpoint has a width Wdx, a height Hdx, and a channel number Cdx; the second feature map Fix at the second viewpoint has a width of Wix, a height of Hix, and a number of channels of Cix; the warping parameter determination module is constructed based on a neural network. For example, the warping parameter determining module may be implemented by using a fully connected layer or a convolutional layer, and it should be noted that the warping parameter determining module further includes a regression layer for generating the warping parameter.
Optionally, a neural network-based warping parameter determination module may be constructed through a neural network learning algorithm, and the warping processing module is capable of establishing a mapping relation of the input variables (including the first feature map, the second feature map and the first auxiliary information) to the warping parameters. For example, in training the neural network-based warping parameter determination module, first, a training sample is created, the training sample comprising an input and an output, optionally the input comprising a first feature map, a second feature map, and first auxiliary information, and the output comprising warping parameters, where the warping parameters comprise warping parameters labeled for the second feature map of different warping types (e.g., cropping, translation, rotation, scaling, and tilting). Then, forward propagation calculation is carried out by utilizing the training samples, and input and output of each layer of neurons are obtained. Then, the error between the estimated warping parameter output by the neural network and the marked warping parameter is calculated, and the sum of squares of the errors of the networks is minimized by adjusting the weights and offset values of the networks of the respective layers. And finally, when the error reaches the preset precision, the obtained weight and deviation value of each layer of network are used as final values to finish the training of the neural network. In the prediction stage, when the neural network-based warping parameter determination module receives the first feature map and the second feature map, the trained neural network included in the warping parameter determination module can accurately determine the warping parameter.
Alternatively, through the auxiliary information (e.g., the first auxiliary information and the second auxiliary information), the correspondence between the target pixel coordinates in the mesh (i.e., the pixel coordinates of the warped feature map) and the corresponding pixel coordinates in the second feature map may be determined. Optionally, the correspondence is used to determine a warping parameter. As can be seen from the foregoing, in the feature extraction stage, a mapping relationship between the features of the first image block and the second image block and the depth information and/or the disparity information may be established. Further, through the mapping relationship between the features of the first image block and the second image block and the depth information and/or the disparity information, the relationship between the target pixel coordinates in the grid (i.e., the pixel coordinates of the distorted feature map) and the corresponding pixel coordinates in the second feature map can be determined.
Since the difference (e.g., the difference in size and shape) between the images of the first viewpoint and the second viewpoint is mainly caused by disparity or depth, the warping process with reference to the depth information or disparity information may make the warped second feature map more match the corresponding first feature map of the first viewpoint, thereby making the quality of the subsequent fused feature map higher.
In an embodiment, the warping processing module processes the first feature map and the second feature map output by each feature extraction unit in the feature extraction network in the same way, that is, the second feature map is warped by using the first feature map to obtain a warped second feature map. Please refer to fig. 10, a structure diagram of a feature extraction network combined with a first preset processing module, which is proposed based on the feature extraction network shown in fig. 8a; here the first preset processing module is specifically a warping processing module. As shown in fig. 10, the combined structure includes N warping processing modules and N feature extraction modules. The feature extraction network processes the first image block and the second image block to obtain the corresponding first feature maps and second feature maps, and the feature maps Fdx (x ∈ [1, N]) and Fix output by each feature extraction unit are processed by the corresponding warping processing module to obtain a warped second feature map Fiwx; that is, the final output comprises N warped second feature maps. It can be understood that the feature extraction module and the warping processing module with the same index belong to the same processing level; for example, feature extraction module 1 and warping processing module 1 belong to level 1, feature extraction module 2 and warping processing module 2 belong to level 2, and so on, up to N levels, forming a pyramid hierarchy in which each level has a feature extraction module and a warping processing module.
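The per-level pairing of feature extraction and warping can be sketched as follows; the shared-weight extraction of both viewpoints, the average-pooling downsampling unit and all interfaces are assumptions used only to make the data flow concrete.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureExtractModule(nn.Module):
    """One pyramid level: a feature extraction unit plus (except at level N) a downsampling unit."""
    def __init__(self, in_ch, out_ch, downsample=True):
        super().__init__()
        self.extract = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True))
        self.down = nn.AvgPool2d(2) if downsample else nn.Identity()

    def forward(self, x):
        f = self.extract(x)            # feature map handed to the warping/fusion stage
        return f, self.down(f)         # downsampled copy feeds the next level

def extract_and_warp(d_block, i_block, side, levels, warp_modules):
    """Run N levels over the first/second image blocks and warp each second feature map.

    `levels` is a list of FeatureExtractModule (shared between the two viewpoints, an
    assumption); `warp_modules` is a list of callables mapping (Fdx, Fix, side) -> Fiwx.
    """
    fd_maps, fiw_maps = [], []
    xd, xi = d_block, i_block
    for level, warp in zip(levels, warp_modules):
        fdx, xd = level(xd)                        # first-viewpoint feature map Fdx
        fix, xi = level(xi)                        # second-viewpoint feature map Fix
        side_l = F.interpolate(side, size=fdx.shape[-2:], mode='nearest')
        fd_maps.append(fdx)
        fiw_maps.append(warp(fdx, fix, side_l))    # warped second feature map Fiwx
    return fd_maps, fiw_maps
```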
(2) And performing second preset processing according to the first feature map and the target second feature map to obtain a target feature map.
In an embodiment, a second preset process may be performed on the first feature map and the target second feature map by using a feature fusion network to obtain a target feature map.
Optionally, the first preset processing is warping processing, the second preset processing is feature fusion processing, the target second feature map is a warped second feature map, the first preset processing parameter is a warping parameter, the first preset processing model is a warping model, and the target feature map is a fusion feature map.
The first feature map may be a feature map obtained by pyramid hierarchical extraction, corresponding to the output of the feature extraction unit in any feature extraction module of the feature extraction network, and the warped second feature map is the result obtained after the corresponding warping processing module warps the second feature map output by the feature extraction unit of each level.
In an embodiment, the implementation manner of obtaining the fused feature map may be: performing feature fusion processing on the first feature map and the warped second feature map by using a feature fusion network to obtain a fused feature map. Optionally, the feature fusion network includes N feature fusion modules and M upsampling modules, where M is an integer greater than or equal to 1 and M + 1 = N; the input of the ith feature fusion module in the feature fusion network is connected with the output of the ith warping processing module among the N warping processing modules, the output of the ith feature fusion module is connected with the input of the jth upsampling module, j is an integer greater than or equal to 1 and less than or equal to M, and i = j + 1; the output of the jth upsampling module is connected with the input of the jth feature fusion module; the ith warping processing module is used for warping the second feature map output by the ith feature extraction module in the feature extraction network, where i is an integer greater than or equal to 1 and less than or equal to N; the Nth feature fusion module is used for fusing the warped second feature map output by the Nth warping processing module and the first feature map output by the Nth feature extraction unit; and when i is not equal to N, the ith feature fusion module is used for fusing the warped second feature map output by the ith warping processing module, the first feature map output by the ith feature extraction unit, and the feature map output by the ith upsampling module.
The feature fusion network is used for fusing the warped second feature maps output by the warping processing modules and the first feature maps output by the feature extraction network. Corresponding to the first and second feature maps obtained by pyramid hierarchical extraction, the feature fusion network can also be divided into N levels, each level comprising a feature fusion module and/or an upsampling module; in total there are N feature fusion modules and M (i.e., N-1) upsampling modules, and correspondingly N warping processing modules that warp the second feature maps. On this basis, please refer to fig. 11, which shows a schematic structural diagram including the feature fusion network: the result output by each feature fusion module is input into the upsampling module of the level above it, that is, the output of the Nth feature fusion module is the input of the (N-1)th upsampling module, so that the feature map output by the level-1 feature fusion module of the pyramid hierarchy is the fused feature map used for the final filtering.
The following describes a specific processing flow of the fusion processing with reference to the schematic configuration diagram shown in fig. 11:
at the Nth level: the warping processing module N receives and processes the first feature map Fdn and the second feature map Fin output by the feature extraction module N, outputs a warped second feature map Fiwn, then, the feature fusion module N performs feature fusion on the first feature map Fdn and the warped Fiwn to obtain a fused feature map Fdfn, and the fused feature map Fdfn is input to the upsampling module M at the N-1 th level, that is, the upsampling module (N-1) to perform upsampling processing.
Level N-1: the up-sampling module (N-1) performs up-sampling processing on the fused feature map Fdfn to obtain an up-sampled feature map Fun, the up-sampled feature map Fun is output to the feature fusion module N-1 of the N-1 th level, meanwhile, the distortion processing module (N-1) receives and processes the first feature map Fd (N-1) and the second feature map Fi (N-1) output by the feature extraction module (N-1), and outputs a distorted second feature map Fiw (N-1). Next, the feature fusion module (N-1) performs feature fusion on the first feature map Fd (N-1), the warped feature map Fiw (N-1), and the upsampled feature map Fun to obtain a fused feature map Fdf (N-1). The fused feature map Fdf (N-1) is input to the upsampling module (N-2) of the N-2 th level for upsampling.
Level N-2: the up-sampling module (N-2), that is, the up-sampling module (M-1), up-samples the fused feature map Fdf (N-1) to obtain an up-sampled feature map Fu (N-1), and outputs the up-sampled feature map Fu (N-1) to the feature fusion module (N-2) of the N-2 th level, and the feature fusion module (N-2) performs feature fusion on the warped second feature map Fiw (N-2) output by the warping processing module (N-2), the first feature map Fd (N-2) output by the feature extraction module (N-2), and the up-sampled feature map Fu (N-1) to obtain a fused feature map Fdf (N-2).
By analogy, the subsequent levels all adopt the same processing mode until the level 1.
Level 1: the warping processing module 1 receives and processes the first feature map Fd1 and the second feature map Fi1 output by the feature extraction module 1 and outputs a warped second feature map Fiw1; then the feature fusion module 1 performs feature fusion on the first feature map Fd1, the warped second feature map Fiw1, and the upsampled feature map Fu2 from level 2 to obtain a fused feature map Fdf. The fused feature map Fdf may ultimately be used to determine the processing result of the first image block. In this process, upsampling the fused feature maps enlarges their size, so that the final fused feature map has the same size as the input feature map, and filtering this fused feature map yields a reconstructed image of higher quality.
It is to be understood that, in the structure diagram shown in fig. 11, viewed horizontally, there are N levels; optionally, each of levels 1 to N-1 includes a feature extraction module, a warping processing module, a feature fusion module, and an upsampling module, while level N includes a feature extraction module, a warping processing module, and a feature fusion module, and each level outputs its processed data, i.e., a fused feature map, through the feature fusion module. Viewed vertically, the processing of the feature extraction modules and warping processing modules of the N levels runs from top to bottom, i.e., from level 1 to level N, while the processing of the feature fusion modules and upsampling modules runs from bottom to top, i.e., from level N to level 1, so that the resulting pyramid processing model can extract the feature maps accurately. It should be noted that the number N of levels and the number of each kind of module may be set as needed or according to empirical values, and are not limited herein.
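The bottom-up fusion flow of fig. 11 can be summarized by the following sketch; the concatenation-plus-1x1-convolution fusion, the bilinear upsampling and the channel sizes are assumptions, not the disclosed network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FuseModule(nn.Module):
    """One feature fusion module: concatenate its inputs and mix them with a 1x1 convolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, *maps):
        return self.conv(torch.cat(maps, dim=1))

def pyramid_fuse(fd_maps, fiw_maps, fuse_modules):
    """Bottom-up fusion from level N to level 1.

    fd_maps[k] / fiw_maps[k] are the first feature map and warped second feature map of
    level k+1; the fused result of each level is upsampled and fed into the level above.
    `fuse_modules` must be built with matching input channel counts (assumed).
    """
    n = len(fd_maps)
    fdf = fuse_modules[n - 1](fd_maps[n - 1], fiw_maps[n - 1])        # level N: two inputs
    for k in range(n - 2, -1, -1):                                    # levels N-1 ... 1
        fu = F.interpolate(fdf, size=fd_maps[k].shape[-2:],
                           mode='bilinear', align_corners=False)      # upsampling module
        fdf = fuse_modules[k](fd_maps[k], fiw_maps[k], fu)            # three inputs
    return fdf                                                        # level-1 fused feature map Fdf
```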
In an embodiment, the feature fusion module performs feature fusion on the first feature map and the warped second feature map to obtain the fused feature map Fdf in one of two optional ways. One way is to add the first feature map and the warped second feature map on corresponding channels, with the number of channels unchanged (i.e., an add operation). The other way is to input the warped second feature map Fiw and the first feature map Fd into a connection layer (concatenate) and output the fused feature through a concatenation operation; for example, each output channel of the connection layer is:
F_out = \sum_{i=1}^{C} X_i * K_i + \sum_{i=1}^{C} Y_i * K_{i+C}

optionally, * denotes convolution, C denotes the number of channels, X_i denotes the first feature map of the i-th channel, Y_i denotes the second feature map of the i-th channel, K_i denotes the convolution kernel corresponding to the first feature map, and K_{i+C} denotes the convolution kernel corresponding to the second feature map.
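The two fusion options (element-wise addition on corresponding channels versus concatenation followed by convolution, as in the formula above) can be illustrated as follows; the kernel size and channel counts are assumptions.

```python
import torch
import torch.nn as nn

def fuse_add(fd, fiw):
    """'add' option: element-wise sum on corresponding channels, channel count unchanged."""
    return fd + fiw

class FuseConcat(nn.Module):
    """'concatenate' option: stack the 2C channels and convolve, so each output channel is
    sum_i (X_i * K_i) + sum_i (Y_i * K_{i+C}) as in the formula above (layout assumed)."""
    def __init__(self, channels, out_channels, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv2d(2 * channels, out_channels, kernel_size,
                              padding=kernel_size // 2)

    def forward(self, fd, fiw):
        return self.conv(torch.cat([fd, fiw], dim=1))
```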
Based on the above, the rough matching, fine matching, feature extraction, warping processing, and feature fusion involved in steps S702 and S703 can be summarized as shown in fig. 12a, which corresponds to a detailed structural schematic diagram of the fusion module shown in fig. 6; for example, processing the first image block according to the second image block and the auxiliary information may include fine matching, feature extraction, and warping processing. In another embodiment, the fine matching may be omitted, corresponding to the structure diagram shown in fig. 12b. The processing logic of the fusion module is referred to as third preset processing, and the fusion module is correspondingly referred to as a third preset processing module.
(3) And determining or generating a processing result corresponding to the first image block according to the target feature map.
In an embodiment, the target feature map may be subjected to filtering processing to obtain a filtered target feature map; and determining the processing result corresponding to the first image block according to the filtered target feature map. Further, the target feature map may be filtered by using a target filtering processing model, so as to obtain a filtered target feature map. Optionally, the target filtering processing model includes at least one processing unit, and the processing unit includes one or both of a first processing unit and a second processing unit. The step of utilizing the target filtering processing model to carry out filtering processing on the target characteristic diagram to obtain a filtered target characteristic diagram comprises the following steps: performing down-sampling processing on the target feature map processed by at least one first processing unit to obtain a down-sampled target feature map; performing up-sampling processing on the down-sampled target feature map to obtain a target fusion feature map; and processing the target fusion characteristic diagram by using the second processing unit to obtain a filtered target characteristic diagram.
Optionally, the target feature map is a fused feature map, and the processing result is used to generate a reconstructed image or a decoded image corresponding to the first image block. When the scheme provided by this embodiment is applied at the multi-view encoder side, the processing result is used to generate a reconstructed image corresponding to the first image block; when it is applied at the multi-view decoder side, the processing result is used to generate a decoded image corresponding to the first image block. In one embodiment, an optional implementation of step (3) includes: filtering the fused feature map to obtain a filtered fused feature map; and determining the processing result corresponding to the first image block according to the filtered fused feature map. The processing result includes the filtered first image block of the first viewpoint; for example, when the first image block is the current reconstructed texture block of the current frame of the dependent viewpoint, the processing result obtained here may be the filtered reconstructed texture block of the current frame of the dependent viewpoint. The processing result may subsequently undergo other filtering processing (e.g., ALF), and a reconstructed image or decoded image is further synthesized from the filtered result.
In a possible embodiment, the filtering process for the fused feature map may be: filtering the fusion characteristic diagram by using a target filtering processing model to obtain a filtered fusion characteristic diagram; optionally, the target filtering processing model includes a target candidate model selected from a plurality of candidate models according to a rate-distortion cost, and each of the plurality of candidate models has a mapping relation with a quantization parameter.
The target candidate model included in the target filtering processing model may be a neural network model provided in the neural-network-based filtering processing module. Alternatively, as shown in fig. 13, the structure of the neural-network-based filtering processing module may include at least one convolutional layer and at least one residual unit. The fused feature map Fdf is fed into convolutional layer 1, passes through D residual units, and the filtered first image block of the first viewpoint is output after convolutional layer 2. Optionally, D is an integer greater than or equal to 1, and the neural-network-based filtering processing module corresponds to a processing module in a neural-network-based loop filter (e.g., DRNLF).
In one embodiment, each neural network-based filter processing module has a plurality of candidate models, each candidate model corresponding to a different quantization parameter, the quantization parameter being derived from a quantization parameter map (QP map), which is a matrix populated with a plurality of quantization parameters. In the training stage, a plurality of candidate models can be trained according to different quantization parameters, the optimal candidate model is corresponding to the quantization parameters, in the encoding stage, a target candidate model with the lowest rate distortion cost can be selected from the candidate models, and the target candidate model is used for filtering the fusion feature map.
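The selection of the target candidate model might be sketched as follows; the rate-distortion cost function is assumed to be supplied by the encoder.

```python
def select_target_model(candidates, fused_map, rd_cost):
    """Pick the candidate filtering model with the lowest rate-distortion cost.

    `candidates` maps a quantization parameter to its trained model; `rd_cost` is an
    assumed callable returning the encoder's RD cost for a filtered feature map.
    """
    best_qp, best_model, best_cost = None, None, float('inf')
    for qp, model in candidates.items():
        cost = rd_cost(model(fused_map), qp)
        if cost < best_cost:
            best_qp, best_model, best_cost = qp, model, cost
    return best_qp, best_model
```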
In one embodiment, the target filter processing model comprises at least one processing unit including one or both of a first processing unit and a second processing unit; the filtering the fusion characteristic diagram by using the target filtering processing model to obtain a filtered fusion characteristic diagram includes: performing down-sampling processing on the fusion feature map processed by at least one first processing unit to obtain a down-sampled fusion feature map; performing up-sampling processing on the down-sampled fusion feature map to obtain a target fusion feature map; and processing the target fusion characteristic diagram by using the second processing unit to obtain a filtered fusion characteristic diagram.
Optionally, in the target filtering processing model the first processing unit is a convolution unit (or convolutional layer) and the second processing unit is a residual unit. As shown in fig. 13, the fused feature map is input into residual units 1 to D-1, and the residual data output by at least one of these residual units is scaled by a scaling process a (e.g., a downsampling process, or division by a scaling factor); preferably, the scaling process can bring the residual data output by at least one of residual units 1 to D-1 into the range 0-1. Then, after convolutional layer 2 receives the residual data output by residual unit D, the residual data is scaled by a scaling process b (e.g., upsampling, or multiplication by the scaling factor corresponding to scaling a), and the residual data after scaling b and the fused feature map Fdf are mapped and synthesized in convolutional layer 2 according to their correspondence, so as to obtain the filtered first image block of the current frame of the first viewpoint. In this way, scaling the residual data output by at least one residual unit confines the amount of residual data to a certain range, which can greatly reduce the computational complexity of the neural-network loop filter for multi-view coding and improve the filtering efficiency.
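A minimal sketch of a fig.-13-style filtering module with residual scaling is given below; the residual-unit layout, channel width and scaling factor are assumptions.

```python
import torch.nn as nn

class ResidualUnit(nn.Module):
    """A plain residual unit: two 3x3 convolutions with a skip connection (layout assumed)."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

class FilterModule(nn.Module):
    """Sketch of a fig.-13-style filter: convolutional layer 1, D residual units,
    convolutional layer 2, with scaling a applied inside the residual chain and the
    inverse scaling b applied before the output is mapped back onto Fdf."""
    def __init__(self, in_ch, ch=32, depth=4, scale=0.5):
        super().__init__()
        self.scale = scale
        self.head = nn.Conv2d(in_ch, ch, 3, padding=1)    # convolutional layer 1
        self.res_units = nn.ModuleList(ResidualUnit(ch) for _ in range(depth))
        self.tail = nn.Conv2d(ch, in_ch, 3, padding=1)    # convolutional layer 2

    def forward(self, fdf):
        x = self.head(fdf)
        for unit in self.res_units[:-1]:                  # residual units 1 ... D-1
            x = unit(x)
        x = x * self.scale                                # scaling a: shrink the residual data
        x = self.res_units[-1](x)                         # residual unit D
        x = x / self.scale                                # scaling b: inverse of scaling a
        return fdf + self.tail(x)                         # synthesize with the fused feature map Fdf
```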
In summary, according to the image processing scheme provided by the embodiment of the present application, in the neural-network-based loop filtering stage, considering the influence of disparity between images of different viewpoints at the same time, the depth information or disparity information of the image is incorporated before filtering, and the second image block matching the first image block is found from the reference frame by using the depth information or disparity information, so that the most suitable second image block can be determined accurately and finely. By fusing the feature information of the second image block into the first image block, an enhanced image block with clearer texture and edges can be obtained, which reduces the compression distortion of the video and improves the video compression quality. In addition, pyramid hierarchical processing is adopted in the feature extraction stage to construct feature pyramids of different scales; combined with downsampling, this reduces the amount of computation while describing the information of the image blocks of different viewpoints more comprehensively. By warping the second feature map and fusing the first feature map with the warped second feature map, the feature information of different viewpoints is combined, the feature information of the first viewpoint is better restored, and the quality of multi-view video compression coding is improved.
Third embodiment
Referring to fig. 14, fig. 14 is a flowchart illustrating an image processing method according to a third embodiment, where an execution main body in this embodiment may be a computer device or a cluster formed by a plurality of computer devices, and the computer device may be an intelligent terminal (such as the foregoing mobile terminal 100) or a server, and here, the execution main body in this embodiment is an intelligent terminal for example.
S1401, a second reconstructed image block is acquired. Optionally, the first reconstructed image block may also be acquired. Optionally, the first reconstructed image block and the second reconstructed image block correspond to the same or different reconstructed images.
In an embodiment, the first reconstructed image block and the second reconstructed image block correspond to different reconstructed images. Different reconstructed images here refer to reconstructed frames from different viewpoints at the same time. The first reconstructed image block and the second reconstructed image block are matched, for example, the first reconstructed image block is a current reconstructed texture block of a current frame of a dependent view point, the second reconstructed image block is a matched texture block in a reference image of an independent view point, and depth information or parallax information of the second reconstructed image block is similar to the current reconstructed texture block. The reconstructed texture block may be any one of CTU, slice (slice), block (tile), and sub-picture, and the matching texture block corresponds to any one of CTU, slice (slice), block (tile), and sub-picture. Alternatively, the first reconstructed image block may be the aforementioned first image block corresponding to the first view point, and the second reconstructed image block may be an image block, such as a second image block, in the aforementioned reference image corresponding to the second view point. And the reconstructed image corresponding to the second reconstructed image block corresponds to the reference image of the second view point.
The second reconstructed image block may be obtained by: and determining a second reconstruction image block from the reference image block corresponding to the first reconstruction image block according to the attribute information of the first reconstruction image block. The attribute information may be auxiliary information, including but not limited to depth information or disparity information, and optionally, the attribute information may correspond to the first auxiliary information or the second auxiliary information, and the specific manner may refer to the manner of obtaining the second image block described in the foregoing second embodiment, which is not described herein again.
S1402, filtering the first reconstructed image block according to at least one of the second reconstructed image block, the attribute information of the first reconstructed image block, and the attribute information of the second reconstructed image block to obtain a filtered first reconstructed image block.
In an embodiment, the attribute information of the first reconstructed image block is auxiliary information, and the attribute information of the first reconstructed image block includes at least one of: inter-frame prediction information of the first reconstructed image block, depth information of the first reconstructed image block, and parallax information of the first reconstructed image block. Alternatively, the attribute information of the first reconstructed image block may correspond to the aforementioned first auxiliary information of the first image block corresponding to the first view.
The depth information or disparity information of the first reconstructed image block is determined from a corresponding depth image, and the depth information may include: depth feature information, depth value-based statistics, the depth slice itself, the pre-processed depth slice, or any combination thereof. In addition, the attribute information may further include image segmentation information, quantization parameter information, and the like, which is not limited herein. The attribute information of the second reconstructed image block may include inter-frame prediction information of the second reconstructed image block, depth information of the second reconstructed image block, and disparity information of the second reconstructed image block. The depth information or disparity information of the second reconstructed image block is determined from the corresponding depth image. I.e. the attribute information may be from the first reconstructed image block or the second reconstructed image block.
In an embodiment, if the first reconstructed image block is filtered according to the attribute information of the second reconstructed image block and the first reconstructed image block to obtain the filtered first reconstructed image block, a process of processing the first image block corresponding to the first view point and obtaining a processing result may be performed by referring to the first auxiliary information about the second image block and the first image block described in the foregoing second embodiment, that is, the first image block and the second image block are sequentially corresponding to the first reconstructed image block and the second reconstructed image block, and the obtained processing result is the filtered first reconstructed image block here. And will not be described in detail herein. Alternatively, the first reconstructed image block may also be filtered according to the attribute information of the second reconstructed image block and the second reconstructed image block, or according to the attribute information of the second reconstructed image block and the attribute information of the first reconstructed image block, or according to the attribute information of the second image block (which is not listed here).
In another embodiment, other reconstructed image blocks may also be obtained, and the first reconstructed image block is subjected to filtering processing in combination with more information, so as to improve the quality of the filtered first reconstructed image block. Namely, the more detailed implementation step of S1402 may further include:
a. acquiring a third reconstructed image block, wherein an image corresponding to the third reconstructed image block is a reference reconstructed image of an image corresponding to the first reconstructed image block;
b. and filtering the first reconstructed image block according to at least one of the attribute information of the third reconstructed image block, the attribute information of the second reconstructed image block and the attribute information of the first reconstructed image block to obtain the filtered first reconstructed image block.
The first reconstructed image block and the third reconstructed image block belong to image blocks coded at different moments of the same view point, for example, the first reconstructed image block is a current reconstructed texture block of a current frame depending on the view point, and the third reconstructed image block is a texture block corresponding to a reference frame depending on the view point. Correspondingly, the image corresponding to (or located in) the third reconstructed image block and the image corresponding to the first reconstructed image block are images of the same viewpoint and at different times, and the image corresponding to the third reconstructed image block is referred to as a reference reconstructed image. The reference reconstructed image is an encoded reconstructed image, belongs to a first view point, and can be read from an image buffer, and the third reconstructed image block can be obtained from the reference reconstructed image of the first view point according to inter-frame prediction information between the image in which the first reconstructed image block is located and the reference reconstructed image.
The first reconstructed image block and the second reconstructed image block are image blocks of different viewpoints at the same time, and the first reconstructed image block and the third reconstructed image block are image blocks of different moments from the same viewpoint, so that the third reconstructed image block and the second reconstructed image block are image blocks of different viewpoints at different times. The first reconstruction image block is filtered by referring to the image blocks of different viewpoints at the same time, the image blocks of the same viewpoint at different times and related attribute information, and information beneficial to filtering of the first image block can be comprehensively used from different shooting angles in space and different coding times in time, so that the filtering quality of the first reconstruction image block is effectively improved, and the distortion of a reconstruction image corresponding to the first reconstruction image block is reduced. The attribute information here may include inter prediction information, depth information or disparity information, quantization parameter information, and the like. The corresponding attribute information may be used adaptively at different stages, so that the first reconstructed image block is better filtered with reference to the different attribute information.
In a possible embodiment, the more detailed implementation step of step b may include:
performing third preset processing on the first reconstruction image block and the second reconstruction image block according to the depth information or the parallax information of the first reconstruction image block to obtain a first target feature map; performing third preset processing on the third reconstructed image block and the first target feature map according to inter-frame prediction information of the first reconstructed image block to obtain a second target feature map; and filtering the second target characteristic diagram to obtain a filtered first reconstruction image block.
Optionally, the third preset processing is fusion processing, the first target feature map is a first fusion feature map, and the second target feature map is a second fusion feature map.
The fusion processing of the first reconstructed image block and the second reconstructed image block specifically means that a series of processing is performed according to a feature map corresponding to the first reconstructed image block and a feature map of the second reconstructed image block, the obtained feature map is called a first fusion feature map, and the information description of the first reconstructed image block by the first fusion feature map obtained by fusion can be more comprehensive by referring to depth information or parallax information. After the first fused feature map is obtained, the third reconstructed image block and the first fused feature map may be fused, specifically, a second fused feature map is obtained by fusing a feature map corresponding to the third reconstructed image block and the first fused feature map, and after the second fused feature map is filtered, a filtered second fused feature map is obtained, where the filtered second fused feature map is used to determine the filtered first reconstructed image block. It should be noted that the foregoing may be implemented by using a neural network-based loop filter, and corresponding functional modules are integrated in the neural network-based loop filter, and each functional module is executed according to the foregoing to increase the filtering quality of the first reconstructed image block. For a specific fusion processing manner, reference may be made to the related description below in conjunction with fig. 15.
Please refer to fig. 15, which is a schematic structural diagram of a loop filter based on a neural network according to this embodiment, where the schematic structural diagram includes a fusion module 1 and a fusion module 2, and a filtering processing module based on a neural network, where the filtering processing module based on a neural network may include filtering processing of one or more filters (e.g., DBF, SAO, ALF, DRNLF) in the loop filtering processing.
In one embodiment, the fusion module 1 and the fusion module 2 may include the same functional units, such as the aforementioned functional units of fine matching, feature extraction, warping processing, and feature fusion in the fusion module shown in fig. 12a, and the functional units of feature extraction, warping processing, and feature fusion shown in fig. 12 b. In another embodiment, the fusion module 1 and the fusion module 2 may also be different, for example, the fusion module 1 includes several functional units of fine matching, feature extraction, warping processing, and feature fusion, and the fusion module 2 includes several functional units of feature extraction, warping processing, and feature fusion; for another example, the fusion module 1 includes several functional units of feature extraction, warping processing, and feature fusion, and the fusion module 2 includes several functional units of fine matching, feature extraction, warping processing, and feature fusion. It should be noted that the specific processing logic corresponding to the above functional units is the same as that described in the second embodiment, and the difference is only in the difference of the corresponding input and output results.
The fusion module 1 is used for fusing the first reconstructed image block and the second reconstructed image block according to the depth information or disparity information, and the fusion module 2 is used for fusing the third reconstructed image block and the first fused feature map according to the inter-frame prediction information. In combination with the above, for example: in the fusion module 1, fine matching may first be performed according to the depth information or disparity information, the feature maps corresponding to the reconstructed image blocks are then extracted, and the first fused feature map is obtained after fusion processing. The fusion module 2 then receives the inter-frame prediction information, extracts the feature map corresponding to the third reconstructed image block, and fuses it with the first fused feature map after processing to obtain a second fused feature map; referencing the inter-frame prediction information when fusing the different feature maps allows the second fused feature map to express the features of the first reconstructed image block more accurately. The second fused feature map is then filtered by the neural-network-based filtering processing to obtain the filtered first reconstructed image block.
Illustratively, the first reconstructed image block is a current reconstructed texture block of a current frame from a dependent view point, the second reconstructed image block is a reconstructed texture block corresponding to an independent view point reference frame, and the third reconstructed image block is a reconstructed texture block corresponding to a dependent view point reference frame, and the current reconstructed texture block after filtering of the current frame from the dependent view point is obtained through processing by a loop filter based on a neural network shown in fig. 15.
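The cascade of fig. 15 can be summarized by the following sketch; the callables fuse1, fuse2 and nn_filter stand for the fusion modules and the neural-network-based filtering processing module, and their interfaces are assumed.

```python
def loop_filter_cascade(first_blk, second_blk, third_blk,
                        depth_or_disp, inter_pred_info,
                        fuse1, fuse2, nn_filter):
    """Sketch of the fig.-15-style cascade (interfaces are assumptions, not the disclosed API).

    fuse1 fuses the current dependent-view block with the matched independent-view block
    using depth/disparity information; fuse2 fuses that result with the dependent-view
    reference block using inter-frame prediction information; nn_filter is the
    neural-network-based filtering processing module.
    """
    first_fused = fuse1(first_blk, second_blk, depth_or_disp)       # first fused feature map
    second_fused = fuse2(first_fused, third_blk, inter_pred_info)   # second fused feature map
    return nn_filter(second_fused)                                  # filtered first reconstructed block
```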
In one embodiment, the step of performing the third preset processing on the first reconstructed image block and the second reconstructed image block is: determining a first reconstruction feature map corresponding to the first reconstruction image block and a second reconstruction feature map corresponding to the second reconstruction image block according to the depth information or the parallax information of the first reconstruction image block; performing first preset processing on the second reconstruction feature map according to the first reconstruction feature map to obtain a second reconstruction feature map after the first preset processing; and performing second preset treatment according to the second reconstruction characteristic diagram after the first preset treatment and the first reconstruction characteristic diagram to obtain a first target characteristic diagram.
Optionally, the third preset processing is fusion processing and includes first preset processing and second preset processing, the first preset processing is distortion processing, the second preset processing is feature fusion processing, the second reconstructed feature map after the first preset processing is a distorted second reconstructed feature map, and the first target feature map is a first fused feature map.
It should be noted that the third preset processing further includes feature extraction processing. The following specific processing manner of the third preset processing in this embodiment may also be implemented by using the above steps, for example, the third preset processing on the first reconstructed image block and the third reconstructed image block, which is not further described herein. In this embodiment, the fusion process indicated by the third preset process and the feature fusion process indicated by the second preset process are different processing logics.
Optionally, the processing logic involved in determining the feature maps corresponding to the reconstructed image blocks, performing the warping processing on the second reconstructed feature map, and fusing the reconstructed feature maps is the same as the processing logic involved in determining the first feature map and the second feature map, performing the warping processing on the second feature map, and fusing the first feature map and the second feature map in the foregoing second embodiment. For example, the feature extraction network shown in fig. 8a or 8b may be used to extract a feature map of a reconstructed image block, the warping processing module shown in fig. 9a or 9b may be used to obtain a warped second reconstructed feature map, and the feature fusion network shown in fig. 11 may be used to perform fusion processing. For the corresponding processing principle, reference may be made to the description in the second embodiment, and the processing object and the processing result therein may be substituted into the relevant content in this embodiment, specifically, refer to the following content, which is a brief introduction of the corresponding content:
in one embodiment, it is possible to: and performing feature extraction processing on the first reconstruction image block and the second reconstruction image block based on a feature extraction network and depth information or parallax information of the first reconstruction image block to obtain a first reconstruction feature map corresponding to the first reconstruction image block and a second reconstruction feature map corresponding to the second reconstruction image block.
In another embodiment, the first reconstructed image block is a slice and the second reconstructed image block corresponds to a slice; in this case, the first and second reconstructed feature maps may be determined after fine matching, that is: acquiring a first reconstructed image sub-block of the first reconstructed image block and a second reconstructed image sub-block of the second reconstructed image block; the attribute information of the second reconstructed image subblock is matched with the attribute information of the first reconstructed image subblock; performing feature extraction processing on the first reconstructed image subblock and the second reconstructed image subblock based on a feature extraction network and attribute information of the first reconstructed image subblock to obtain a first reconstructed feature map corresponding to the first reconstructed image block and a second reconstructed feature map corresponding to the second reconstructed image block; optionally, the attribute information includes depth information or disparity information, and the attribute information of the first reconstructed image block is different from the attribute information of the first reconstructed image sub-block.
The matching of the attribute information of the first reconstructed image sub-block and the second reconstructed image sub-block means that the similarity of the attribute information of the two image sub-blocks is the largest, for example, the similarity of their depth information or disparity information is the largest. In addition, at least one of the following holds: the attribute information of the first reconstructed image block and the attribute information of the first reconstructed image sub-block have different contents, for example, the depth information of the first reconstructed image sub-block is depth feature information while the depth information of the first reconstructed image block is statistical information based on depth values; or the precision of the attribute information of the first reconstructed image sub-block is greater than that of the attribute information of the first reconstructed image block, for example, the depth information of the first reconstructed image block is n pieces of depth feature information, the attribute information of the first reconstructed image sub-block is m pieces of depth feature information, and m is greater than n.
Optionally, the first reconstructed image sub-block and the second reconstructed image sub-block are coding tree blocks or extended coding tree blocks; an extended coding tree block is obtained by extending the edges of a coding tree block, and its size is larger than that of the coding tree block. Because the input image block then contains pixels adjacent to the coding unit, the blocking artifacts caused by image block partitioning can be effectively reduced in the filtering stage.
Optionally, the feature extraction network includes N cascaded feature extraction modules, where N is an integer greater than or equal to 1; each of the first N-1 feature extraction modules includes a feature extraction unit and a downsampling unit connected in series, and the Nth feature extraction module includes a feature extraction unit. The specific structure of the feature extraction network may adopt the structure shown in fig. 8a or fig. 8b, with the difference that the first of the N cascaded feature extraction modules is used to process the first reconstructed image block and the second reconstructed image block, or the first reconstructed image sub-block and the second reconstructed image sub-block. The feature map obtained when the feature extraction unit processes the corresponding reconstructed image block or reconstructed image sub-block is called a reconstructed feature map, and includes the first reconstructed feature map and the second reconstructed feature map. The attribute information, which includes depth information or disparity information, may be used as supervision information or reference information and plays a role similar to the first auxiliary information or the second auxiliary information, which is not repeated here. In a scenario where the third reconstructed image block is processed, the feature extraction network may also receive inter-frame prediction information.
A first preset process for the second reconstructed feature map, namely: determining a first preset processing parameter based on the first reconstruction feature map and the second reconstruction feature map; or, determining a first preset processing parameter based on the first reconstruction feature map, the second reconstruction feature map and the attribute information; and carrying out first preset treatment on the second reconstruction characteristic diagram based on a first preset treatment model to obtain a first preset treated second reconstruction characteristic diagram, wherein the first preset treatment model comprises a first treatment model determined according to the first preset treatment parameters.
In an embodiment, the first predetermined process is a warping process, the first predetermined process model is a warping model, and the first predetermined process parameter is a warping parameter.
Optionally, the first preset processing based on the first preset processing model is as follows: determining the sampling point coordinates of the second reconstructed feature map according to the first processing model and the second processing model, where the second processing model optionally comprises the target pixel coordinates; determining the target pixel values corresponding to the sampling point coordinates according to the second reconstructed feature map and a sampling kernel function; and generating the second reconstructed feature map after the first preset processing according to the target pixel values corresponding to the sampling point coordinates. Reference may be made to the first preset processing of the second feature map described above; substituting the second reconstructed feature map into the corresponding content yields the corresponding result, which is not repeated here.
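A minimal sketch of this sampling step is given below, assuming an affine warping model, with the affine grid standing in for the sampling point coordinates and bilinear interpolation standing in for the sampling kernel function.

```python
import torch.nn.functional as F

def warp_feature_map(fix, theta):
    """Warp the second reconstructed feature map with affine parameters theta of shape (N, 2, 3).

    The affine grid provides the sampling point coordinates and bilinear interpolation
    acts as the sampling kernel function; both are illustrative choices.
    """
    grid = F.affine_grid(theta, fix.size(), align_corners=False)   # sampling point coordinates
    return F.grid_sample(fix, grid, mode='bilinear',
                         align_corners=False)                      # target pixel values
```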
After the second reconstructed feature map after the first preset processing is obtained, second preset processing is performed on it and the first reconstructed feature map, namely: performing second preset processing on the first reconstructed feature map and the second reconstructed feature map after the first preset processing by using a feature fusion network to obtain a first fused reconstructed feature map. The second preset processing is feature fusion processing, and the feature fusion network shown in fig. 11 may be adopted; combined with the first preset processing module and the feature extraction network, the feature fusion network performs the second preset processing, fuses the first reconstructed feature maps obtained by the pyramid hierarchical processing with the second reconstructed feature maps after the first preset processing, and finally outputs the first fused reconstructed feature map through the feature fusion module. The detailed processing flow is omitted here.
The implementation manner of performing filtering processing on the second target feature map may be: filtering the second target characteristic diagram by using a target filtering processing model to obtain a filtered second target characteristic diagram; generating a filtered first reconstruction image block according to the filtered second target feature map; optionally, the target filtering processing model includes a target candidate model selected from a plurality of candidate models according to a rate-distortion cost, and each of the plurality of candidate models has a mapping relation with a quantization parameter. In one embodiment, the target filtering process model includes at least one processing unit including one or both of a first processing unit and a second processing unit; the filtering the second target feature map by using the target filtering processing model to obtain a filtered second target feature map includes: performing down-sampling processing on the second target feature map processed by at least one first processing unit to obtain a down-sampled second target feature map; performing up-sampling processing on the down-sampled second target feature map to obtain a target fusion reconstruction feature map; and processing the target fusion reconstruction characteristic diagram by using a second processing unit to obtain a filtered second target characteristic diagram.
For the filtering process of the second target feature map, reference may be made to the filtering process of the target feature map in the second embodiment, and processing logics thereof are the same, and are not described herein again.
In another possible embodiment, the more detailed implementation step of step b may include:
performing third preset processing on the first reconstructed image block and the second reconstructed image block according to the depth information or disparity information of the first reconstructed image block to obtain a first target feature map; filtering the first target feature map to obtain a filtered first target feature map; performing third preset processing on the filtered first target feature map and the third reconstructed image block according to the inter-frame prediction information of the first reconstructed image block to obtain a second target feature map; and filtering the second target feature map to obtain the filtered first reconstructed image block.
Optionally, the third preset processing is fusion processing, and includes first preset processing and second preset processing, where the first preset processing is distortion processing, the second preset processing is feature fusion processing, the first target feature map is a first fusion feature map, and the second target feature map is a second fusion feature map. In addition, the third preset processing module further comprises feature extraction processing.
The fusion processing of the first reconstructed image block and the second reconstructed image block has been described above and is not repeated here. The fusion processing of the filtered first fused feature map and the third reconstructed image block specifically obtains the second fused feature map from a series of processing of the first fused feature map and the feature map corresponding to the third reconstructed image block, in the same manner as the fusion of the first reconstructed image block and the second reconstructed image block; the corresponding content is not expanded here. The filtering processing of the first fused feature map and the second fused feature map may be implemented by the neural-network-based filtering processing module shown in fig. 13, and the corresponding processing manner has also been described above and is not repeated here.
Please refer to fig. 16, which is a schematic structural diagram of another neural-network-based loop filter according to an embodiment of the present application. It includes two fusion modules and two neural-network-based filtering processing modules; the fusion module 1 and the fusion module 2 may include the same or different functional units, for which reference may be made to the description of the fusion module 1 and the fusion module 2 in fig. 15, not repeated here. Both the neural-network-based filtering processing module 1 and the neural-network-based filtering processing module 2 may adopt the structure shown in fig. 13. Unlike fig. 15, when filtering is performed according to the loop filter structure shown in fig. 16, the first fused feature map obtained by the fusion module 1 is first filtered by a neural-network-based filtering processing module and then input into the fusion module 2 to obtain the second fused feature map, which is in turn filtered by the other neural-network-based filtering processing module. In this embodiment, by connecting the two combinations of fusion module and neural-network-based filtering processing module in series, the first fusion can refer to the information of frames of different viewpoints at the same time, the neural-network-based filtering can effectively preserve the feature information of the original image, and the subsequent fusion and filtering can further refer to the information of the reference frame of the same viewpoint, so that the various kinds of information related to the first reconstructed image block are fully fused and the distortion of the reconstructed image corresponding to the first reconstructed image block is further reduced.
It should be noted that, in this embodiment, the number and the combination of the fusion module and the neural network-based filtering processing module included in the neural network-based loop filter shown in fig. 15 to 17 are only described and illustrated as examples, and other combinations may also be included, for example, on the basis of fig. 16, one fusion module and one neural network-based filtering processing module are further connected in series, which is not limited herein.
In another possible embodiment, the more detailed implementation step of step b may include: performing third preset processing on the first reconstruction image block and the second reconstruction image block according to the depth information or the parallax information of the first reconstruction image block to obtain a first target feature map; performing third preset processing on the first reconstructed image block and the third reconstructed image block according to inter-frame prediction information of the third reconstructed image block and the first reconstructed image block to obtain a second target feature map; and determining a filtered first reconstruction image block according to the first target feature map and the second target feature map.
Optionally, the third preset processing is fusion processing, and includes first preset processing and second preset processing, where the first preset processing is distortion processing, the second preset processing is feature fusion processing, the first target feature map is a first fusion feature map, and the second target feature map is a second fusion feature map. It should be noted that the third preset process may further include a feature extraction process.
The fusion processing of the first reconstructed image block and the second reconstructed image block specifically performs a series of processing on the feature map corresponding to the first reconstructed image block and the feature map corresponding to the second reconstructed image block. Optionally, fine matching is first performed using the depth information or disparity information to find, in the second reconstructed image block, the reconstructed image sub-block that matches a reconstructed image sub-block of the first reconstructed image block, and feature extraction is then performed on each reconstructed image sub-block with reference to the depth information or disparity information to obtain the corresponding feature maps; in another embodiment, features may be extracted directly from the reconstructed image blocks without fine matching. Then, after the second reconstructed feature map corresponding to the second reconstructed image block is warped with reference to the depth information or disparity information, it is fused with the first reconstructed feature map corresponding to the first reconstructed image block to obtain the first fused feature map. Similarly, the first reconstructed image block and the third reconstructed image block can be fused in the same manner according to the inter-frame prediction information to obtain the second fused feature map.
It should be noted that the feature map of the image block or the image sub-block may be extracted by using a pyramid hierarchical processing manner, and the detailed process may refer to the foregoing description and is not described herein again.
In one embodiment, the step of determining the filtered first reconstructed image block from the first target feature map and the second target feature map may comprise: filtering the first target characteristic diagram and the second target characteristic diagram to obtain a filtered first target characteristic diagram and a filtered second target characteristic diagram; performing third preset processing according to the filtered first target feature map and the filtered second target feature map to obtain a target fusion reconstruction image block; and taking the target fusion reconstruction image block as a first reconstruction image block after filtering.
Optionally, the third preset processing is fusion processing and includes the first preset processing and the second preset processing, where the first preset processing is warping processing, the second preset processing is feature fusion processing, the first target feature map is a first fusion feature map, the second target feature map is a second fusion feature map, the filtered first target feature map is a filtered first fusion feature map, and the filtered second target feature map is a filtered second fusion feature map. In addition, the third preset processing further includes feature extraction processing.
The filtering processing here may include filtering the first fusion feature map and the second fusion feature map by using different neural network models. After both the first fusion feature map and the second fusion feature map have been filtered, the subsequent fusion processing may be performed as follows: the filtered second fusion feature map is warped and then fused with the filtered first fusion feature map, or the two filtered fusion feature maps are fused directly; the target fusion reconstructed image block, that is, the filtered first reconstructed image block, is then obtained according to the resulting fusion feature map.
Fig. 17 is a schematic structural diagram of another neural-network-based loop filter provided in the embodiments of the present application, which includes a fusion module 1, a fusion module 2 and a fusion module 3, as well as a neural-network-based filtering processing module 1 and a neural-network-based filtering processing module 2. The internal structures of the fusion modules may be the same or different; for example, the fusion module 3 includes three functional units of feature extraction, warping processing and feature fusion, while the fusion module 1 and the fusion module 2 each include four functional units of fine matching, feature extraction, warping processing and feature fusion. The fusion module 1 is configured to process the first reconstructed image block and the second reconstructed image block according to the depth information or the parallax information to obtain the first fusion feature map, where the first reconstructed image block may be a current reconstructed texture block of the current frame of a dependent viewpoint, and the second reconstructed image block may be the matching texture block in a reference frame of an independent viewpoint. The fusion module 2 is configured to process the first reconstructed image block and the third reconstructed image block according to the inter-frame prediction information to obtain the second fusion feature map, where the third reconstructed image block may be the corresponding texture block in a reference frame of the dependent viewpoint. The fusion module 3 may perform fusion processing according to the first fusion feature map and the second fusion feature map, for example, fusing the first fusion feature map with the warped second fusion feature map, and the filtered first reconstructed image block, for example the filtered current reconstructed texture block of the current frame of the dependent viewpoint, is obtained according to the resulting fusion feature map.
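To make the data flow of Fig. 17 easier to follow, here is a hypothetical sketch of its right-hand half, in which the first and second fusion feature maps (for example produced by the FusionModule sketch above) are filtered by two separate neural-network filtering modules and then combined by a simplified fusion module 3; the layer choices are placeholders rather than the actual network.

```python
import torch
import torch.nn as nn

class ParallelFilterAndFuse(nn.Module):
    """Hypothetical sketch of the right-hand part of Fig. 17: the two fusion
    feature maps are filtered by separate neural-network filtering modules,
    combined by a simplified fusion module 3, and mapped back to a block."""

    def __init__(self, channels=64):
        super().__init__()
        self.filter1 = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
                                     nn.Conv2d(channels, channels, 3, padding=1))
        self.filter2 = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
                                     nn.Conv2d(channels, channels, 3, padding=1))
        self.fusion3 = nn.Conv2d(2 * channels, channels, 3, padding=1)  # feature fusion only
        self.to_block = nn.Conv2d(channels, 1, 3, padding=1)            # back to pixel domain

    def forward(self, first_fusion_map, second_fusion_map):
        f1 = self.filter1(first_fusion_map)     # neural-network filtering module 1
        f2 = self.filter2(second_fusion_map)    # neural-network filtering module 2
        fused = self.fusion3(torch.cat([f1, f2], dim=1))   # simplified fusion module 3
        return self.to_block(fused)             # filtered first reconstructed image block
```

Note that fusion module 3 of Fig. 17 also contains feature extraction and warping units, which this simplified sketch omits.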
In this embodiment, the first reconstructed image block is processed, with reference to different attribute information, the feature information of the second reconstructed image block and the feature information of the third reconstructed image block, by the parallel fusion modules and the neural-network-based filtering processing modules to obtain the filtered first reconstructed image block. In this way, auxiliary information beneficial to filtering can be obtained both from images of different viewpoints and from already-coded images of the same viewpoint, which improves the quality of the reconstructed image where the first reconstructed image block is located and reduces video distortion.
In summary, with the solutions provided by the embodiments of the present application, based on already-coded reconstructed image blocks (including reconstructed image blocks of different viewpoints and reconstructed image blocks of the same viewpoint at different time points) and in combination with attribute information about the first reconstructed image block (including depth information or parallax information, and inter-frame prediction information), filtering processing may be performed on the currently coded reconstructed image block; the processing may include feature extraction processing, warping processing, feature fusion processing and the like. In a specific implementation, combinations of different numbers of fusion modules and neural-network-based filtering processing modules may be used, so that, through multiple rounds of fusion and filtering, useful information in other reconstructed image blocks is fully referenced and related feature information is fused into the final filtering result, thereby effectively improving the quality of the filtered first reconstructed image block and reducing distortion of the reconstructed image.
Fourth embodiment
Referring to fig. 18, fig. 18 is a schematic structural diagram of an image processing apparatus according to a fourth embodiment, where the image processing apparatus may be a computer program (including program code) running in a server, and the image processing apparatus is an application software; the apparatus may be used to perform the corresponding steps in the methods provided by the embodiments of the present application. The image processing apparatus 1800 includes: an acquisition module 1801, a processing module 1802.
An obtaining module 1801, configured to obtain first auxiliary information.
In an embodiment, the obtaining module 1801 is further configured to obtain a first image block corresponding to the first viewpoint, and/or obtain a reference image corresponding to the second viewpoint.
A processing module 1802, configured to process a first image block corresponding to a first viewpoint according to a reference image corresponding to a second viewpoint and the first auxiliary information, and to determine or generate a processing result that can be used to obtain a reconstructed image or a decoded image corresponding to the first image block; the reference image is an image corresponding to the second viewpoint, and the second viewpoint is different from the first viewpoint.
In one embodiment, the processing module 1802 is specifically configured to: and determining or generating a processing result corresponding to the first image block according to the second image block of the reference image and the first auxiliary information.
In one embodiment, the processing module 1802 is specifically configured to: determining a second image block from the reference image according to the first auxiliary information; determining a first feature map corresponding to the first image block and a second feature map corresponding to the second image block; and determining or generating a processing result corresponding to the first image block according to the first feature map and the second feature map.
In one embodiment, the processing module 1802 is specifically configured to: performing first preset processing on the second feature map according to the first feature map to obtain a target second feature map; performing second preset processing according to the first feature map and the target second feature map to obtain a target feature map; and determining or generating a processing result corresponding to the first image block according to the target feature map.
In one embodiment, the processing module 1802 is specifically configured to: acquiring first auxiliary information of the first image block, wherein the first auxiliary information comprises depth information, and the depth information is determined according to a depth image corresponding to the first image block; acquiring similarity between first auxiliary information of each image block in the reference image and first auxiliary information of the first image block; and determining the image block with the maximum similarity in the reference image as a second image block matched with the first image block.
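As a minimal, hypothetical sketch of the depth-similarity matching described above, the following NumPy function scans a reference depth map and returns the block whose depth values are most similar to those of the current block, using negative mean absolute difference as the similarity measure; the actual similarity metric is not specified by the embodiment, and the function and argument names are illustrative only.

```python
import numpy as np

def match_block_by_depth(depth_ref, depth_blk, step=1):
    """Hypothetical sketch: slide over the reference depth map and return the
    top-left corner of the candidate block whose depth values are most similar
    to those of the current block (similarity = negative mean absolute difference)."""
    bh, bw = depth_blk.shape
    h, w = depth_ref.shape
    best_pos, best_sim = None, -np.inf
    for y in range(0, h - bh + 1, step):
        for x in range(0, w - bw + 1, step):
            cand = depth_ref[y:y + bh, x:x + bw]
            sim = -np.mean(np.abs(cand - depth_blk))   # higher means more similar
            if sim > best_sim:
                best_sim, best_pos = sim, (y, x)
    return best_pos   # position of the matched second image block
```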
In one embodiment, the first auxiliary information comprises depth information or disparity information; the depth information is at least one of: depth feature information, statistical information based on depth values, depth slices after preprocessing.
In one embodiment, at least one of: the second image block and the first image block have the same size; when the first image block is a slice or a coding tree block, the second image block is correspondingly a slice or a coding tree block; when the second image block is a slice, the second image block is composed of a plurality of coding tree units.
In one embodiment, the processing module 1802 is specifically configured to: and performing feature extraction processing on the first image block and the second image block based on a feature extraction network and the first auxiliary information to obtain a first feature map corresponding to the first image block and a second feature map corresponding to the second image block.
In one embodiment, the first image block is a slice, and the second image block corresponds to a slice; the processing module 1802 is specifically configured to: acquiring a first image sub-block of the first image block and a second image sub-block of the second image block; the second auxiliary information of the second image sub-block matches the second auxiliary information of the first image sub-block; performing feature extraction processing on the first image sub-block and the second image sub-block based on a feature extraction network and the second auxiliary information to obtain a first sub-feature map of the first image sub-block and a second sub-feature map of the second image sub-block; determining or generating a first feature map corresponding to the first image block through the first sub-feature map, and determining or generating a second feature map corresponding to the second image block through the second sub-feature map; optionally, the second auxiliary information is different from the first auxiliary information.
In one embodiment, the feature extraction network comprises N cascaded feature extraction modules, wherein N is an integer greater than or equal to 1, each feature extraction module in the first N-1 feature extraction modules comprises a feature extraction unit and a downsampling unit which are connected in series, and the Nth feature extraction module comprises a feature extraction unit; a first feature extraction module of the N cascaded feature extraction modules, configured to process the first image block and the second image block, or the first image sub-block and the second image sub-block; each feature extraction module of the N cascaded feature extraction modules except the first feature extraction module is used for processing the output of the previous feature extraction module; for each feature extraction module, the input of the down-sampling unit is connected with the output of the feature extraction unit, and the output of the down-sampling unit is connected with the input of the feature extraction unit in the next feature extraction module; optionally, the first auxiliary information or the second auxiliary information is used as supervision information of at least one feature extraction module of the N cascaded feature extraction modules.
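The cascaded structure described above can be pictured with the following hypothetical PyTorch sketch: N feature extraction modules, the first N-1 of which pair a feature extraction unit with a downsampling unit, producing one feature map per scale (the pyramid-style processing mentioned earlier). The concrete layers are placeholders and do not reflect the actual network.

```python
import torch.nn as nn

class FeatureExtractionNetwork(nn.Module):
    """Hypothetical sketch of N cascaded feature extraction modules: the first
    N-1 each pair a feature extraction unit with a downsampling unit; the N-th
    module has only a feature extraction unit."""

    def __init__(self, n_modules=3, channels=64, in_channels=1):
        super().__init__()
        self.extractors = nn.ModuleList([
            nn.Sequential(nn.Conv2d(in_channels if i == 0 else channels, channels, 3, padding=1),
                          nn.ReLU())
            for i in range(n_modules)])
        self.downsamplers = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, stride=2, padding=1)
            for _ in range(n_modules - 1)])

    def forward(self, x):
        feats = []                                 # one feature map per scale
        for i, extract in enumerate(self.extractors):
            x = extract(x)
            feats.append(x)
            if i < len(self.downsamplers):         # all but the N-th module downsample
                x = self.downsamplers[i](x)
        return feats                               # fine-to-coarse feature pyramid
```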
In one embodiment, the first image sub-block and the second image sub-block are coding tree blocks or extended coding tree blocks; the extended coding tree block is obtained after the edge of the coding tree block is extended, and the size of the extended coding tree block is larger than that of the coding tree block.
In one embodiment, the processing module 1802 is specifically configured to: determining a first preset processing parameter based on the first feature map and the second feature map; or, determining a first preset processing parameter based on the first feature map, the second feature map and the first auxiliary information; performing first preset processing on the second characteristic diagram based on a first preset processing model to obtain a target second characteristic diagram; the first preset processing model comprises a first processing model determined according to the first preset processing parameter.
In an embodiment, the first preset processing model includes the first processing model and a second processing model, and the processing module 1802 is specifically configured to: determining sampling point coordinates in the second feature map according to the first processing model and the second processing model, where, optionally, the second processing model includes target pixel coordinates; determining target pixel values corresponding to the sampling point coordinates according to the second feature map and a sampling kernel function; and generating the target second feature map according to the target pixel values corresponding to the sampling point coordinates.
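For illustration only, the following sketch shows one possible realization of this step: a small convolution (standing in for the first processing model) predicts per-pixel offsets, the base pixel grid plays the role of the target pixel coordinates of the second processing model, and bilinear interpolation acts as the sampling kernel function. None of these choices are prescribed by the embodiment, and all names are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WarpingModel(nn.Module):
    """Hypothetical sketch of the first preset processing: offsets predicted
    from the two feature maps, added to the base pixel grid to obtain sampling
    point coordinates, then resolved with a bilinear sampling kernel."""

    def __init__(self, channels=64):
        super().__init__()
        self.offset_net = nn.Conv2d(2 * channels, 2, 3, padding=1)   # (dx, dy) per pixel

    def forward(self, first_feat, second_feat):
        n, _, h, w = second_feat.shape
        offsets = self.offset_net(torch.cat([first_feat, second_feat], dim=1))
        ys, xs = torch.meshgrid(torch.arange(h, device=second_feat.device, dtype=torch.float32),
                                torch.arange(w, device=second_feat.device, dtype=torch.float32),
                                indexing="ij")
        base = torch.stack((xs, ys)).unsqueeze(0)         # target pixel coordinates
        coords = base + offsets                           # sampling point coordinates
        grid = torch.stack(((coords[:, 0] / (w - 1)) * 2 - 1,
                            (coords[:, 1] / (h - 1)) * 2 - 1), dim=-1)
        # Bilinear sampling kernel: target pixel values at the sampling coordinates.
        return F.grid_sample(second_feat, grid, mode="bilinear", align_corners=True)
```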
Optionally, the first preset processing is warping processing, the second preset processing is feature fusion processing, the target second feature map is a warped second feature map, the first preset processing parameter is a warping parameter, the first preset processing model is a warping model, and the target feature map is a fusion feature map.
In one embodiment, the processing module 1802 is specifically configured to: performing second preset processing on the first feature map and the target second feature map by using a feature fusion network to obtain the target feature map; optionally, the feature fusion network includes N feature fusion modules and M upsampling modules, where M is an integer greater than or equal to 1, and M +1= N; the input of the ith feature fusion module in the feature fusion network is connected with the output of the ith first preset processing module in the N first preset processing modules, the output of the ith feature fusion module is connected with the input of the jth up-sampling module, j is an integer greater than or equal to 1 and less than or equal to M, and i = j + 1; the output of the jth upsampling module is connected with the input of the jth feature fusion module; the ith first preset processing module is used for performing first preset processing on the second feature map output by the ith feature extraction module in the feature extraction network, where i is an integer greater than or equal to 1 and less than or equal to N; the Nth feature fusion module is used for fusing the target second feature map output by the Nth first preset processing module and the first feature map output by the Nth feature extraction unit; and when i is not equal to N, the ith feature fusion module is used for fusing the target second feature map output by the ith first preset processing module, the first feature map output by the ith feature extraction unit and the feature map output by the ith up-sampling module.
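The connectivity described above (N fusion modules, M = N-1 upsampling modules, coarse-to-fine fusion) can be sketched as follows; the convolution and transposed-convolution layers are placeholders, and the sketch assumes the per-level feature maps are passed in ordered from the finest level 1 to the coarsest level N with spatial sizes divisible by two at each level.

```python
import torch
import torch.nn as nn

class FeatureFusionNetwork(nn.Module):
    """Hypothetical sketch of the feature fusion network: N fusion modules and
    M = N - 1 upsampling modules.  The N-th (coarsest) module fuses the warped
    second feature map with the first feature map; each finer module also takes
    the upsampled output of the level below it, until level 1 yields the target
    feature map."""

    def __init__(self, n_levels=3, channels=64):
        super().__init__()
        self.fuse_coarse = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.fuse_fine = nn.ModuleList([
            nn.Conv2d(3 * channels, channels, 3, padding=1) for _ in range(n_levels - 1)])
        self.upsample = nn.ModuleList([
            nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1)
            for _ in range(n_levels - 1)])

    def forward(self, first_feats, warped_second_feats):
        # Both lists are ordered from the finest level 1 to the coarsest level N.
        x = self.fuse_coarse(torch.cat([first_feats[-1], warped_second_feats[-1]], dim=1))
        for i in range(len(first_feats) - 2, -1, -1):      # levels N-1 down to 1
            up = self.upsample[i](x)
            x = self.fuse_fine[i](torch.cat([first_feats[i],
                                             warped_second_feats[i], up], dim=1))
        return x                                           # target (fusion) feature map
```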
In one embodiment, the processing module 1802 is specifically configured to: filtering the target feature map to obtain a filtered target feature map; and determining the processing result corresponding to the first image block according to the filtered target feature map.
In one embodiment, the processing module 1802 is specifically configured to: filtering the target feature map by using a target filtering processing model to obtain a filtered target feature map; optionally, the target filtering processing model includes a target candidate model selected from a plurality of candidate models according to a rate-distortion cost, and each of the plurality of candidate models has a mapping relation with a quantization parameter.
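A minimal sketch of this selection step, under the assumption that the candidate models are indexed by quantization parameter and that a rate-distortion cost function is available, might look as follows (the function and argument names are hypothetical):

```python
def select_filter_model(candidates, feature_map, rd_cost_fn):
    """Hypothetical sketch: 'candidates' maps a quantization parameter to a
    candidate filtering model; the model whose filtered output gives the lowest
    rate-distortion cost is chosen as the target filtering processing model."""
    best_qp, best_cost = None, float("inf")
    for qp, model in candidates.items():
        cost = rd_cost_fn(model(feature_map), qp)   # e.g. distortion + lambda * rate
        if cost < best_cost:
            best_qp, best_cost = qp, cost
    return candidates[best_qp]
```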
In one embodiment, the target filtering processing model comprises at least one processing unit, and the processing unit includes one or both of a first processing unit and a second processing unit; the processing module 1802 is specifically configured to: performing down-sampling processing on the target feature map processed by the at least one first processing unit to obtain a down-sampled target feature map; performing up-sampling processing on the down-sampled target feature map to obtain a target fusion feature map; and processing the target fusion feature map by using the second processing unit to obtain the filtered target feature map.
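As a hypothetical illustration of such a filtering model, the following sketch chains first processing units, a downsampling step, an upsampling step and a second processing unit; the specific layers are placeholders and the spatial size is assumed to be divisible by two.

```python
import torch.nn as nn

class TargetFilterModel(nn.Module):
    """Hypothetical sketch of the target filtering processing model: first
    processing units, a downsampling step, an upsampling step back to the
    original resolution, and a second processing unit that outputs the
    filtered target feature map."""

    def __init__(self, channels=64):
        super().__init__()
        self.first_units = nn.Sequential(      # at least one first processing unit
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
        self.down = nn.Conv2d(channels, channels, 3, stride=2, padding=1)
        self.up = nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1)
        self.second_unit = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, target_feat):
        x = self.first_units(target_feat)      # processed target feature map
        x = self.down(x)                       # down-sampled target feature map
        x = self.up(x)                         # up-sampled (target fusion) feature map
        return self.second_unit(x)             # filtered target feature map
```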
In a possible embodiment, the image processing apparatus described above may be further configured to implement the steps of:
the obtaining module 1801 is further configured to obtain a first reconstructed image block and a second reconstructed image block;
the processing module 1802 is further configured to filter the first reconstructed image block according to the second reconstructed image block and the attribute information of the first reconstructed image block and/or the second reconstructed image block to obtain a filtered first reconstructed image block; optionally, the first reconstructed image block and the second reconstructed image block correspond to different or the same reconstructed image.
In one embodiment, the attribute information of the first reconstructed image block includes at least one of: inter-frame prediction information of the first reconstructed image block, depth information of the first reconstructed image block, and parallax information of the first reconstructed image block.
In one embodiment, the processing module 1802 is specifically configured to: acquiring a third reconstructed image block, wherein an image corresponding to the third reconstructed image block is a reference reconstructed image of an image corresponding to the first reconstructed image block; and filtering the first reconstructed image block according to the attribute information of the third reconstructed image block, the second reconstructed image block and the first reconstructed image block to obtain a filtered first reconstructed image block.
In one embodiment, the processing module 1802 is specifically configured to: performing third preset processing on the first reconstructed image block and the second reconstructed image block according to the depth information or the parallax information of the first reconstructed image block to obtain a first target feature map; performing third preset processing on the third reconstructed image block and the first target feature map according to the inter-frame prediction information of the first reconstructed image block to obtain a second target feature map; and filtering the second target feature map to obtain the filtered first reconstructed image block.
In one embodiment, the processing module 1802 is specifically configured to: performing third preset processing on the first reconstructed image block and the second reconstructed image block according to the depth information or the parallax information of the first reconstructed image block to obtain a first target feature map; filtering the first target feature map to obtain a filtered first target feature map; performing third preset processing on the filtered first target feature map and the third reconstructed image block according to the inter-frame prediction information of the first reconstructed image block to obtain a second target feature map; and filtering the second target feature map to obtain the filtered first reconstructed image block.
In one embodiment, the processing module 1802 is specifically configured to: performing third preset processing on the first reconstruction image block and the second reconstruction image block according to the depth information or the parallax information of the first reconstruction image block to obtain a first target feature map; performing third preset processing on the first reconstructed image block and the third reconstructed image block according to inter-frame prediction information of the third reconstructed image block and the first reconstructed image block to obtain a second target feature map; and determining a filtered first reconstruction image block according to the first target feature map and the second target feature map.
In one embodiment, the processing module 1802 is specifically configured to: filtering the first target feature map and the second target feature map to obtain a filtered first target feature map and a filtered second target feature map; performing third preset processing according to the filtered first target feature map and the filtered second target feature map to obtain a target fusion reconstructed image block; and taking the target fusion reconstructed image block as the filtered first reconstructed image block.
In one embodiment, the processing module 1802 is specifically configured to: determining a first reconstruction feature map corresponding to the first reconstructed image block and a second reconstruction feature map corresponding to the second reconstructed image block according to the depth information or the parallax information of the first reconstructed image block; performing first preset processing on the second reconstruction feature map according to the first reconstruction feature map to obtain a first preset processed second reconstruction feature map; and performing second preset processing according to the first preset processed second reconstruction feature map and the first reconstruction feature map to obtain the first target feature map.
In one embodiment, the processing module 1802 is specifically configured to: and performing feature extraction processing on the first reconstruction image block and the second reconstruction image block based on a feature extraction network and depth information or parallax information of the first reconstruction image block to obtain a first reconstruction feature map corresponding to the first reconstruction image block and a second reconstruction feature map corresponding to the second reconstruction image block.
In one embodiment, the first reconstructed image block is a slice and the second reconstructed image block corresponds to a slice; the processing module 1802 is specifically configured to: acquiring a first reconstructed image sub-block of the first reconstructed image block and a second reconstructed image sub-block of the second reconstructed image block; the attribute information of the second reconstructed image subblock is matched with the attribute information of the first reconstructed image subblock; performing feature extraction processing on the first reconstructed image subblock and the second reconstructed image subblock based on a feature extraction network and attribute information of the first reconstructed image subblock to obtain a first reconstructed feature map corresponding to the first reconstructed image block and a second reconstructed feature map corresponding to the second reconstructed image block; optionally, the attribute information includes depth information or disparity information, and the attribute information of the first reconstructed image block is different from the attribute information of the first reconstructed image sub-block.
In one embodiment, the feature extraction network comprises N cascaded feature extraction modules, wherein N is an integer greater than or equal to 1, each feature extraction module in the first N-1 feature extraction modules comprises a feature extraction unit and a downsampling unit which are connected in series, and the Nth feature extraction module comprises a feature extraction unit; a first feature extraction module of the N cascaded feature extraction modules, configured to process the first reconstructed image block and the second reconstructed image block, or the first reconstructed image sub-block and the second reconstructed image sub-block; each feature extraction module of the N cascaded feature extraction modules except the first feature extraction module is used for processing the output of the previous feature extraction module; for each feature extraction module, the input of the down-sampling unit is connected with the output of the feature extraction unit, and the output of the down-sampling unit is connected with the input of the feature extraction unit in the next feature extraction module; optionally, the attribute information is used as supervision information of at least one feature extraction module of the N cascaded feature extraction modules.
In one embodiment, the first reconstructed image sub-block and the second reconstructed image sub-block are coding tree blocks or extended coding tree blocks; the extended coding tree block is obtained after the edge of the coding tree block is extended, and the size of the extended coding tree block is larger than that of the coding tree block.
In one embodiment, the processing module 1802 is specifically configured to: determining a first preset processing parameter based on the first reconstruction feature map and the second reconstruction feature map; or, determining a first preset processing parameter based on the first reconstruction feature map, the second reconstruction feature map and the attribute information; and performing first preset processing on the second reconstruction feature map based on a first preset processing model to obtain a first preset processed second reconstruction feature map, where the first preset processing model includes a first processing model determined according to the first preset processing parameter.
In one embodiment, the processing module 1802 is specifically configured to: determining sampling point coordinates of the second reconstruction feature map according to the first processing model and the second processing model, where, optionally, the second processing model includes target pixel coordinates; determining target pixel values corresponding to the sampling point coordinates according to the second reconstruction feature map and a sampling kernel function; and generating the first preset processed second reconstruction feature map according to the target pixel values corresponding to the sampling point coordinates.
Optionally, the first preset processing is warping processing, the second preset processing is feature fusion processing, the target second feature map is a warped second feature map, the first preset processing parameter is a warping parameter, the first preset processing model is a warping model, and the target feature map is a fusion feature map.
In one embodiment, the processing module 1802 is specifically configured to: performing second preset processing on the first reconstruction feature map and the second reconstruction feature map subjected to the first preset processing by using a feature fusion network to obtain a first target feature map; optionally, the feature fusion network includes N feature fusion modules and M upsampling modules, where M is an integer greater than or equal to 1, and M +1= N; the input of the ith feature fusion module in the feature fusion network is connected with the output of the ith first preset processing module in the N first preset processing modules, the output of the ith feature fusion module is connected with the input of the jth up-sampling module, j is an integer greater than or equal to 1 and less than or equal to M, and i = j + 1; the output of the jth upsampling module is connected with the input of the jth feature fusion module; the ith first preset processing module is used for carrying out first preset processing on a second feature map output by the ith feature extraction module in the feature extraction network, wherein i is an integer greater than or equal to 1 and is less than or equal to N; the Nth feature fusion module is used for fusing a target second feature map output by the Nth first preset processing module and a first reconstruction feature map output by the Nth feature extraction unit; and when i is not equal to N, the ith feature fusion module is used for fusing the first preset processed second reconstruction feature map output by the ith first preset processing module, the first reconstruction feature map output by the ith feature extraction unit and the feature map output by the ith up-sampling module.
In one embodiment, the processing module 1802 is specifically configured to: filtering the second target feature map by using a target filtering processing model to obtain a filtered second target feature map; and generating the filtered first reconstructed image block according to the filtered second target feature map; optionally, the target filtering processing model includes a target candidate model selected from a plurality of candidate models according to a rate-distortion cost, and each of the plurality of candidate models has a mapping relation with a quantization parameter.
In one embodiment, the target filtering processing model comprises at least one processing unit, and the processing unit includes one or both of a first processing unit and a second processing unit; the processing module 1802 is specifically configured to: performing down-sampling processing on the second target feature map processed by the at least one first processing unit to obtain a down-sampled second target feature map; performing up-sampling processing on the down-sampled second target feature map to obtain a target fusion reconstruction feature map; and processing the target fusion reconstruction feature map by using the second processing unit to obtain the filtered second target feature map.
It can be understood that the functions of the functional modules of the image processing apparatus described in the embodiments of the present application may be implemented according to the methods in the foregoing method embodiments; for the specific implementation process, reference may be made to the related description of the foregoing method embodiments, which is not repeated here. In addition, beneficial effects that are the same as those of the corresponding method are not described again.
The embodiment of the application also provides an image processing method, which comprises the following steps:
s10: acquiring a second reconstruction image block;
s20: and filtering the first reconstructed image block according to at least one of the second reconstructed image block, the attribute information of the first reconstructed image block and the attribute information of the second reconstructed image block to obtain the filtered first reconstructed image block.
Optionally, the step of S20 includes at least one of:
and filtering the first reconstructed image block according to the second reconstructed image block to obtain a filtered first reconstructed image block.
And filtering the first reconstructed image block according to the attribute information of the first reconstructed image block to obtain the filtered first reconstructed image block.
And filtering the first reconstructed image block according to the attribute information of the second reconstructed image block to obtain the filtered first reconstructed image block.
And filtering the first reconstructed image block according to the second reconstructed image block and the attribute information of the first reconstructed image block to obtain the filtered first reconstructed image block.
And filtering the first reconstructed image block according to the second reconstructed image block and the attribute information of the second reconstructed image block to obtain the filtered first reconstructed image block.
And filtering the first reconstructed image block according to the attribute information of the first reconstructed image block and the attribute information of the second reconstructed image block to obtain a filtered first reconstructed image block.
And filtering the first reconstruction image block according to the second reconstruction image block, the attribute information of the first reconstruction image block and the attribute information of the second reconstruction image block to obtain the filtered first reconstruction image block.
Optionally, step S10 further includes: acquiring the first reconstructed image block.
Optionally, the first reconstructed image block and the second reconstructed image block correspond to the same or different reconstructed images.
Optionally, the step of S20 includes the following steps:
s201: acquiring a third reconstructed image block, wherein an image corresponding to the third reconstructed image block is a reference reconstructed image of an image corresponding to the first reconstructed image block;
s202: and filtering the first reconstructed image block according to at least one of the attribute information of the third reconstructed image block, the attribute information of the second reconstructed image block and the attribute information of the first reconstructed image block to obtain the filtered first reconstructed image block.
Optionally, the step S202 includes at least one of:
performing third preset processing on the first reconstructed image block and the second reconstructed image block according to depth information or parallax information of the first reconstructed image block to obtain a first target feature map, performing third preset processing on the third reconstructed image block and the first target feature map according to inter-frame prediction information of the first reconstructed image block to obtain a second target feature map, and performing filtering processing on the second target feature map to obtain a filtered first reconstructed image block;
performing third preset processing on the first reconstructed image block and the second reconstructed image block according to the depth information or the parallax information of the first reconstructed image block to obtain a first target feature map, performing filtering processing on the first target feature map to obtain a filtered first target feature map, performing third preset processing on the filtered first target feature map and the third reconstructed image block according to inter-frame prediction information of the first reconstructed image block to obtain a second target feature map, and performing filtering processing on the second target feature map to obtain a filtered first reconstructed image block;
and performing third preset processing on the first reconstructed image block and the second reconstructed image block according to the depth information or the parallax information of the first reconstructed image block to obtain a first target feature map, performing third preset processing on the first reconstructed image block and the third reconstructed image block according to inter-frame prediction information of the third reconstructed image block and the first reconstructed image block to obtain a second target feature map, and determining the filtered first reconstructed image block according to the first target feature map and the second target feature map.
Optionally, the determining a filtered first reconstructed image block according to the first target feature map and the second target feature map includes:
filtering the first target feature map and the second target feature map to obtain a filtered first target feature map and a filtered second target feature map;
performing third preset processing according to the filtered first target feature map and the filtered second target feature map to obtain a target fusion reconstruction image block;
and taking the target fusion reconstruction image block as a first reconstruction image block after filtering.
Optionally, the performing, according to the depth information or the parallax information of the first reconstructed image block, third preset processing on the first reconstructed image block and the second reconstructed image block to obtain a first target feature map includes:
determining a first reconstruction feature map corresponding to the first reconstruction image block and a second reconstruction feature map corresponding to the second reconstruction image block according to the depth information or the parallax information of the first reconstruction image block;
performing first preset processing on the second reconstruction feature map according to the first reconstruction feature map to obtain a second reconstruction feature map after the first preset processing;
and performing second preset processing according to the first preset processed second reconstruction feature map and the first reconstruction feature map to obtain a first target feature map.
Optionally, the determining, according to the depth information or the disparity information of the first reconstructed image block, a first reconstructed feature map corresponding to the first reconstructed image block and a second reconstructed feature map corresponding to the second reconstructed image block includes:
and performing feature extraction processing on the first reconstruction image block and the second reconstruction image block based on a feature extraction network and depth information or parallax information of the first reconstruction image block to obtain a first reconstruction feature map of the first reconstruction image block and a second reconstruction feature map of the second reconstruction image block.
Optionally, the first reconstructed image block is a slice, and the second reconstructed image block is a corresponding slice; the determining a first reconstruction feature map corresponding to the first reconstruction image block and a second reconstruction feature map corresponding to the second reconstruction image block according to the depth information or the parallax information of the first reconstruction image block includes:
acquiring a first reconstructed image sub-block of the first reconstructed image block and a second reconstructed image sub-block of the second reconstructed image block;
and performing feature extraction processing on the first reconstructed image sub-block and the second reconstructed image sub-block based on a feature extraction network and attribute information of the first reconstructed image sub-block to obtain a first reconstructed feature map corresponding to the first reconstructed image block and a second reconstructed feature map corresponding to the second reconstructed image block.
Optionally, the feature extraction network includes N cascaded feature extraction modules, where N is an integer greater than or equal to 1, each feature extraction module in the first N-1 feature extraction modules includes a feature extraction unit and a downsampling unit connected in series, and the nth feature extraction module includes a feature extraction unit.
Optionally, the performing, according to the first reconstructed feature map, a first preset process on the second reconstructed feature map to obtain a second reconstructed feature map after the first preset process includes:
determining a first preset processing parameter based on the first reconstruction feature map and the second reconstruction feature map; or, determining a first preset processing parameter based on the first reconstruction feature map, the second reconstruction feature map and the attribute information;
and performing first preset processing on the second reconstruction feature map based on a first preset processing model to obtain a first preset processed second reconstruction feature map, wherein the first preset processing model comprises a first processing model determined according to the first preset processing parameter.
Optionally, the first preset processing model includes the first processing model and a second processing model, and the performing a first preset process on the second reconstructed feature map based on the first preset processing model to obtain a first preset processed second reconstructed feature map includes:
determining sample point coordinates of the second reconstructed feature map according to the first processing model and the second processing model;
determining a target pixel value corresponding to the sampling point coordinate according to the second reconstruction feature map and the sampling kernel function;
and generating the first preset processed second reconstruction feature map according to the target pixel value corresponding to the sampling point coordinate.
Optionally, the filtering the second target feature map to obtain a filtered first reconstructed image block includes:
filtering the second target feature map by using a target filtering processing model to obtain a filtered second target feature map;
and generating a filtered first reconstruction image block according to the filtered second target feature map.
Optionally, the target filtering processing model comprises at least one processing unit, the processing unit comprises one or both of a first processing unit and a second processing unit; the filtering the second target feature map by using the target filtering processing model to obtain a filtered second target feature map includes:
performing down-sampling processing on the second target feature map processed by at least one first processing unit to obtain a down-sampled second target feature map;
performing up-sampling processing on the down-sampled second target feature map to obtain a target fusion reconstruction feature map;
and processing the target fusion reconstruction feature map by using the second processing unit to obtain a filtered second target feature map.
The embodiment of the application further provides an intelligent terminal, which includes a memory and a processor, wherein the memory stores an image processing program, and the image processing program, when executed by the processor, implements the steps of the image processing method in any one of the above embodiments. The intelligent terminal may be the mobile terminal 100 shown in fig. 1.
It should be understood that the mobile terminal described in the embodiment of the present application may perform the method description of any one of the above embodiments, and may also perform the description of the image processing apparatus in the above corresponding embodiment, which is not described herein again. In addition, the beneficial effects of the same method are not described in detail.
In a possible embodiment, the processor 110 of the mobile terminal 100 shown in fig. 1 may be configured to call the image processing program stored in the memory 109 to perform the following operations:
acquiring first auxiliary information;
and processing a first image block corresponding to a first viewpoint according to a reference image corresponding to a second viewpoint and the first auxiliary information; the determined or generated processing result can be used to obtain a reconstructed image or a decoded image corresponding to the first image block, and the second viewpoint is different from the first viewpoint.
In an embodiment, the processor 110 is further configured to: acquiring a first image block corresponding to the first viewpoint and/or a reference image corresponding to the second viewpoint.
In one embodiment, the processor 110 is specifically configured to: and determining or generating a processing result corresponding to the first image block according to the second image block of the reference image and the first auxiliary information.
In one embodiment, the processor 110 is specifically configured to: determining a second image block from the reference image according to the first auxiliary information; determining a first feature map corresponding to the first image block and a second feature map corresponding to the second image block; and determining or generating a processing result corresponding to the first image block according to the first feature map and the second feature map.
In one embodiment, the processor 110 is specifically configured to: performing first preset processing on the second feature map according to the first feature map to obtain a target second feature map; performing second preset processing according to the first feature map and the target second feature map to obtain a target feature map; and determining or generating the processing result corresponding to the first image block according to the target feature map.
In one embodiment, the processor 110 is specifically configured to: acquiring first auxiliary information of the first image block, wherein the first auxiliary information comprises depth information, and the depth information is determined according to a depth image corresponding to the first image block; acquiring similarity between first auxiliary information of each image block in the reference image and first auxiliary information of the first image block; and determining the image block with the maximum similarity in the reference image as a second image block matched with the first image block.
In one embodiment, the first auxiliary information comprises depth information or disparity information; the depth information is at least one of: depth feature information, statistical information based on depth values, depth slices after preprocessing.
In one embodiment, at least one of: the second image block and the first image block have the same size; when the first image block is a slice or a coding tree block, the second image block is correspondingly a slice or a coding tree block; when the second image block is a slice, the second image block is composed of a plurality of coding tree units.
In one embodiment, the processor 110 is specifically configured to: and performing feature extraction processing on the first image block and the second image block based on a feature extraction network and the first auxiliary information to obtain a first feature map corresponding to the first image block and a second feature map corresponding to the second image block.
In one embodiment, the first image block is a slice, and the second image block corresponds to a slice; the processor 110 is specifically configured to: acquiring a first image sub-block of the first image block and a second image sub-block of the second image block; the second auxiliary information of the second image sub-block matches the second auxiliary information of the first image sub-block; performing feature extraction processing on the first image sub-block and the second image sub-block based on a feature extraction network and the second auxiliary information to obtain a first sub-feature map of the first image sub-block and a second sub-feature map of the second image sub-block; determining or generating a first feature map corresponding to the first image block through the first sub-feature map, and determining or generating a second feature map corresponding to the second image block through the second sub-feature map; optionally, the second auxiliary information is different from the first auxiliary information.
In one embodiment, the feature extraction network comprises N cascaded feature extraction modules, wherein N is an integer greater than or equal to 1, each feature extraction module in the first N-1 feature extraction modules comprises a feature extraction unit and a downsampling unit which are connected in series, and the Nth feature extraction module comprises a feature extraction unit; a first feature extraction module of the N cascaded feature extraction modules, configured to process the first image block and the second image block, or the first image sub-block and the second image sub-block; each feature extraction module of the N cascaded feature extraction modules except the first feature extraction module is used for processing the output of the previous feature extraction module; for each feature extraction module, the input of the down-sampling unit is connected with the output of the feature extraction unit, and the output of the down-sampling unit is connected with the input of the feature extraction unit in the next feature extraction module; optionally, the first auxiliary information or the second auxiliary information is used as supervision information of at least one feature extraction module of the N cascaded feature extraction modules.
In one embodiment, the first image sub-block and the second image sub-block are coding tree blocks or extended coding tree blocks; the extended coding tree block is obtained after the edge of the coding tree block is extended, and the size of the extended coding tree block is larger than that of the coding tree block.
In one embodiment, the processor 110 is specifically configured to: determining a first preset processing parameter based on the first feature map and the second feature map; or, determining a first preset processing parameter based on the first feature map, the second feature map and the first auxiliary information; performing first preset processing on the second characteristic diagram based on a first preset processing model to obtain a target second characteristic diagram; the first preset processing model comprises a first processing model determined according to the first preset processing parameter.
In an embodiment, the first preset process model includes the first process model and a second process model, and the processor 110 is specifically configured to: determining sample point coordinates in the second feature map from the first and second process models, optionally the second process model comprising target pixel coordinates; determining a target pixel value corresponding to the coordinate of the sampling point according to the second characteristic diagram and the sampling kernel function; and generating a target second characteristic diagram according to the target pixel value corresponding to the sampling point coordinate.
Optionally, the first preset processing is warping processing, the second preset processing is feature fusion processing, the target second feature map is a warped second feature map, the first preset processing parameter is a warping parameter, the first preset processing model is a warping model, and the target feature map is a fusion feature map.
In one embodiment, the processor 110 is specifically configured to: performing second preset treatment on the first characteristic diagram and the target second characteristic diagram by using a characteristic fusion network to obtain a target characteristic diagram; optionally, the feature fusion network includes N feature fusion modules and M upsampling modules, where M is an integer greater than or equal to 1, and M +1= N; the input of the ith feature fusion module in the feature fusion network is connected with the output of the ith first preset processing module in the N first preset processing modules, the output of the ith feature fusion module is connected with the input of the jth up-sampling module, j is an integer greater than or equal to 1 and less than or equal to M, and i = j + 1; the output of the jth upsampling module is connected with the input of the jth feature fusion module; the ith first preset processing module is used for carrying out first preset processing on a second feature map output by the ith feature extraction module in the feature extraction network, wherein i is an integer greater than or equal to 1 and is less than or equal to N; the Nth feature fusion module is used for fusing a target second feature map output by the Nth first preset processing module and a first feature map output by the Nth feature extraction unit; and when i is not equal to N, the ith feature fusion module is used for fusing a target second feature map output by the ith first preset processing module, a first feature map output by the ith feature extraction unit and a feature map output by the ith up-sampling module.
In one embodiment, the processor 110 is specifically configured to: filtering the target feature map to obtain a filtered target feature map; and determining the processing result corresponding to the first image block according to the filtered target feature map.
In one embodiment, the processor 110 is specifically configured to: filtering the target feature map by using a target filtering processing model to obtain a filtered target feature map; optionally, the target filtering processing model includes a target candidate model selected from a plurality of candidate models according to a rate-distortion cost, and each of the plurality of candidate models has a mapping relation with a quantization parameter.
In one embodiment, the target filtering processing model comprises at least one processing unit, and the processing unit includes one or both of a first processing unit and a second processing unit; the processor 110 is specifically configured to: performing down-sampling processing on the target feature map processed by the at least one first processing unit to obtain a down-sampled target feature map; performing up-sampling processing on the down-sampled target feature map to obtain a target fusion feature map; and processing the target fusion feature map by using the second processing unit to obtain the filtered target feature map.
In another possible embodiment, the processor 110 of the mobile terminal 100 as described above in fig. 1 may be configured to invoke an image processing program stored in the memory 109 to perform the following operations: acquiring a first reconstruction image block and a second reconstruction image block; filtering the first reconstructed image block according to the second reconstructed image block and the attribute information of the first reconstructed image block and/or the second reconstructed image block to obtain a filtered first reconstructed image block; optionally, the first reconstructed image block and the second reconstructed image block correspond to the same or different reconstructed images.
In one embodiment, the attribute information of the first reconstructed image block includes at least one of: inter-frame prediction information of the first reconstructed image block, depth information of the first reconstructed image block, and parallax information of the first reconstructed image block.
In one embodiment, the processor 110 is specifically configured to: acquiring a third reconstructed image block, wherein an image corresponding to the third reconstructed image block is a reference reconstructed image of an image corresponding to the first reconstructed image block; and filtering the first reconstructed image block according to the attribute information of the third reconstructed image block, the second reconstructed image block and the first reconstructed image block to obtain a filtered first reconstructed image block.
In one embodiment, the processor 110 is specifically configured to: performing third preset processing on the first reconstruction image block and the second reconstruction image block according to the depth information or the parallax information of the first reconstruction image block to obtain a first target feature map; performing third preset processing on the third reconstructed image block and the first target feature map according to inter-frame prediction information of the first reconstructed image block to obtain a second target feature map; and filtering the second target characteristic diagram to obtain a filtered first reconstruction image block.
In one embodiment, the processor 110 is specifically configured to: performing third preset processing on the first reconstruction image block and the second reconstruction image block according to the depth information or the parallax information of the first reconstruction image block to obtain a first target feature map; filtering the first target characteristic diagram to obtain a filtered first target characteristic diagram; performing third preset processing on the filtered first target feature map and the filtered third reconstructed image block according to inter-frame prediction information of the first reconstructed image block to obtain a second target feature map; and filtering the second target characteristic diagram to obtain a filtered first reconstruction image block.
In one embodiment, the processor 110 is specifically configured to: performing third preset processing on the first reconstructed image block and the second reconstructed image block according to the depth information or the disparity information of the first reconstructed image block to obtain a first target feature map; performing third preset processing on the first reconstructed image block and the third reconstructed image block according to inter-frame prediction information of the third reconstructed image block and the first reconstructed image block to obtain a second target feature map; and determining a filtered first reconstructed image block according to the first target feature map and the second target feature map.
In one embodiment, the processor 110 is specifically configured to: filtering the first target feature map and the second target feature map to obtain a filtered first target feature map and a filtered second target feature map; performing third preset processing according to the filtered first target feature map and the filtered second target feature map to obtain a target fusion reconstructed image block; and taking the target fusion reconstructed image block as the filtered first reconstructed image block.
In one embodiment, the processor 110 is specifically configured to: determining a first reconstructed feature map corresponding to the first reconstructed image block and a second reconstructed feature map corresponding to the second reconstructed image block according to the depth information or the disparity information of the first reconstructed image block; performing first preset processing on the second reconstructed feature map according to the first reconstructed feature map to obtain a second reconstructed feature map after the first preset processing; and performing second preset processing according to the second reconstructed feature map after the first preset processing and the first reconstructed feature map to obtain a first target feature map.
In one embodiment, the processor 110 is specifically configured to: performing feature extraction processing on the first reconstructed image block and the second reconstructed image block based on a feature extraction network and depth information or disparity information of the first reconstructed image block to obtain a first reconstructed feature map corresponding to the first reconstructed image block and a second reconstructed feature map corresponding to the second reconstructed image block.
In one embodiment, the first reconstructed image block is a slice and the second reconstructed image block corresponds to a slice; the processor 110 is specifically configured to: acquiring a first reconstructed image sub-block of the first reconstructed image block and a second reconstructed image sub-block of the second reconstructed image block; the attribute information of the second reconstructed image sub-block is matched with the attribute information of the first reconstructed image sub-block; performing feature extraction processing on the first reconstructed image sub-block and the second reconstructed image sub-block based on a feature extraction network and attribute information of the first reconstructed image sub-block to obtain a first reconstructed feature map corresponding to the first reconstructed image block and a second reconstructed feature map corresponding to the second reconstructed image block; optionally, the attribute information includes depth information or disparity information, and the attribute information of the first reconstructed image block is different from the attribute information of the first reconstructed image sub-block.
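A minimal sketch of one way the matching second reconstructed image sub-block could be selected, assuming the attribute information is a depth map and that matching is judged by the absolute difference of mean depth; both the helper name and the similarity measure are hypothetical.

```python
import numpy as np

def match_sub_block(first_sub_depth: np.ndarray, candidate_sub_depths: list) -> int:
    """Return the index of the candidate sub-block whose mean depth is closest to that of the first sub-block."""
    differences = [abs(float(d.mean()) - float(first_sub_depth.mean())) for d in candidate_sub_depths]
    return int(np.argmin(differences))
```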
In one embodiment, the feature extraction network comprises N cascaded feature extraction modules, wherein N is an integer greater than or equal to 1, each feature extraction module in the first N-1 feature extraction modules comprises a feature extraction unit and a downsampling unit which are connected in series, and the Nth feature extraction module comprises a feature extraction unit; a first feature extraction module of the N cascaded feature extraction modules, configured to process the first reconstructed image block and the second reconstructed image block, or the first reconstructed image sub-block and the second reconstructed image sub-block; each feature extraction module of the N cascaded feature extraction modules except the first feature extraction module is used for processing the output of the previous feature extraction module; for each feature extraction module, the input of the down-sampling unit is connected with the output of the feature extraction unit, and the output of the down-sampling unit is connected with the input of the feature extraction unit in the next feature extraction module; optionally, the attribute information is used as supervision information of at least one feature extraction module of the N cascaded feature extraction modules.
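By way of a non-limiting example, the N cascaded feature extraction modules described above might be assembled as below, assuming each feature extraction unit is a convolution followed by a ReLU and each down-sampling unit is a stride-2 convolution; these internal choices are illustrative assumptions rather than the prescribed structure.

```python
import torch.nn as nn

def build_feature_extraction_network(n_modules: int = 3, in_channels: int = 2, channels: int = 64) -> nn.ModuleList:
    """Hypothetical realization of the N cascaded feature extraction modules."""
    modules = []
    for i in range(n_modules):
        layers = [nn.Conv2d(in_channels if i == 0 else channels, channels, 3, padding=1),
                  nn.ReLU(inplace=True)]                                  # feature extraction unit
        if i < n_modules - 1:                                             # first N-1 modules also down-sample
            layers.append(nn.Conv2d(channels, channels, 3, stride=2, padding=1))  # down-sampling unit
        modules.append(nn.Sequential(*layers))
    return nn.ModuleList(modules)
```

Feeding the input blocks through the modules in sequence and retaining each module's output yields the multi-scale feature maps consumed by the feature fusion network described later.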
In one embodiment, the first reconstructed image sub-block and the second reconstructed image sub-block are coding tree blocks or extended coding tree blocks; the extended coding tree block is obtained after the edge of the coding tree block is extended, and the size of the extended coding tree block is larger than that of the coding tree block.
In one embodiment, the processor 110 is specifically configured to: determining a first preset processing parameter based on the first reconstructed feature map and the second reconstructed feature map; or, determining a first preset processing parameter based on the first reconstructed feature map, the second reconstructed feature map and the attribute information; and performing first preset processing on the second reconstructed feature map based on a first preset processing model to obtain a second reconstructed feature map after the first preset processing, wherein the first preset processing model comprises a first processing model determined according to the first preset processing parameter.
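For illustration, the first preset processing parameter of this embodiment (a warping parameter in the optional case noted below) could be estimated by a small convolutional network that takes the two reconstructed feature maps and, optionally, the attribute information; the estimator below is a hypothetical sketch, not the prescribed structure.

```python
import torch
import torch.nn as nn

class WarpParameterEstimator(nn.Module):
    """Hypothetical estimator of the first preset processing parameter as a per-pixel 2-channel offset field."""
    def __init__(self, channels: int = 64, attribute_channels: int = 0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * channels + attribute_channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 2, 3, padding=1),
        )

    def forward(self, first_feat, second_feat, attribute=None):
        inputs = [first_feat, second_feat] + ([attribute] if attribute is not None else [])
        return self.net(torch.cat(inputs, dim=1))   # first preset processing parameter (warping parameter)
```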
In one embodiment, the processor 110 is specifically configured to: determining coordinates of sampling points of the second reconstructed feature map according to the first processing model and the second processing model, wherein, optionally, the second processing model includes coordinates of target pixels; determining a target pixel value corresponding to the sampling point coordinates according to the second reconstructed feature map and a sampling kernel function; and generating a second reconstructed feature map after the first preset processing according to the target pixel values corresponding to the sampling point coordinates.
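A sketch of this sampling step under the assumption that the sampling kernel function is bilinear: per-pixel sampling point coordinates are obtained by adding an offset (for example, from the estimator sketched above) to an identity grid, and the second reconstructed feature map is then resampled with torch.nn.functional.grid_sample. The coordinate normalization and the pixel-unit offset convention are assumptions of this sketch.

```python
import torch
import torch.nn.functional as F

def warp_feature_map(feature_map: torch.Tensor, offset: torch.Tensor) -> torch.Tensor:
    """Resample feature_map (N, C, H, W) at coordinates displaced by offset (N, 2, H, W), given in pixels."""
    n, _, h, w = feature_map.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().unsqueeze(0).expand(n, -1, -1, -1)
    coords = base + offset                                    # sampling point coordinates
    grid_x = coords[:, 0] / (w - 1) * 2 - 1                   # normalize to [-1, 1] for grid_sample
    grid_y = coords[:, 1] / (h - 1) * 2 - 1
    grid = torch.stack((grid_x, grid_y), dim=-1)              # (N, H, W, 2)
    # bilinear sampling kernel applied at the target pixel coordinates
    return F.grid_sample(feature_map, grid, mode="bilinear", align_corners=True)
```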
Optionally, the first preset processing is warping processing, the second preset processing is feature fusion processing, the target second feature map is a warped second feature map, the first preset processing parameter is a warping parameter, the first preset processing model is a warping model, and the target feature map is a fusion feature map.
In one embodiment, the processor 110 is specifically configured to: performing second preset processing on the first reconstructed feature map and the second reconstructed feature map after the first preset processing by using a feature fusion network to obtain a first target feature map; optionally, the feature fusion network includes N feature fusion modules and M up-sampling modules, where M is an integer greater than or equal to 1, and M + 1 = N; the input of the ith feature fusion module in the feature fusion network is connected with the output of the ith first preset processing module in the N first preset processing modules, the output of the ith feature fusion module is connected with the input of the jth up-sampling module, j is an integer greater than or equal to 1 and less than or equal to M, and i = j + 1; the output of the jth up-sampling module is connected with the input of the jth feature fusion module; the ith first preset processing module is used for performing first preset processing on a second feature map output by the ith feature extraction module in the feature extraction network, wherein i is an integer greater than or equal to 1 and less than or equal to N; the Nth feature fusion module is used for fusing a target second feature map output by the Nth first preset processing module and a first reconstructed feature map output by the Nth feature extraction unit; and when i is not equal to N, the ith feature fusion module is used for fusing the second reconstructed feature map after the first preset processing output by the ith first preset processing module, the first reconstructed feature map output by the ith feature extraction unit and the feature map output by the ith up-sampling module.
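The connectivity described above resembles a decoder that fuses features scale by scale. The following rough sketch assumes each feature fusion module is a single convolution over concatenated inputs and each up-sampling module is bilinear interpolation; these are illustrative stand-ins for the modules named in the embodiment, and the names and channel counts are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureFusionNetwork(nn.Module):
    """Hypothetical decoder with N fusion modules and M = N - 1 up-sampling steps."""
    def __init__(self, n: int = 3, channels: int = 64):
        super().__init__()
        self.n = n
        # the Nth (coarsest) fusion module mixes 2 inputs, all others mix 3 inputs
        self.fusions = nn.ModuleList([
            nn.Conv2d((2 if i == n - 1 else 3) * channels, channels, 3, padding=1)
            for i in range(n)
        ])

    def forward(self, first_maps, warped_second_maps):
        # both arguments: lists of length n, index 0 = finest scale, index n-1 = coarsest scale
        out = self.fusions[self.n - 1](
            torch.cat([warped_second_maps[self.n - 1], first_maps[self.n - 1]], dim=1))
        for i in range(self.n - 2, -1, -1):
            up = F.interpolate(out, size=first_maps[i].shape[-2:],
                               mode="bilinear", align_corners=False)   # up-sampling module
            out = self.fusions[i](torch.cat([warped_second_maps[i], first_maps[i], up], dim=1))
        return out   # first target feature map (fusion feature map)
```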
In one embodiment, the processor 110 is specifically configured to: filtering the second target feature map by using a target filtering processing model to obtain a filtered second target feature map; generating a filtered first reconstructed image block according to the filtered second target feature map; optionally, the target filtering processing model includes a target candidate model selected from a plurality of candidate models according to a rate-distortion cost, and each of the plurality of candidate models has a mapping relation with a quantization parameter.
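As a hedged illustration of the optional model selection, an encoder might keep, among the candidate models associated with the current quantization parameter, the one minimizing the rate-distortion cost J = D + lambda * R; the function below and its arguments (the QP-range mapping, the distortion and rate callbacks) are hypothetical.

```python
def select_target_filter_model(candidates, qp, distortion_fn, rate_fn, lagrange_multiplier):
    """candidates: dict mapping a (qp_low, qp_high) range to a candidate filtering model."""
    eligible = [model for (lo, hi), model in candidates.items() if lo <= qp <= hi]
    if not eligible:                      # fall back to all candidates if no range matches the QP
        eligible = list(candidates.values())
    costs = [(distortion_fn(model) + lagrange_multiplier * rate_fn(model), model) for model in eligible]
    return min(costs, key=lambda pair: pair[0])[1]   # candidate with the lowest rate-distortion cost
```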
In one embodiment, the target filter processing model comprises at least one processing unit including one or both of a first processing unit and a second processing unit; the processor 110 is specifically configured to: performing down-sampling processing on the second target feature map processed by at least one first processing unit to obtain a down-sampled second target feature map; performing up-sampling processing on the down-sampled second target feature map to obtain a target fusion reconstructed feature map; and processing the target fusion reconstructed feature map by using a second processing unit to obtain a filtered second target feature map.
It should be understood that the mobile terminal described in the embodiments of the present application may perform the method of any one of the above embodiments, and the description of the image processing apparatus in the corresponding embodiments above also applies to it, which is not repeated here. Likewise, the beneficial effects of the same method are not described again.
An embodiment of the present application further provides a computer-readable storage medium, where an image processing program is stored on the computer-readable storage medium, and when the image processing program is executed by a processor, the image processing method in any of the above embodiments is implemented.
The embodiments of the intelligent terminal and the computer-readable storage medium provided in the present application may include all technical features of any of the embodiments of the image processing method; the corresponding expanded description is substantially the same as that of the method embodiments and is not repeated here.
Embodiments of the present application also provide a computer program product, which includes computer program code, when the computer program code runs on a computer, the computer is caused to execute the method in the above various possible embodiments.
Embodiments of the present application further provide a chip, which includes a memory and a processor, where the memory is used to store a computer program, and the processor is used to call and run the computer program from the memory, so that a device in which the chip is installed executes the method in the above various possible embodiments.
It is to be understood that the foregoing scenarios are only examples, and do not constitute a limitation on application scenarios of the technical solutions provided in the embodiments of the present application, and the technical solutions of the present application may also be applied to other scenarios. For example, as can be known by those skilled in the art, with the evolution of system architecture and the emergence of new service scenarios, the technical solution provided in the embodiments of the present application is also applicable to similar technical problems.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
The steps in the methods of the embodiments of the present application may be reordered, combined, or deleted according to actual needs.
The units in the devices of the embodiments of the present application may be merged, divided, or deleted according to actual needs.
In the present application, the same or similar terms, technical solutions, and/or application scenario descriptions are generally described in detail only at their first occurrence; for brevity, they are not described in detail again when they appear later. When understanding the technical solutions of the present application, reference may be made to the earlier detailed description for any term, technical solution, and/or application scenario description that is not detailed later.
In the present application, each embodiment is described with emphasis, and reference may be made to the description of other embodiments for parts that are not described or illustrated in any embodiment.
The technical features of the technical solutions of the present application may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the embodiments are described; however, as long as the combinations of these technical features are not contradictory, they should be considered to fall within the scope described in the present application.
Through the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and certainly can also be implemented by hardware, but in many cases the former is the better implementation. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, or optical disc) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, a controlled terminal, or a network device) to execute the method of each embodiment of the present application.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored on a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
The above description is only a preferred embodiment of the present application, and not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application, or which are directly or indirectly applied to other related technical fields, are included in the scope of the present application.

Claims (30)

1. An image processing method, characterized by comprising the steps of:
s1: acquiring first auxiliary information;
s2: and processing the first image block corresponding to the first view according to the reference image corresponding to the second view and the first auxiliary information.
2. The method of claim 1, wherein the step of S2 is preceded by the step of:
acquiring the first image block corresponding to the first viewpoint; and/or acquiring the reference image corresponding to the second viewpoint.
3. The method of claim 1, wherein the step of S2 comprises the steps of:
s21: determining a second image block from the reference image according to the first auxiliary information;
s22: determining a first feature map corresponding to the first image block and a second feature map corresponding to the second image block;
s23: and determining or generating a processing result corresponding to the first image block according to the first feature map and the second feature map.
4. The method of claim 3, wherein the step of S21 includes:
acquiring first auxiliary information of the first image block, wherein the first auxiliary information comprises depth information, and the depth information is determined according to a depth image corresponding to the first image block;
acquiring similarity between first auxiliary information of each image block in the reference image and first auxiliary information of the first image block;
and determining the image block with the maximum similarity in the reference image as a second image block matched with the first image block.
5. The method of claim 3, wherein the step of S22 includes:
and performing feature extraction processing on the first image block and the second image block based on a feature extraction network and the first auxiliary information to obtain a first feature map corresponding to the first image block and a second feature map corresponding to the second image block.
6. The method of claim 3, wherein the first image block and the second image block are slices; the step of S22 includes:
acquiring a first image sub-block of the first image block and a second image sub-block of the second image block; the second auxiliary information of the second image sub-block matches the second auxiliary information of the first image sub-block;
performing feature extraction processing on the first image sub-block and the second image sub-block based on a feature extraction network and the second auxiliary information to obtain a first sub-feature map of the first image sub-block and a second sub-feature map of the second image sub-block;
and determining or generating a first feature map corresponding to the first image block through the first sub-feature map, and determining or generating a second feature map corresponding to the second image block through the second sub-feature map.
7. The method of claim 5, wherein the feature extraction network comprises N cascaded feature extraction modules, wherein N is an integer greater than or equal to 1, each of the first N-1 feature extraction modules comprises a feature extraction unit and a downsampling unit in series, and the Nth feature extraction module comprises a feature extraction unit.
8. The method according to any one of claims 3 to 7, wherein the step of S23 includes the steps of:
s231: performing first preset treatment on the second characteristic diagram according to the first characteristic diagram to obtain a target second characteristic diagram;
s232: performing second preset treatment according to the first characteristic diagram and the target second characteristic diagram to obtain a target characteristic diagram;
s233: and determining or generating the processing result corresponding to the first image block according to the target feature map.
9. The method of claim 8, wherein the step of S231 includes:
determining a first preset processing parameter based on the first feature map and the second feature map; or, determining a first preset processing parameter based on the first feature map, the second feature map and the first auxiliary information;
performing first preset processing on the second feature map based on a first preset processing model to obtain a target second feature map; the first preset processing model comprises a first processing model determined according to the first preset processing parameter.
10. The method of claim 9, wherein the first preset processing model includes the first processing model and a second processing model, and the performing first preset processing on the second feature map based on the first preset processing model to obtain a target second feature map includes:
determining coordinates of sampling points in the second feature map according to the first processing model and the second processing model;
determining a target pixel value corresponding to the coordinates of the sampling points according to the second feature map and a sampling kernel function;
and generating a target second feature map according to the target pixel values corresponding to the sampling point coordinates.
11. The method of claim 8, wherein the step of S232 includes:
and performing second preset processing on the first feature map and the target second feature map by using a feature fusion network to obtain a target feature map.
12. The method of claim 8, wherein the step of S233 comprises:
filtering the target feature map to obtain a filtered target feature map;
and determining the processing result corresponding to the first image block according to the filtered target feature map.
13. The method of claim 12, wherein the filtering the target feature map to obtain a filtered target feature map comprises:
and performing filtering processing on the target feature map by using a target filtering processing model to obtain a filtered target feature map.
14. The method of claim 13, wherein the target filter processing model comprises at least one processing unit including one or both of a first processing unit and a second processing unit; the filtering the target feature map by using the target filtering processing model to obtain a filtered target feature map includes:
performing down-sampling processing on the target feature map processed by at least one first processing unit to obtain a down-sampled target feature map;
performing up-sampling processing on the down-sampled target feature map to obtain a target fusion feature map;
and processing the target fusion feature map by using the second processing unit to obtain a filtered target feature map.
15. An image processing method, characterized by comprising the steps of:
s10: acquiring a second reconstruction image block;
s20: and filtering the first reconstructed image block according to at least one of the second reconstructed image block, the attribute information of the first reconstructed image block and the attribute information of the second reconstructed image block to obtain the filtered first reconstructed image block.
16. The method of claim 15, wherein the step of S10 further comprises:
and acquiring the first reconstruction image block.
17. The method of claim 15, wherein the first reconstructed image block and the second reconstructed image block correspond to the same reconstructed image or to different reconstructed images.
18. The method according to any one of claims 15 to 17, wherein the step of S20 includes the steps of:
s201: acquiring a third reconstructed image block, wherein an image corresponding to the third reconstructed image block is a reference reconstructed image of an image corresponding to the first reconstructed image block;
s202: and filtering the first reconstructed image block according to at least one of the attribute information of the third reconstructed image block, the attribute information of the second reconstructed image block and the attribute information of the first reconstructed image block to obtain the filtered first reconstructed image block.
19. The method of claim 18, wherein the step of S202 includes at least one of:
performing third preset processing on the first reconstructed image block and the second reconstructed image block according to depth information or parallax information of the first reconstructed image block to obtain a first target feature map, performing third preset processing on the third reconstructed image block and the first target feature map according to inter-frame prediction information of the first reconstructed image block to obtain a second target feature map, and performing filtering processing on the second target feature map to obtain a filtered first reconstructed image block;
performing third preset processing on the first reconstructed image block and the second reconstructed image block according to the depth information or the parallax information of the first reconstructed image block to obtain a first target feature map, performing filtering processing on the first target feature map to obtain a filtered first target feature map, performing third preset processing on the filtered first target feature map and the filtered third reconstructed image block according to inter-frame prediction information of the first reconstructed image block to obtain a second target feature map, and performing filtering processing on the second target feature map to obtain a filtered first reconstructed image block;
and performing third preset processing on the first reconstructed image block and the second reconstructed image block according to the depth information or the parallax information of the first reconstructed image block to obtain a first target feature map, performing third preset processing on the first reconstructed image block and the third reconstructed image block according to inter-frame prediction information of the third reconstructed image block and the first reconstructed image block to obtain a second target feature map, and determining the filtered first reconstructed image block according to the first target feature map and the second target feature map.
20. The method of claim 19, wherein determining the filtered first reconstructed image block from the first target feature map and the second target feature map comprises:
filtering the first target characteristic diagram and the second target characteristic diagram to obtain a filtered first target characteristic diagram and a filtered second target characteristic diagram;
performing third preset processing according to the filtered first target feature map and the filtered second target feature map to obtain a target fusion reconstruction image block;
and taking the target fusion reconstruction image block as a first reconstruction image block after filtering.
21. The method of claim 19, wherein the performing a third pre-processing on the first reconstructed image block and the second reconstructed image block according to the depth information or the disparity information of the first reconstructed image block to obtain a first target feature map comprises:
determining a first reconstructed feature map corresponding to the first reconstructed image block and a second reconstructed feature map corresponding to the second reconstructed image block according to the depth information or the disparity information of the first reconstructed image block;
performing first preset processing on the second reconstructed feature map according to the first reconstructed feature map to obtain a second reconstructed feature map after the first preset processing;
and performing second preset processing according to the second reconstructed feature map after the first preset processing and the first reconstructed feature map to obtain a first target feature map.
22. The method of claim 21, wherein the determining a first reconstructed feature map corresponding to the first reconstructed image block and a second reconstructed feature map corresponding to the second reconstructed image block according to the depth information or the disparity information of the first reconstructed image block comprises:
and performing feature extraction processing on the first reconstructed image block and the second reconstructed image block based on a feature extraction network and depth information or disparity information of the first reconstructed image block to obtain a first reconstructed feature map of the first reconstructed image block and a second reconstructed feature map of the second reconstructed image block.
23. The method of claim 21, wherein the first reconstructed image block is a slice and the second reconstructed image block corresponds to a slice; the determining a first reconstructed feature map corresponding to the first reconstructed image block and a second reconstructed feature map corresponding to the second reconstructed image block according to the depth information or the disparity information of the first reconstructed image block includes:
acquiring a first reconstructed image sub-block of the first reconstructed image block and a second reconstructed image sub-block of the second reconstructed image block;
and performing feature extraction processing on the first reconstructed image sub-block and the second reconstructed image sub-block based on a feature extraction network and attribute information of the first reconstructed image sub-block to obtain a first reconstructed feature map corresponding to the first reconstructed image block and a second reconstructed feature map corresponding to the second reconstructed image block.
24. The method of claim 22, wherein the feature extraction network comprises N cascaded feature extraction modules, wherein N is an integer greater than or equal to 1, each of the first N-1 feature extraction modules comprises a feature extraction unit and a downsampling unit in series, and the Nth feature extraction module comprises a feature extraction unit.
25. The method according to any one of claims 21 to 24, wherein the performing a first preset process on the second reconstructed feature map according to the first reconstructed feature map to obtain a second reconstructed feature map after the first preset process includes:
determining a first preset processing parameter based on the first reconstructed feature map and the second reconstructed feature map; or, determining a first preset processing parameter based on the first reconstructed feature map, the second reconstructed feature map and the attribute information;
and performing first preset processing on the second reconstructed feature map based on a first preset processing model to obtain a second reconstructed feature map after the first preset processing, wherein the first preset processing model comprises a first processing model determined according to the first preset processing parameter.
26. The method of claim 25, wherein the first preset processing model comprises the first processing model and a second processing model, and the performing first preset processing on the second reconstructed feature map based on the first preset processing model to obtain a second reconstructed feature map after the first preset processing comprises:
determining sampling point coordinates of the second reconstructed feature map according to the first processing model and the second processing model;
determining a target pixel value corresponding to the sampling point coordinates according to the second reconstructed feature map and a sampling kernel function;
and generating a second reconstructed feature map after the first preset processing according to the target pixel values corresponding to the sampling point coordinates.
27. The method according to any one of claims 19 to 24, wherein the filtering the second target feature map to obtain a filtered first reconstructed image block comprises:
filtering the second target feature map by using a target filtering processing model to obtain a filtered second target feature map;
and generating a filtered first reconstruction image block according to the filtered second target feature map.
28. The method of claim 27, wherein the target filter processing model comprises at least one processing unit including one or both of a first processing unit and a second processing unit; the filtering the second target feature map by using the target filtering processing model to obtain a filtered second target feature map includes:
performing down-sampling processing on the second target feature map processed by at least one first processing unit to obtain a down-sampled second target feature map;
performing up-sampling processing on the down-sampled second target feature map to obtain a target fusion reconstructed feature map;
and processing the target fusion reconstructed feature map by using a second processing unit to obtain a filtered second target feature map.
29. An intelligent terminal, characterized in that the intelligent terminal comprises: a memory and a processor, wherein the memory has stored thereon an image processing program which, when executed by the processor, implements the steps of the image processing method of any one of claims 1 to 28.
30. A computer-readable storage medium, characterized in that a computer program is stored thereon, and the computer program, when executed by a processor, implements the steps of the image processing method according to any one of claims 1 to 28.
CN202210029380.6A 2022-01-12 2022-01-12 Image processing method, intelligent terminal and storage medium Active CN114079779B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210029380.6A CN114079779B (en) 2022-01-12 2022-01-12 Image processing method, intelligent terminal and storage medium
PCT/CN2022/144217 WO2023134482A1 (en) 2022-01-12 2022-12-30 Image processing method, intelligent terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210029380.6A CN114079779B (en) 2022-01-12 2022-01-12 Image processing method, intelligent terminal and storage medium

Publications (2)

Publication Number Publication Date
CN114079779A true CN114079779A (en) 2022-02-22
CN114079779B CN114079779B (en) 2022-05-17

Family

ID=80284520

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210029380.6A Active CN114079779B (en) 2022-01-12 2022-01-12 Image processing method, intelligent terminal and storage medium

Country Status (2)

Country Link
CN (1) CN114079779B (en)
WO (1) WO2023134482A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117496208B (en) * 2023-12-29 2024-03-29 山东朝辉自动化科技有限责任公司 Method for acquiring stockpiling information in stock yard in real time

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103716641B (en) * 2012-09-29 2018-11-09 浙江大学 Prognostic chart picture generation method and device
CN105075257A (en) * 2013-04-11 2015-11-18 日本电信电话株式会社 Image encoding method, image decoding method, image encoding device, image decoding device, image encoding program, and image decoding program
FI20165547A (en) * 2016-06-30 2017-12-31 Nokia Technologies Oy Arrangement, procedure and computer programs for video coding and decoding
CN113709504B (en) * 2021-10-27 2022-02-15 深圳传音控股股份有限公司 Image processing method, intelligent terminal and readable storage medium
CN114079779B (en) * 2022-01-12 2022-05-17 深圳传音控股股份有限公司 Image processing method, intelligent terminal and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102158712A (en) * 2011-03-22 2011-08-17 宁波大学 Multi-viewpoint video signal coding method based on vision
CN103108187A (en) * 2013-02-25 2013-05-15 清华大学 Coding method, decoding method, coder and decoder of three-dimension video
CN104469387A (en) * 2014-12-15 2015-03-25 哈尔滨工业大学 Inheritance method for motion parameters among components in multi-view video coding
US20200193623A1 (en) * 2018-12-18 2020-06-18 Samsung Electronics Co., Ltd. Method and apparatus for calculating depth map
US20210392313A1 (en) * 2020-06-16 2021-12-16 Canon Kabushiki Kaisha Image processing apparatus, image processing method, storage medium, manufacturing method of learned model, and image processing system
CN113256544A (en) * 2021-05-10 2021-08-13 中山大学 Multi-view image synthesis method, system, device and storage medium

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023134482A1 (en) * 2022-01-12 2023-07-20 深圳传音控股股份有限公司 Image processing method, intelligent terminal and storage medium
CN114245133A (en) * 2022-02-23 2022-03-25 北京拙河科技有限公司 Video block coding method, coding transmission method, system and equipment
CN115209079A (en) * 2022-02-23 2022-10-18 北京拙河科技有限公司 Method and equipment suitable for long-time data storage of high-speed camera
CN114422803A (en) * 2022-03-30 2022-04-29 浙江智慧视频安防创新中心有限公司 Video processing method, device and equipment
CN114422803B (en) * 2022-03-30 2022-08-05 浙江智慧视频安防创新中心有限公司 Video processing method, device and equipment
WO2023221764A1 (en) * 2022-05-20 2023-11-23 海思技术有限公司 Video encoding method, video decoding method, and related apparatus
WO2023225854A1 (en) * 2022-05-24 2023-11-30 Oppo广东移动通信有限公司 Loop filtering method and device, and video coding/decoding method, device and system

Also Published As

Publication number Publication date
CN114079779B (en) 2022-05-17
WO2023134482A1 (en) 2023-07-20

Similar Documents

Publication Publication Date Title
CN114079779B (en) Image processing method, intelligent terminal and storage medium
EP3861755B1 (en) Techniques and apparatus for weighted-median prediction for point-cloud attribute coding
US10867414B2 (en) Point cloud attribute transfer algorithm
CN111837396B (en) Error suppression in video coding based on sub-image code stream view correlation
CN114424542A (en) Video-based point cloud compression with non-canonical smoothing
WO2019076503A1 (en) An apparatus, a method and a computer program for coding volumetric video
CN111641836B (en) Method, device, computer equipment and storage medium for compressing point cloud
WO2022068716A1 (en) Entropy encoding/decoding method and device
CN115298710A (en) Video conference frame based on face restoration
WO2022063265A1 (en) Inter-frame prediction method and apparatus
CN107211081A (en) The transmission of video of context update based on absolute coding
CN113920010A (en) Super-resolution implementation method and device for image frame
WO2023011420A1 (en) Encoding method and apparatus, and decoding method and apparatus
CN114125446A (en) Image encoding method, decoding method and device
CN113709504B (en) Image processing method, intelligent terminal and readable storage medium
WO2022063267A1 (en) Intra frame prediction method and device
WO2022111233A1 (en) Intra prediction mode coding method, and apparatus
US20240163477A1 (en) 3d prediction method for video coding
WO2023133888A1 (en) Image processing method and apparatus, remote control device, system, and storage medium
WO2020069632A1 (en) A video encoder, a video decoder and corresponding methods
CN116847088B (en) Image processing method, processing apparatus, and storage medium
US20240064334A1 (en) Motion field coding in dynamic mesh compression
US20240193819A1 (en) Learning-based point cloud compression via tearing transform
US20240236352A1 (en) Bitstream syntax for mesh motion field coding
US20240163476A1 (en) 3d prediction method for video coding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant