WO2023134482A1 - Image processing method, intelligent terminal and storage medium - Google Patents

Image processing method, intelligent terminal and storage medium

Info

Publication number
WO2023134482A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature map
image block
processing
reconstructed
block
Prior art date
Application number
PCT/CN2022/144217
Other languages
English (en)
French (fr)
Inventor
刘雨田
Original Assignee
深圳传音控股股份有限公司 (Shenzhen Transsion Holdings Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳传音控股股份有限公司
Publication of WO2023134482A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, e.g. an object
    • H04N19/176 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146 Data rate or code amount at the encoder output
    • H04N19/147 Data rate or code amount at the encoder output according to rate distortion criteria
    • H04N19/154 Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • H04N19/80 Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation

Definitions

  • the present application relates to the field of computer technology, and in particular to an image processing method, an intelligent terminal and a storage medium.
  • multi-view video captures the same scene from different perspectives through multiple cameras, which can provide viewers with rich dynamic scenes and real sensory experience.
  • with the development of video compression technology, research on video coding technology for multi-viewpoint video has gradually deepened.
  • the 3D-HEVC coding technology, proposed on the basis of the video coding standard HEVC (High Efficiency Video Coding), can efficiently compress multi-viewpoint video and its corresponding depth data.
  • the purpose of the loop filtering stage (such as loop filtering based on a neural network) is to reduce the distortion of the reconstructed frame, usually by using reference frames from different viewpoints at the same moment to enhance the reconstructed frame; the generated enhanced frame is then used in the subsequent encoding process. However, because the relevant information is not fully utilized during filtering, the image blocks of the reconstructed frame cannot be well matched with the image blocks of the reference frame, which affects the quality of multi-view video coding.
  • the main purpose of this application is to provide an image processing method, intelligent terminal and storage medium, which can make full use of the information of image blocks of different viewpoints, reduce the distortion of reconstructed images or decoded images, and effectively improve the encoding quality of multi-viewpoint video.
  • the application provides an image processing method, including:
  • acquiring first auxiliary information; and processing the first image block corresponding to the first viewpoint according to the reference image corresponding to the second viewpoint and the first auxiliary information.
  • This application provides another image processing method, including:
  • the first reconstructed image block is filtered to obtain a filtered first reconstructed image block.
  • performing filtering on the first reconstructed image block according to at least one of the second reconstructed image block, attribute information of the first reconstructed image block, and attribute information of the second reconstructed image block, to obtain the filtered first reconstructed image block, includes at least one of the following:
  • Filtering is performed on the first reconstructed image block according to the second reconstructed image block to obtain a filtered first reconstructed image block.
  • Filtering is performed on the first reconstructed image block according to the attribute information of the first reconstructed image block to obtain a filtered first reconstructed image block.
  • Filtering is performed on the first reconstructed image block according to the attribute information of the second reconstructed image block to obtain a filtered first reconstructed image block.
  • Filtering is performed on the first reconstructed image block according to the attribute information of the second reconstructed image block and the first reconstructed image block to obtain a filtered first reconstructed image block.
  • the application provides an image processing device, including:
  • an acquiring module configured to acquire first auxiliary information
  • the processing module is configured to process the first image block corresponding to the first viewpoint according to the reference image corresponding to the second viewpoint and the first auxiliary information.
  • the present application provides another image processing device, including:
  • An acquisition module configured to acquire a second reconstructed image block
  • a processing module configured to filter the first reconstructed image block according to at least one of the second reconstructed image block, the attribute information of the first reconstructed image block, and the attribute information of the second reconstructed image block, to obtain the filtered first reconstructed image block.
  • the present application also provides an intelligent terminal, including: a memory and a processor, wherein an image processing program is stored in the memory, and when the image processing program is executed by the processor, the steps of the above method are implemented.
  • the present application also provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the steps of the above method are realized.
  • the image processing method of the present application includes the steps of: acquiring first auxiliary information; and processing the first image block corresponding to the first viewpoint according to the reference image corresponding to the second viewpoint and the first auxiliary information.
  • the image block of the viewpoint currently being coded can be processed by using the auxiliary information and the image block of the viewpoint different from the viewpoint currently being coded.
  • the obtained processing result helps to determine the reconstructed image or decoded image of the image block of the viewpoint currently being encoded, reduces video encoding distortion, improves video encoding quality, and further improves user experience.
  • FIG. 1 is a schematic diagram of a hardware structure of an intelligent terminal implementing various embodiments of the present application
  • FIG. 2 is a system architecture diagram of a communication network provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of a multi-view video encoder provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a multi-viewpoint video decoder provided by an embodiment of the present application.
  • Fig. 5 is a schematic flowchart of an image processing method according to the first embodiment
  • Fig. 6 is a schematic structural diagram of a neural network-based loop filter shown according to the first embodiment
  • Fig. 7 is a schematic flowchart of an image processing method according to the second embodiment
  • Fig. 8a is a schematic structural diagram of a feature extraction network shown according to the second embodiment
  • Fig. 8b is a schematic structural diagram of another feature extraction network shown according to the second embodiment
  • Fig. 9a is a schematic structural diagram of a first preset processing module according to the second embodiment.
  • Fig. 9b is a schematic structural diagram of another first preset processing module according to the second embodiment.
  • Fig. 10 is a schematic structural diagram showing a combined feature extraction network and a first preset processing module according to the second embodiment
  • Fig. 11 is a schematic structural diagram showing a feature fusion network according to the second embodiment
  • Fig. 12a is a schematic structural diagram of a third preset processing module according to the second embodiment.
  • Fig. 12b is a schematic structural diagram of another third preset processing module according to the second embodiment.
  • Fig. 13 is a schematic structural diagram of a neural network-based filtering processing module shown according to the second embodiment
  • Fig. 14 is a schematic flowchart of an image processing method according to a third embodiment
  • Fig. 15 is a schematic structural diagram of a neural network-based loop filter according to a third embodiment
  • Fig. 16 is a schematic structural diagram of another neural network-based loop filter according to the third embodiment.
  • Fig. 17 is a schematic structural diagram of another neural network-based loop filter according to the third embodiment.
  • Fig. 18 is a schematic structural diagram of an image processing device according to a fourth embodiment.
  • although the terms first, second, third, etc. may be used herein to describe various information, the information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of this document, first information may also be called second information, and similarly, second information may also be called first information.
  • the word “if” as used herein may be interpreted as “at” or “when” or “in response to a determination”.
  • the singular forms "a”, “an” and “the” are intended to include the plural forms as well, unless the context indicates otherwise.
  • "A, B, C", "A, B or C" or "A, B and/or C" means "any of the following: A; B; C; A and B; A and C; B and C; A and B and C". Exceptions to this definition arise only when combinations of elements, functions, steps or operations are inherently mutually exclusive in some way.
  • the words “if”, “if” as used herein may be interpreted as “at” or “when” or “in response to determining” or “in response to detecting”.
  • the phrases "if determined" or "if detected (the stated condition or event)" could be interpreted as "when determined" or "in response to the determination" or "when detected (the stated condition or event)" or "in response to detection of (the stated condition or event)".
  • step codes such as S501 and S502 are used herein to express the corresponding content more clearly and concisely, and they do not constitute a substantive limitation on the order of execution. For example, in a specific implementation, S502 may be executed first and then S501; such variations remain within the scope of protection of this application.
  • the communication device mentioned in this application may be a terminal device (such as a mobile terminal, specifically a mobile phone), or a network device (such as a base station).
  • the terminal device can be implemented in various forms.
  • the terminal equipment described in this application may include mobile phones, tablet computers, notebook computers, palmtop computers, personal digital assistants (PDA), portable media players (PMP), navigation devices, and smart terminals such as wearable devices, smart bracelets, and pedometers, as well as fixed terminals such as digital TVs and desktop computers.
  • in the following, a mobile terminal is taken as an example, and those skilled in the art will understand that, in addition to elements specially used for mobile purposes, the configurations according to the embodiments of the present application can also be applied to fixed-type terminals.
  • FIG. 1 is a schematic diagram of the hardware structure of a mobile terminal implementing various embodiments of the present application.
  • the mobile terminal 100 may include components such as an RF (Radio Frequency) unit 101, a WiFi module 102, an audio output unit 103, an A/V (audio/video) input unit 104, a sensor 105, a display unit 106, a user input unit 107, an interface unit 108, a memory 109, a processor 110, and a power supply 111.
  • the radio frequency unit 101 can be used for sending and receiving information or receiving and sending signals during a call. Specifically, after receiving the downlink information of the base station, it is processed by the processor 110; in addition, the uplink data is sent to the base station.
  • the radio frequency unit 101 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like.
  • the radio frequency unit 101 can also communicate with the network and other devices through wireless communication.
  • the above wireless communication can use any communication standard or protocol, including but not limited to GSM (Global System for Mobile Communications), GPRS (General Packet Radio Service), CDMA2000 (Code Division Multiple Access 2000), WCDMA (Wideband Code Division Multiple Access), TD-SCDMA (Time Division-Synchronous Code Division Multiple Access), FDD-LTE (Frequency Division Duplexing-Long Term Evolution), TDD-LTE (Time Division Duplexing-Long Term Evolution), 5G, etc.
  • WiFi is a short-distance wireless transmission technology.
  • the mobile terminal can help users send and receive emails, browse web pages, and access streaming media through the WiFi module 102, which provides users with wireless broadband Internet access.
  • although Fig. 1 shows the WiFi module 102, it can be understood that it is not an essential component of the mobile terminal, and can be omitted as required without changing the essence of the invention.
  • when the mobile terminal 100 is in a call signal receiving mode, a call mode, a recording mode, a voice recognition mode, a broadcast receiving mode, or the like, the audio output unit 103 can convert the audio data received by the radio frequency unit 101 or the WiFi module 102, or stored in the memory 109, into an audio signal and output it as sound.
  • the audio output unit 103 can also provide audio output related to a specific function performed by the mobile terminal 100 (eg, call signal reception sound, message reception sound, etc.).
  • the audio output unit 103 may include a speaker, a buzzer, and the like.
  • the A/V input unit 104 is used to receive audio or video signals.
  • the A/V input unit 104 may include a graphics processing unit (GPU) 1041 and a microphone 1042, and the graphics processing unit 1041 is used to process the image data of still pictures or video.
  • the processed image frames may be displayed on the display unit 106 .
  • the image frames processed by the graphics processor 1041 may be stored in the memory 109 (or other storage media) or sent via the radio frequency unit 101 or the WiFi module 102 .
  • the microphone 1042 can receive sound (audio data) in a phone call mode, a recording mode, a voice recognition mode, and similar operating modes, and can process such sound into audio data.
  • the processed audio (voice) data can be converted into a format transmittable to a mobile communication base station via the radio frequency unit 101 for output in case of a phone call mode.
  • the microphone 1042 may implement various types of noise cancellation (or suppression) algorithms to cancel (or suppress) noise or interference generated in the process of receiving and transmitting audio signals.
  • the mobile terminal 100 also includes at least one sensor 105, such as a light sensor, a motion sensor, and other sensors.
  • the light sensor includes an ambient light sensor and a proximity sensor.
  • the ambient light sensor can adjust the brightness of the display panel 1061 according to the brightness of the ambient light, and the proximity sensor can turn off the display panel 1061 and/or the backlight when the mobile terminal 100 is moved to the ear.
  • as a kind of motion sensor, the accelerometer can detect the magnitude of acceleration in various directions (generally three axes), can detect the magnitude and direction of gravity when stationary, and can be used for applications that recognize the posture of the mobile phone (such as switching between portrait and landscape orientation, related games, and magnetometer attitude calibration) and for vibration-recognition functions (such as a pedometer or tap detection); other sensors that can also be configured on the mobile phone, such as fingerprint sensors, pressure sensors, iris sensors, molecular sensors, gyroscopes, barometers, hygrometers, thermometers, and infrared sensors, will not be described in detail here.
  • the display unit 106 is used to display information input by the user or information provided to the user.
  • the display unit 106 may include a display panel 1061, and the display panel 1061 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like.
  • the user input unit 107 can be used to receive input numbers or character information, and generate key signal input related to user settings and function control of the mobile terminal.
  • the user input unit 107 may include a touch panel 1071 and other input devices 1072 .
  • the touch panel 1071, also referred to as a touch screen, can collect the user's touch operations on or near it (for example, operations performed by the user with a finger, a stylus, or any other suitable object or accessory on or near the touch panel 1071), and drive the corresponding connection device according to a preset program.
  • the touch panel 1071 may include two parts, a touch detection device and a touch controller.
  • the touch detection device detects the user's touch orientation, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends them to the processor 110, and can receive commands sent by the processor 110 and execute them.
  • the touch panel 1071 can be implemented in various types such as resistive, capacitive, infrared, and surface acoustic wave.
  • the user input unit 107 may also include other input devices 1072 .
  • other input devices 1072 may include, but are not limited to, one or more of physical keyboards, function keys (such as volume control buttons and switch buttons), trackballs, mice, joysticks, etc., which are not specifically limited here.
  • the touch panel 1071 may cover the display panel 1061.
  • when the touch panel 1071 detects a touch operation on or near it, it transmits the operation to the processor 110 to determine the type of the touch event, and the processor 110 then provides the corresponding visual output on the display panel 1061 according to the type of the touch event.
  • although in Fig. 1 the touch panel 1071 and the display panel 1061 are shown as two independent components to realize the input and output functions of the mobile terminal, in some embodiments the touch panel 1071 and the display panel 1061 can be integrated.
  • the implementation of the input and output functions of the mobile terminal is not specifically limited here.
  • the interface unit 108 serves as an interface through which at least one external device can be connected with the mobile terminal 100 .
  • an external device may include a wired or wireless headset port, an external power (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device with an identification module, audio input/output (I/O) ports, video I/O ports, headphone ports, and more.
  • the interface unit 108 can be used to receive input (for example, data information, power, etc.) from an external device and transmit the received input to one or more elements within the mobile terminal 100, or can be used to transfer data between the mobile terminal 100 and an external device.
  • the memory 109 can be used to store software programs as well as various data.
  • the memory 109 can mainly include a storage program area and a storage data area.
  • the storage program area can store an operating system and application programs required by at least one function (such as a sound playback function, an image playback function, etc.); the storage data area can store data (such as audio data, a phone book, etc.) created according to the use of the mobile phone.
  • the memory 109 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
  • the processor 110 is the control center of the mobile terminal; it uses various interfaces and lines to connect the various parts of the entire mobile terminal, and, by running or executing the software programs and/or modules stored in the memory 109 and calling the data stored in the memory 109, executes the various functions of the mobile terminal and processes data, so as to monitor the mobile terminal as a whole.
  • the processor 110 may include one or more processing units; preferably, the processor 110 may integrate an application processor and a modem processor.
  • the application processor mainly processes operating systems, user interfaces, and application programs, etc.
  • the modem processor mainly handles wireless communication. It can be understood that the foregoing modem processor may not be integrated into the processor 110.
  • the mobile terminal 100 can also include a power supply 111 (such as a battery) for supplying power to various components.
  • the power supply 111 can be logically connected to the processor 110 through a power management system, so as to manage charging, discharging, power consumption and other functions through the power management system.
  • the mobile terminal 100 may also include a Bluetooth module, etc., which will not be repeated here.
  • the following describes the communication network system on which the mobile terminal of the present application is based.
  • FIG. 2 is a structure diagram of a communication network system provided by an embodiment of the present application.
  • the communication network system is an LTE system of general mobile communication technology.
  • the LTE system includes the UE 201, the E-UTRAN (Evolved UMTS Terrestrial Radio Access Network) 202, the EPC (Evolved Packet Core) 203, and the operator's IP service 204.
  • the UE 201 may be the above-mentioned terminal 100, which will not be repeated here.
  • E-UTRAN 202 includes eNodeB 2021 and other eNodeB 2022 and so on.
  • the eNodeB 2021 can be connected to other eNodeB 2022 through a backhaul (for example, X2 interface), the eNodeB 2021 is connected to the EPC 203 , and the eNodeB 2021 can provide access from the UE 201 to the EPC 203 .
  • EPC 203 may include MME (Mobility Management Entity) 2031, HSS (Home Subscriber Server) 2032, other MMEs 2033, SGW (Serving Gateway) 2034, PGW (PDN Gateway) 2035, PCRF (Policy and Charging Rules Function) 2036, etc.
  • MME2031 is a control node that processes signaling between UE201 and EPC203, and provides bearer and connection management.
  • HSS2032 is used to provide some registers to manage functions such as home location register (not shown in the figure), and save some user-specific information about service features and data rates.
  • PCRF 2036 is the policy and charging control policy decision point for service data flows and IP bearer resources; it selects and provides available policy and charging control decisions for the policy and charging enforcement function unit (not shown).
  • the IP service 204 may include the Internet, an intranet, an IMS (IP Multimedia Subsystem), or other IP services.
  • although the LTE system is used as an example above, those skilled in the art should know that this application is not only applicable to the LTE system, but also applicable to other wireless communication systems, such as GSM, CDMA2000, WCDMA, TD-SCDMA, 5G, and future new network systems, which are not limited here.
  • in multi-viewpoint video coding, viewpoints can be divided into two categories: independent viewpoints and dependent viewpoints, which are introduced below.
  • An independent viewpoint may also be called a base viewpoint, and the coding of this viewpoint is independent and does not depend on other viewpoints. That is, video images of independent viewpoints can be encoded using traditional video encoders (such as HEVC video encoders) independent of other viewpoints, and the corresponding bit streams can be extracted separately to form 2D bit streams, thereby restoring 2D video.
  • for a dependent viewpoint, the encoding of this viewpoint usually uses the information of an already encoded independent viewpoint to predict the information of the viewpoint currently being encoded, thereby reducing redundancy between viewpoints and improving coding efficiency.
  • view synthesis prediction (VSP) is a predictive coding technique for 3D video sequences, which is used to predict the image of the current viewpoint from other viewpoints.
  • the main difference from inter-frame prediction is that the predicted image generated by view synthesis prediction is a view synthesis image generated from the reconstructed image and the reconstructed depth of an encoded (or decoded) viewpoint different from the current encoded (or decoded) viewpoint, whereas the predicted image generated by inter-frame prediction is the reconstructed image at another moment of the currently coded (or decoded) viewpoint.
  • the depth image, also known as the range image, refers to an image that uses the distance (depth) from the image collector to each point in the scene as the pixel value; it directly reflects the geometry of the visible surfaces of objects in the scene. Since the depth map records the distance between the objects in the scene and the camera, it can be used for measurement, 3D reconstruction, virtual viewpoint synthesis, etc.
  • a binocular camera can be used to capture two left and right viewpoint images of the same scene, and a (binocular) stereo matching algorithm is used to obtain a disparity map, and then a depth map is obtained.
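  • as an illustration of the relation described above (for a rectified stereo pair, depth is inversely proportional to disparity), the following minimal Python sketch converts a disparity map into a depth map; the focal length, baseline, and file names are hypothetical, and the OpenCV block matcher is only one possible way to obtain the disparity map, not the method of this application.

```python
import numpy as np
import cv2  # OpenCV, used here only to illustrate block-matching stereo

def disparity_to_depth(disparity, focal_px, baseline_m, eps=1e-6):
    """Convert a disparity map (in pixels) to a depth map (in meters).

    For a rectified stereo pair, depth Z = f * B / d, so depth is
    inversely proportional to disparity.
    """
    disparity = disparity.astype(np.float32)
    depth = focal_px * baseline_m / np.maximum(disparity, eps)
    depth[disparity <= 0] = 0.0  # mark unmatched / invalid pixels
    return depth

# Hypothetical left/right grayscale captures of the same scene
left = cv2.imread("left_view.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right_view.png", cv2.IMREAD_GRAYSCALE)
matcher = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = matcher.compute(left, right).astype(np.float32) / 16.0  # StereoBM outputs fixed-point disparities
depth_map = disparity_to_depth(disparity, focal_px=700.0, baseline_m=0.1)
```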
  • the CTU (Coding Tree Unit) is the coding logic unit that is sequentially encoded into the HEVC bit stream; it usually includes three blocks, namely two chrominance blocks and one luma block, and such a block is called a CTB (Coding Tree Block). In addition, the CTU also includes related syntax elements.
  • the terms "reconstruction" and "decoding" may be used interchangeably, and the terms "image", "picture" and "frame" may be used interchangeably. Usually, but not necessarily, the term "reconstruction" is used on the encoder side and "decoding" is used on the decoder side.
  • FIG. 3 is a schematic structural diagram of a multi-view encoder provided by an embodiment of the present application.
  • V0 is an independent viewpoint
  • V1 is a dependent viewpoint
  • the texture image of each viewpoint is associated with a corresponding depth image.
  • the predicted texture block of the texture image depending on the viewpoint can be generated by using the reconstructed texture block of the texture image of the independent viewpoint and the corresponding reconstructed depth block of the depth image.
  • similarly, the reconstructed depth blocks of the independent viewpoint can be utilized to generate the predicted depth blocks of the dependent viewpoint. Encoding and decoding of independent and dependent viewpoints using the multi-view encoder proceed as follows.
  • the predicted block (including the texture prediction block of the texture image and the depth prediction block of the depth image) obtained by intra-frame prediction and/or inter-frame prediction is subtracted from the original image block (including the texture image block of the texture image and the depth image block of the depth image) to obtain the residual block (including the texture residual block of the texture image and the depth residual block of the depth image).
  • the residual block is transformed and quantized, and then encoded by an entropy encoder to form an encoded bit stream.
  • the residual block is subjected to inverse quantization and inverse transformation processing, and is added to the predicted block obtained through intra-frame prediction and/or inter-frame prediction to obtain a reconstructed block.
  • the loop filtering process may also include at least one of DBF (Deblocking Filter), SAO (Sample-Adaptive Offset), and ALF (Adaptive Loop Filter) (not shown in Fig. 3);
  • the neural-network-based loop filter processing can also add a filter based on a neural network; this neural network can be a super-resolution neural network, a dense residual convolutional neural network, a general convolutional neural network, etc., which is not limited here.
  • for example, DRNLF (Dense Residual convolutional Neural network based in-Loop Filter) is a loop filter based on a dense residual convolutional neural network.
  • the reconstructed blocks processed by loop filtering will be further synthesized into a reconstructed image and stored in an image buffer for prediction processing of subsequent image blocks.
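  • the residual coding and reconstruction steps described above (prediction subtracted from the original block, transform and quantization, then inverse quantization, inverse transform, and addition of the prediction) can be illustrated with the following toy Python sketch; the DCT transform and the quantization step size are simplifying assumptions and do not reproduce the actual codec.

```python
import numpy as np
from scipy.fft import dctn, idctn  # stand-ins for the codec's block transform

def encode_and_reconstruct_block(original, prediction, q_step=8.0):
    """Toy residual coding loop for a single image block.

    Returns the quantized coefficients (what would be entropy coded) and
    the reconstructed block (what loop filtering would then receive).
    """
    residual = original.astype(np.float32) - prediction.astype(np.float32)
    coeffs = dctn(residual, norm="ortho")                   # transform
    q_coeffs = np.round(coeffs / q_step)                    # quantization
    rec_residual = idctn(q_coeffs * q_step, norm="ortho")   # inverse quantization + inverse transform
    reconstructed = np.clip(prediction + rec_residual, 0, 255).astype(np.uint8)
    return q_coeffs, reconstructed

# Example: an 8x8 block with a flat prediction
original = np.random.randint(0, 256, (8, 8)).astype(np.uint8)
prediction = np.full((8, 8), float(original.mean()), dtype=np.float32)
q_coeffs, reconstructed = encode_and_reconstruct_block(original, prediction)
```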
  • similarly, the predicted block (including the texture prediction block of the texture image and the depth prediction block of the depth image) obtained by intra-frame prediction and/or inter-frame prediction is subtracted from the original image block (including the texture image block of the texture image and the depth image block of the depth image) to obtain the residual block (including the texture residual block of the texture image and the depth residual block of the depth image).
  • the residual block is transformed and quantized, and then encoded by an entropy encoder to form an encoded bit stream.
  • the residual block is subjected to inverse quantization and inverse transformation processing, and is added to the predicted block obtained through intra-frame prediction and/or inter-frame prediction to obtain a reconstructed block.
  • the loop filtering process here may also include at least one of DBF, SAO, and ALF (not shown in FIG. 3); a neural-network-based filter, such as DRNLF, can also be added to further improve the quality of the filtered image.
  • the reconstructed blocks processed by the loop filter are further synthesized into a reconstructed image and stored in an image buffer for prediction processing of subsequent image blocks.
  • view synthesis prediction can also be performed on images dependent on viewpoint V1.
  • the image blocks of the independent viewpoint V0 corresponding to the dependent viewpoint V1 may be read from the image buffer, including texture image blocks of the texture image and depth image blocks of the depth image.
  • according to the depth image block of the corresponding independent viewpoint V0, the predicted depth image block of the depth image of the dependent viewpoint V1 can be generated; and according to the texture image block of the texture image and the depth image block of the depth image of the corresponding independent viewpoint V0, the predicted texture block of the texture image of the dependent viewpoint V1 can be generated.
  • control data related to view synthesis prediction (that is, the control data included in the prediction data in Figure 3, used to indicate that the decoding end and the encoding end maintain the same prediction mode) and other related data (such as the filter control data) are also encoded into the bitstream.
  • the decoding process of the independent viewpoint V0 and the dependent viewpoint V1 goes through the following process: the video decoder performs entropy decoding on the received encoded bitstream (such as the bitstream of the independent viewpoint V0 or the bitstream of the dependent viewpoint V1) to obtain the prediction data, the filter control data indicated by the encoder, and the quantized transform coefficients; afterwards, the quantized transform coefficients undergo inverse quantization and inverse transformation to obtain a residual block, the residual block is summed with the prediction block output by the various prediction methods (for example, intra-frame prediction, inter-frame prediction, and view synthesis prediction) according to the prediction data, and then, following the instruction of the filter control data, the loop filter processing filters the decoded image blocks using the same filtering method as the multi-view video encoder; the filtered decoded image blocks are further synthesized into decoded images, the decoded images are cached in the decoded image buffer for prediction processing of subsequent image blocks, and the decoded video data is output at the same time.
  • optionally, the prediction parameters obtained by decoding the dependent viewpoint V1 may include control data instructing the decoder to use view synthesis prediction, and the multi-view video decoder then obtains the prediction block by view synthesis prediction. For example, according to the depth image block of the corresponding independent viewpoint V0, the predicted depth image block of the depth image of the dependent viewpoint V1 can be generated; and according to the texture image block of the texture image and the depth image block of the depth image of the corresponding independent viewpoint V0, the predicted texture block of the texture image of the dependent viewpoint V1 can be generated. The predicted depth image block and the predicted texture block are then summed with their corresponding residual blocks and undergo a series of further processing to obtain the respective decoded images.
  • FIG. 5 is a schematic flowchart of an image processing method according to the first embodiment.
  • the execution subject in this embodiment may be a computer device or a cluster composed of multiple computer devices.
  • the computer device may be an intelligent terminal (such as the aforementioned mobile terminal 100) or a server. In this embodiment, an intelligent terminal is taken as the execution subject as an example for description.
  • the first auxiliary information includes depth information or disparity information
  • the depth information includes at least one of the following: depth feature information, statistical information based on depth values, depth slices, preprocessed depth slices, and a combination of depth feature information and statistics based on depth values.
  • disparity information is inversely proportional to the distance from a point in three-dimensional space to the projection center plane, as long as the disparity information of a certain point in the scene is known, the depth information of the point can be known.
  • depth information or disparity information can be determined from the corresponding depth image, and the depth feature information can be any one or more of point features, line features, surface features, and depth profile information of a region of interest; the statistical information based on depth values may be statistics of the depth values of the corresponding depth slice.
  • the statistical information based on depth values can be used to calculate the similarity between the depth slice of the first viewpoint and the depth slice of the second viewpoint; a depth slice refers to the slice area in the depth map corresponding to a texture slice; a preprocessed depth slice is, for example, a quantized depth slice.
  • the depth information can be represented by a matrix, and the size of the matrix is associated with the corresponding texture slice. For example, it is marked as 1 for a depth region of interest or a specific surface feature related to depth, and is marked as 0 for other regions. This helps to extract the features of the depth region of interest or the texture region corresponding to the specific surface feature about the depth, and further perform loop filtering on these features to improve the quality of the reconstructed image or the decoded image.
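  • a minimal sketch of such a marker matrix is given below; it assumes, purely for illustration, that the depth region of interest is defined by a depth range [z_near, z_far] (hypothetical parameters) and that the depth slice is aligned with the texture slice.

```python
import numpy as np

def depth_roi_mask(depth_slice, z_near, z_far):
    """Return a 0/1 matrix the size of the slice: 1 inside the depth
    region of interest, 0 elsewhere."""
    return ((depth_slice >= z_near) & (depth_slice <= z_far)).astype(np.uint8)

def extract_roi_texture(texture_slice, mask):
    """Keep only the texture samples whose depth falls inside the region
    of interest; other samples are zeroed out before further processing."""
    return texture_slice * mask

# Example: a depth slice aligned with an 8-bit luma texture slice
depth_slice = np.random.uniform(0.5, 10.0, (64, 64)).astype(np.float32)
texture_slice = np.random.randint(0, 256, (64, 64)).astype(np.uint8)
mask = depth_roi_mask(depth_slice, z_near=1.0, z_far=3.0)
roi_texture = extract_roi_texture(texture_slice, mask)
```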
  • the first auxiliary information may be auxiliary information from a first viewpoint and/or a second viewpoint, the first viewpoint may be a dependent viewpoint, and the second viewpoint may be an independent viewpoint.
  • the method further includes: acquiring the first image block corresponding to the first viewpoint; and/or acquiring the reference image corresponding to the second viewpoint.
  • the second viewpoint is different from the first viewpoint.
  • the reference image and the image where the first image block is located belong to images of different viewpoints at the same moment.
  • the first view here can be a dependent view, and the second view can be an independent view;
  • the first image block is an input to the neural-network-based loop filter, for example a reconstruction block; the reconstruction block is a reconstructed texture image block (also referred to as a reconstructed texture block), and the reconstructed texture block can be any one of a CTU, a slice, a tile, or a sub-image; the image where the first image block is located may be called the current frame of the dependent viewpoint, the current frame may be the current texture frame F1, and the current texture frame F1 is a reconstructed image.
  • the reference image is a reference frame obtained from the image buffer (or the decoded image buffer); the reference frame is a reconstructed image corresponding to the second viewpoint (or a decoded image corresponding to the second viewpoint), and the reference image is encoded before the image where the first image block is located.
  • the reference image is a reference frame of an independent viewpoint.
  • the first image block is the currently reconstructed texture slice S1 of the current texture frame F1 of the dependent viewpoint
  • the currently reconstructed texture slice S1 can be an intra prediction slice (I slice) or an inter frame prediction slice (P slice)
  • at this time, the current texture frame F1 may not have been fully reconstructed, and the reference image corresponds to the independent viewpoint reference frame FR1; a texture slice in the reference frame can be matched with the currently reconstructed texture slice S1 for processing.
  • since the currently reconstructed texture slice S1 obtained here is a reconstructed texture slice obtained through intra-frame prediction processing (i.e., an intra prediction slice, I slice) or through inter-frame prediction processing (i.e., an inter prediction slice, P slice), rather than a texture slice obtained by inter-view prediction (such as view synthesis prediction), the currently reconstructed texture slice does not refer to the texture information of the independent viewpoint during reconstruction; therefore, the loop filtering stage can enhance the quality of the reconstructed texture slice after in-loop filtering by fusing texture information from the reference frame of the independent viewpoint.
  • the present application is not limited thereto, and the currently reconstructed texture slice S1 may also be a reconstructed texture slice obtained through inter-view prediction processing.
  • subsequent reference to texture information of independent viewpoints can also further improve the quality of the reconstructed texture slice after filtering.
  • the size of the image area of the first image block of the first viewpoint obtained above is smaller than the size of the image area of the reference image of the second viewpoint, so that a second image block matching the first image block of the first viewpoint can be determined from the larger image area of the reference image of the second viewpoint, thereby improving the matching degree.
  • the first viewpoint and the second viewpoint can correspond to dependent viewpoints and independent viewpoints, and both the first image block and the second image block can be reconstructed texture blocks.
  • for the specific matching method, reference can be made to the content introduced in the corresponding embodiment of FIG. 7, which will not be elaborated here.
  • a processing result corresponding to the first image block may be determined or generated according to the second image block of the reference image and the first auxiliary information.
  • the second image block may be obtained from the reference image. Acquisition of the second image block may follow the following preset rule: determine the second image block matching the first image block of the first viewpoint within a larger image area divided from the reference image of the second viewpoint. Assuming that the first viewpoint is a dependent viewpoint, the second viewpoint is an independent viewpoint, and the first image block is a reconstructed texture block, the second image block can be determined according to the obtained reconstructed texture block and the reference frame, or according to the obtained reconstructed texture block and an image area in the reference frame.
  • optionally, the second image block is determined as follows: (1) when the first image block is the reconstructed texture block of the current frame of the dependent viewpoint, the second image block is determined from a reconstructed slice in the independent viewpoint reference frame (optionally, the size of the reconstructed texture block is smaller than the size of the reconstructed slice); (2) when the first image block is the reconstructed texture block of the current frame of the dependent viewpoint and the reconstructed texture block is a CTU, any one of the slices, tiles, and sub-images of the reference image of the independent viewpoint can be obtained, and the second image block is determined from it; (3) when the first image block is the reconstructed texture block of the current frame of the dependent viewpoint and the reconstructed texture block is a slice or a tile, a sub-image of the reference image of the independent viewpoint may be obtained, and the second image block is determined from it.
  • the relationship between the first image block and the second image block includes at least one of the following: the second image block and the first image block have the same size, the first image block and the second image block are of the same type, and, when the second image block is a slice, the second image block is composed of a plurality of coding tree units.
  • the same size here means that the image areas of the image blocks have the same size, for example, the first image block and the second image block are both 8×8; the same type means, for example, that when the first image block is a slice, the second image block corresponds to a slice, and when the first image block is a coding tree unit (CTU), the second image block corresponds to a coding tree unit (CTU).
  • the above-mentioned slices are specifically texture slices (or reconstructed texture slices).
  • the texture slice of the second viewpoint may not be a texture slice included in the NAL (Network Abstraction Layer; in the video coding standard H.264, the NAL is responsible for packing and transmitting the encoded data in the format required by the network) in the usual sense, but an image region composed of multiple CTUs with the same size and shape as the texture slice of the first viewpoint (such as the texture slice of the dependent viewpoint).
  • the second image block may be determined from the reference image according to the first auxiliary information of the first image block.
  • for example, the first image block is a currently reconstructed texture block, specifically the currently reconstructed texture slice S1 (or texture slice S1), the first viewpoint is a dependent viewpoint, and the first auxiliary information is depth information or disparity information; for example, the depth information corresponding to the texture slice S1 in the current texture frame F1 of the dependent viewpoint may be the depth slice itself or a preprocessed depth slice, or may be the statistical information of the depth values of the depth slice corresponding to the currently reconstructed texture slice S1.
  • the depth information Ds1 or disparity information Ds2 corresponding to the currently reconstructed texture slice S1 is determined from the depth image corresponding to the texture slice S1.
  • based on the depth information Ds1 or the disparity information Ds2, the corresponding reference texture slice SR1 in the independent-viewpoint reference texture frame FR1 can be determined.
  • step S502 includes:
  • the first auxiliary information includes depth information, and optionally, the depth information is determined according to a depth image corresponding to the first image block;
  • the first auxiliary information of the first image block includes depth information or disparity information, and the depth information or the disparity information is determined from a depth image corresponding to the first image block.
  • the first auxiliary information may be determined from the depth slice corresponding to the current reconstructed texture slice S1.
  • each image block in the reference image of the second viewpoint is of the same type as the first image block, for example, both are texture slices; similar to the first auxiliary information of the first image block, the first auxiliary information of each image block in the reference image also includes depth information or disparity information, which is determined from the depth image corresponding to the reference image.
  • the similarity between the second image block and the first image block is measured by the similarity between the respective first auxiliary information.
  • the similarity between independent-viewpoint texture slices and dependent-viewpoint texture slices is calculated based on depth information, and the depth information includes at least one of the following: depth feature information, statistical information based on depth values, depth slices, preprocessed depth slices, and a combination of depth feature information and statistics based on depth values.
  • in this way, a second image block matching the first image block of the first viewpoint can be found among the image blocks of the reference image of the corresponding second viewpoint, where the first auxiliary information of the second image block is the most similar to the first auxiliary information of the first image block.
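  • a minimal sketch of this matching step is given below; it assumes, only for illustration, that the first auxiliary information of each slice is summarized by simple statistics of its depth values (mean, standard deviation, and a histogram over an assumed depth range) and that the most similar candidate slice is the one with the smallest distance between these statistics; the partitioning of the reference image into candidate slices is taken as given.

```python
import numpy as np

def depth_statistics(depth_slice, bins=16, value_range=(0.0, 10.0)):
    """Summarize a depth slice by simple depth-value statistics.
    value_range is an assumed working depth range for the histogram."""
    hist, _ = np.histogram(depth_slice, bins=bins, range=value_range, density=True)
    return np.concatenate(([depth_slice.mean(), depth_slice.std()], hist))

def find_matching_slice(current_depth_slice, candidate_depth_slices):
    """Return the index of the candidate (second-viewpoint) slice whose
    depth statistics are most similar to those of the current
    (first-viewpoint) slice."""
    target = depth_statistics(current_depth_slice)
    distances = [np.linalg.norm(depth_statistics(c) - target)
                 for c in candidate_depth_slices]
    return int(np.argmin(distances))

# Example: pick the reference texture slice SR1 whose depth slice best
# matches the depth slice of the currently reconstructed texture slice S1
s1_depth = np.random.uniform(0.5, 5.0, (64, 64))
candidate_depths = [np.random.uniform(0.5, 5.0, (64, 64)) for _ in range(4)]
best_index = find_matching_slice(s1_depth, candidate_depths)
```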
  • the processing result is used to obtain a reconstructed image or a decoded image corresponding to the first image block. It should be noted that when this solution is applied to the multi-view video encoding end, the processing result is used to obtain the reconstructed image corresponding to the first image block; when this solution is applied to the multi-view video decoding end, the processing result is used to obtain the first The decoded image corresponding to the image block.
  • the processing result includes the filtered first image block, and the neural-network-based in-loop filter shown in FIG. 6 can process the first image block to determine or generate the processing result.
  • the loop filter based on the neural network includes a fusion module and a filtering processing module based on the neural network.
  • the fusion module can receive the first auxiliary information (such as depth information or disparity information), the first image block (such as the currently reconstructed texture block of the current frame of the dependent viewpoint), and a reference image (such as an independent viewpoint reference frame); the fusion module can determine the second image block (such as a matching texture block) from the reference image, the first image block and the second image block undergo a series of processing in the fusion module, and the result obtained by the fusion module is then input to the neural-network-based filtering processing module for processing to obtain the filtered first image block.
  • optionally, the neural-network-based loop filter can be set in the multi-view encoder shown in Figure 3 or the multi-view decoder shown in Figure 4, and the neural-network-based filtering processing module can adopt the structure shown in the schematic diagram of Figure 13.
  • the fusion module can exist independently of the neural network-based filtering processing module, that is, it can be set as a separate functional module.
  • optionally, the fusion module, together with the neural-network-based filtering processing module, is included in the neural-network-based loop filter processor in FIG. 3 or FIG. 4, and is used to determine or generate the processing result of the loop filtering process; the reconstructed image or the decoded image can be further synthesized according to the processing result.
  • the processing result may also be called a view-dependent loop-filtered texture block, or a view-dependent loop-filtered reconstructed texture block.
  • the fusion module receives view-dependent texture blocks, view-independent reference frames, and depth information or disparity information. In another embodiment, the fusion module may also receive the current reconstructed texture block of the current view-dependent frame, the reconstructed texture slice in the independent view reference frame, and depth information or disparity information.
  • the reconstructed texture block of the current frame dependent on the viewpoint may be one of CTU, slice, tile, and sub-image
  • when the reconstructed texture block of the current dependent-viewpoint frame received by the fusion module is a CTU, the fusion module can receive one of the slices, tiles, or sub-images of the reference frame of the independent viewpoint from the image buffer; when the reconstructed texture block of the current dependent-viewpoint frame received by the fusion module is a slice or a tile, the fusion module may receive a sub-image of the reference frame of the independent viewpoint from the image buffer.
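  • to make the data flow concrete, the following PyTorch-style sketch (an illustrative stand-in, not the network structure actually claimed in Fig. 6 or Fig. 13) fuses the dependent-viewpoint reconstructed texture block, the matching texture block of the independent-viewpoint reference frame, and the depth (or disparity) information by channel concatenation, and passes the result through a small convolutional filter that predicts a correction added back to the reconstructed block; the layer sizes and the concatenation-based fusion are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class FusionLoopFilter(nn.Module):
    """Toy neural-network-based loop filter: fuse the current reconstructed
    texture block, the matching texture block of the reference viewpoint,
    and the depth (or disparity) information, then predict a residual that
    is added back to the reconstructed block."""

    def __init__(self, channels=32):
        super().__init__()
        # 3 input planes: reconstructed block, matching block, depth information
        self.fuse = nn.Conv2d(3, channels, kernel_size=3, padding=1)
        self.body = nn.Sequential(
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=3, padding=1),
        )

    def forward(self, rec_block, match_block, depth_info):
        x = torch.cat([rec_block, match_block, depth_info], dim=1)  # fusion by concatenation
        correction = self.body(self.fuse(x))
        return rec_block + correction  # filtered first image block

# Example with hypothetical 64x64 luma blocks and an aligned depth plane
loop_filter = FusionLoopFilter()
rec = torch.rand(1, 1, 64, 64)
match = torch.rand(1, 1, 64, 64)
depth = torch.rand(1, 1, 64, 64)
filtered_block = loop_filter(rec, match, depth)
```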
  • the image processing scheme provided by the embodiments of the present application can be applied to multi-viewpoint video encoding and decoding scenarios; by referring to the first auxiliary information (including depth information or disparity information), the information of image blocks of different viewpoints can be used together, which reduces the parallax effect between frames of different viewpoints at the same moment, achieves a better matching effect between image blocks of different viewpoints, and assists in determining or generating the processing result corresponding to the first image block, thereby reducing the degree of distortion of the processing result and obtaining a high-quality reconstructed image or decoded image.
  • FIG. 7 is a schematic flowchart of an image processing method according to the second embodiment.
  • the execution subject in this embodiment may be a computer device or a cluster composed of multiple computer devices.
  • the computer device may be an intelligent terminal (such as the aforementioned mobile terminal 100), or may be a server.
  • this embodiment is described taking an intelligent terminal as the execution subject as an example.
  • the second image block can be determined from one of the slices, tiles, or sub-images of the second-viewpoint reference image obtained from the image buffer; when the first image block of the first viewpoint is a slice or a tile, the second image block may be determined from a sub-image obtained from the second-viewpoint reference image in the image buffer. That is, the processing rule for determining the second image block from the reference image of the second viewpoint is: the size of the image area of the second viewpoint is larger than the size of the image area of the first image block of the first viewpoint.
  • the type of the second image block is the same as that of the first image block; for example, when the first image block is a reconstructed texture slice, the second image block determined from the sub-image of the reference image is also a reconstructed texture slice. The second image block matches the first image block and can also be called a matching image block; for example, when the first image block is a reconstructed texture block, the second image block is a matching texture block. Specifically, when the reconstructed texture block is a reconstructed texture slice, the reconstructed texture slices of the second-viewpoint reference image can be roughly matched with the reconstructed texture slice of the first viewpoint (slice-to-slice registration), and the obtained second image block is the reconstructed texture slice in the second-viewpoint reference image that meets the matching condition (that is, the matching texture slice).
  • S702 corresponds to roughly matching the first image block of the first viewpoint with the second image block of the second viewpoint, and optional implementation steps are as follows:
  • for example, the first image block is the current reconstructed texture slice S1 of the current view-dependent texture frame F1, and the reference image is an independent-viewpoint reference frame. The texture slices SR in the independent-viewpoint reference texture frame FR1 are roughly matched with the current reconstructed texture slice S1 of the current view-dependent frame: first, the depth information Ds1 or disparity information Ds2 corresponding to the current reconstructed texture slice S1 is obtained; then, in the independent-viewpoint reference frame FR1, the reference texture slice SR1 whose corresponding depth information or disparity information is most similar to the depth information Ds1 or disparity information Ds2 of the reconstructed texture slice S1 is found.
  • that is, the independent-viewpoint texture slice whose depth information has the greatest similarity to that of the view-dependent texture slice S1 is determined as the reference texture slice SR1, and the reference texture slice SR1 matches the view-dependent texture slice S1.
  • the texture slice SR of an independent view may not be a texture slice included in the NAL in the usual sense, but an image area composed of multiple CTUs having the same size and shape as the texture slice in the dependent view.
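  • To make the rough matching step concrete, the following is a minimal Python/NumPy sketch of choosing the independent-viewpoint region whose depth information is most similar to that of the current view-dependent reconstructed texture slice; the function name and the negative mean-squared-difference similarity measure are illustrative assumptions, since the embodiment only requires selecting the candidate with the greatest similarity of depth or disparity information.

```python
import numpy as np

def rough_match(ds1: np.ndarray, candidate_depths: list) -> int:
    """Return the index of the independent-viewpoint candidate region whose
    depth (or disparity) information is most similar to ds1, the depth
    information of the current reconstructed texture slice S1."""
    scores = [
        -np.mean((ds1.astype(np.float64) - d.astype(np.float64)) ** 2)
        for d in candidate_depths
    ]
    return int(np.argmax(scores))

# Usage: candidate_depths holds the depth information of each candidate
# texture slice SR in the independent-viewpoint reference frame FR1; the
# returned index identifies the matching reference texture slice SR1.
```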
  • the feature maps corresponding to the first image block and the second image block can be obtained by directly performing feature extraction on the image blocks, or they can be obtained by performing fine matching on the first image block and the second image block and then performing feature extraction on the image sub-blocks output by the fine matching.
  • Method 1: perform feature extraction processing on the first image block and the second image block based on the feature extraction network and the first auxiliary information, and obtain the first feature map corresponding to the first image block and the second feature map corresponding to the second image block.
  • the first auxiliary information here is the same as the first auxiliary information used in the aforementioned rough matching, and can come from the first image block or the second image block;
  • the first auxiliary information includes depth information or disparity information; the depth information or disparity information may be represented by a matrix, or may be a depth map or a disparity map, or a depth map or disparity map that has been preprocessed (such as by quantization processing or normalization processing).
  • when feature extraction processing is performed on the first image block and the second image block, the first auxiliary information can be used as reference information of the feature extraction network, so that a mapping relationship is established between the extracted features of the first image block and the second image block and the depth information and/or disparity information, and the first preset processing model and/or the first preset processing parameter can be determined more accurately when the first preset processing is performed subsequently.
  • for example, the first preset processing is warping processing, the first preset processing model is a warping model, and the first preset processing parameter is a warping parameter.
  • the depth information corresponding to a texture image region of a specific depth that requires supervision may be set to 1, and the depth information corresponding to texture image regions of other depths may be set to 0.
  • the depth information or disparity information obtained in this way is used as the first auxiliary information.
  • the first auxiliary information may be supervision information of the feature extraction network, so that only the features corresponding to the texture image region of the specific depth that needs to be supervised in the first image block and the second image block are extracted to generate the respective corresponding feature maps. In this way, in the case of limited computing resources or transmission bandwidth, texture image regions of a specific depth that require supervision can be processed preferentially or exclusively.
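  • As a hedged illustration of the supervision described above (setting the depth information of the region of interest to 1 and other regions to 0), the following sketch builds such a 0/1 mask from a depth map; the function name and the depth-range parameters d_min/d_max are hypothetical.

```python
import numpy as np

def build_depth_supervision(depth_map: np.ndarray,
                            d_min: float, d_max: float) -> np.ndarray:
    """Set the supervision information to 1 for the texture image region whose
    depth falls in the supervised range [d_min, d_max] and to 0 elsewhere."""
    return ((depth_map >= d_min) & (depth_map <= d_max)).astype(np.float32)

# The resulting 0/1 map can be supplied to the feature extraction network as
# supervision information, e.g. multiplied onto the extracted features so that
# only the supervised depth region contributes to the feature maps.
```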
  • the feature extraction network includes a neural network, such as a convolutional neural network, a residual convolutional neural network, or a deep learning neural network, and can extract the feature map corresponding to an image block; the feature map is a multidimensional (e.g., two-dimensional) matrix.
  • the corresponding feature extraction unit may be a convolutional layer, and the downsampling unit may be a pooling layer.
  • Method 2: acquire the first image sub-block of the first image block and the second image sub-block of the second image block, where the second auxiliary information of the second image sub-block matches the second auxiliary information of the first image sub-block; based on the feature extraction network and the second auxiliary information, perform feature extraction processing on the first image sub-block and the second image sub-block to obtain the first sub-feature map of the first image sub-block and the second sub-feature map of the second image sub-block; determine or generate the first feature map corresponding to the first image block through the first sub-feature map, and determine or generate the second feature map corresponding to the second image block through the second sub-feature map; optionally, the second auxiliary information is different from the first auxiliary information.
  • for example, the first sub-feature maps corresponding to all the first image sub-blocks of the first image block are combined/stitched into the first feature map, and the second sub-feature maps corresponding to all the second image sub-blocks of the second image block are combined/stitched into the second feature map.
  • the second image sub-block is the result of finely matching the first image sub-block with the image sub-blocks in the roughly matched second image block; the second image sub-block can also be called a matching image sub-block, and the first image sub-block and the second image sub-block are of the same type.
  • the first image block of the first viewpoint is a reconstructed texture block
  • the second image block of the second viewpoint corresponds to a reconstructed texture block (or called a matching texture block)
  • the first image sub-block is a reconstructed texture sub-block in the reconstructed texture block of the first viewpoint
  • the second image sub-block is the reconstructed texture sub-block in the reconstructed texture block of the second viewpoint (or called matching texture sub-block).
  • the matching of the second auxiliary information of the second image sub-block and the first image sub-block means that the similarity between the second auxiliary information of different image sub-blocks is the largest. Similar to the rough matching, the fine matching uses the second auxiliary information to calculate the similarity.
  • the second auxiliary information may include depth information or disparity information.
  • the depth information may include but not limited to at least one of the following:
  • Depth feature information such as point features, line features, surface features, boundary features, and depth profile information about the depth of interest
  • the depth information can be used to calculate the similarity between the reconstructed depth block of the second viewpoint and the reconstructed depth block of the first viewpoint;
  • both the first auxiliary information and the second auxiliary information are depth information, which may include different contents.
  • for example, the depth information corresponding to a texture slice during rough matching is depth feature information, and the depth information corresponding to a reconstructed texture sub-block during fine matching is statistical information based on depth values; or, the depth information corresponding to the texture slice during rough matching is depth feature information, and the depth information corresponding to the reconstructed texture sub-block during fine matching is a combination of depth feature information and statistical information based on depth values.
  • the precision of the first auxiliary information and the second auxiliary information is different.
  • the depth information corresponding to the texture slice during rough matching is n pieces of depth feature information
  • the depth information corresponding to the reconstructed texture sub-block during fine matching is m pieces of depth feature information; optionally, m is greater than n, and both m and n are integers greater than or equal to 1.
  • that is, one kind of depth information is, for example, low-precision depth information, and another kind of depth information is high-precision depth information.
  • the first image sub-block and the second image sub-block are image sub-blocks of the same type; for example, both the first image sub-block and the second image sub-block are coding tree blocks or extended coding tree blocks.
  • the first image block is a reconstructed texture block, and the reconstructed texture block can be a reconstructed texture slice, that is, the first image block is a reconstructed texture slice;
  • the first image sub-block is a reconstructed texture sub-block, and the reconstructed texture sub-block can be a reconstructed texture coding tree block CTB, that is, the first image sub-block is a reconstructed texture coding tree block CTB, and the types of the second image block and the second image sub-block correspond to the first image block and the first image sub-block.
  • the reconstructed texture sub-blocks may be obtained as follows: according to a predetermined processing order, the reconstructed texture coding tree block CTBd in the reconstructed texture slice S1 of the first viewpoint is finely matched (block-to-block registration) with the reconstructed texture coding tree blocks CTBi in the reconstructed texture slice SR1 of the second viewpoint.
  • the first viewpoint is a dependent viewpoint
  • the second viewpoint is an independent viewpoint.
  • the depth information or disparity information corresponding to the reconstructed texture coding tree block in the reconstructed texture slice that depends on the viewpoint can be determined according to the order of raster scanning.
  • the similarity between the reconstructed texture coding tree block of the first viewpoint and the reconstructed texture coding tree blocks of the second viewpoint can be calculated according to the depth information; that is, the reconstructed texture coding tree block CTBi of the second viewpoint whose depth information has the greatest similarity to that of the reconstructed texture coding tree block CTBd of the first viewpoint is taken as the reconstructed texture coding tree block matching CTBd.
  • the similarity of the depth information can be expressed as a similarity probability function.
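  • The following sketch illustrates one way the CTB-level fine matching could be realized, with the depth-information similarity expressed as a probability distribution; the softmax over negative mean squared differences is an assumption standing in for the unspecified similarity probability function, and the helper names are hypothetical.

```python
import numpy as np

def ctb_similarity_probabilities(depth_ctbd: np.ndarray,
                                 candidate_depths: list) -> np.ndarray:
    """Express the similarity between the depth information of the
    first-viewpoint CTBd and each candidate second-viewpoint CTBi as a
    probability (softmax over negative mean squared differences)."""
    dist = np.array([np.mean((depth_ctbd - c) ** 2) for c in candidate_depths])
    logits = -dist
    p = np.exp(logits - logits.max())
    return p / p.sum()

def fine_match(depth_ctbd: np.ndarray, candidate_depths: list) -> int:
    """The candidate with the largest similarity probability is taken as the
    reconstructed texture coding tree block matching CTBd."""
    return int(np.argmax(ctb_similarity_probabilities(depth_ctbd, candidate_depths)))
```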
  • the reconstructed texture sub-block can also be an extended reconstructed texture coding tree block CTBex, that is, both the first image sub-block and the second image sub-block are extended reconstructed texture coding tree blocks. The extended reconstructed texture coding tree block CTBex is a coding tree block obtained by extending the block edges of a reconstructed texture coding tree block CTB, and the extended reconstructed coding tree block CTBex includes the reconstructed texture coding tree block CTB.
  • the extended area may be filled with pixels of other reconstructed texture coding tree blocks adjacent to the reconstructed texture coding tree block, therefore, the size of the expanded reconstructed texture coding tree block CTBex is larger than the size of the reconstructed texture coding tree block.
  • because an image is divided into blocks for coding, the reconstructed coded image or decoded image exhibits blocking artifacts. Here, when the extended reconstructed texture coding tree block is used for loop filtering, the extended area is filled with pixels of other adjacent reconstructed texture coding tree blocks, which can effectively reduce the blocking artifacts caused by partitioning. Therefore, using extended reconstructed texture coding tree blocks as the first image sub-block and the second image sub-block can alleviate blocking artifacts at the root of image division, further improving the filtering effect and the quality of encoding and decoding.
  • the second auxiliary information is used as reference information for fine matching, so that a mapping relationship is established between the extracted features of the first image sub-block and the second image sub-block and the depth information and/or disparity information; it can also be used as supervision information, so that the feature extraction network extracts only the features corresponding to the texture image region of the specific depth that needs to be supervised in the first image sub-block and the second image sub-block to generate the corresponding feature maps.
  • the above two methods can extract the corresponding feature map (including the first feature map and the second feature map) through the feature extraction network.
  • the difference lies only in the processing objects received by the feature extraction network: with fine matching, the feature extraction network receives the first image sub-block (e.g., a reconstructed texture sub-block), the second image sub-block (e.g., a matching texture sub-block), and the second auxiliary information.
  • the feature extraction network receives the first image block (for example, the currently reconstructed texture block), the second image block (for example, the matching texture block), and the first auxiliary information.
  • the detailed structure and processing principle of the feature extraction network are introduced below.
  • the feature extraction network includes N cascaded feature extraction modules, where N is an integer greater than or equal to 1; each of the first N-1 feature extraction modules includes a feature extraction unit and a down-sampling unit in series, and the Nth feature extraction module includes a feature extraction unit. The first feature extraction module among the N cascaded feature extraction modules is used to process the first image block and the second image block, or the first image sub-block and the second image sub-block; each feature extraction module other than the first one is used to process the output of the preceding feature extraction module. For each feature extraction module, the input of the down-sampling unit is connected to the output of the feature extraction unit, and the output of the down-sampling unit is connected to the input of the feature extraction unit in the next feature extraction module; optionally, the first auxiliary information or the second auxiliary information is used as reference information and/or supervision information of at least one of the N cascaded feature extraction modules.
  • FIG. 8a is a schematic structural diagram of a feature extraction network provided by an embodiment of the present application.
  • the features of the first image block extracted by the feature extraction network using pyramid layered processing will be described in conjunction with FIG. 8a.
  • the following description takes as an example the case in which the feature extraction network receives the first auxiliary information and processes the first image block and the second image block.
  • the first image block (such as a reconstructed texture block), the second image block (such as a matching texture block), and the first auxiliary information are input into the feature extraction module 1, and the feature map Fd1 and the feature map Fi1 are output through the convolution operation in the feature extraction unit 1.
  • the down-sampling unit 1 down-samples the feature map Fd1 and the feature map Fi1 to obtain the down-sampled feature map Fdd1 and the down-sampled feature map Fid1.
  • the processing of the feature extraction unit 1 and the down-sampling unit 1 included in the feature extraction module 1 is the first level of processing.
  • the down-sampled feature map Fdd1 and the down-sampled feature map Fid1 are input to the feature extraction module 2, and the feature map Fd2 and feature map Fi2 are output through the convolution operation in the feature extraction unit 2 .
  • the down-sampling unit 2 performs a down-sampling operation on the output feature map Fd2 and feature map Fi2 to obtain the down-sampled feature map Fdd2 and feature map Fid2.
  • the processing of the feature extraction unit 2 and the down-sampling unit 2 included in the feature extraction module 2 is the second level of processing.
  • the operation of the feature extraction module for each subsequent level is similar until the processing of the n-1th level.
  • the down-sampled feature map Fdd(n-1) and the down-sampled feature map Fid(n-1) are input to the feature extraction module N, which outputs the feature map Fdn and the feature map Fin.
  • the feature maps Fd1-Fdn are collectively referred to as the first feature map
  • the feature maps Fi1-Fin are collectively referred to as the second feature map.
  • the down-sampling processing of the down-sampling unit is performed after each processing by a feature extraction unit, which can reduce the size of the feature map; the size of the feature map passing through the cascaded feature extraction modules gradually shrinks, so that the semantics expressed by the feature maps become more abstract. This process is also called pyramid feature extraction.
  • the feature maps generated by the feature extraction modules at each level are feature maps of different scales.
  • subsequently using feature maps of different scales for warping and fusion processing can enrich the expression of the feature maps, so that the finally fused feature map describes the information of the first image block of the first viewpoint more comprehensively and accurately, and the original image corresponding to the first image block can be restored better.
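  • A minimal PyTorch sketch of the N-level pyramid feature extraction described above is given below; the channel sizes, kernel sizes, the choice of max pooling for down-sampling, and the injection of the auxiliary information as an extra input channel of the first level are illustrative assumptions rather than the patented structure itself.

```python
import torch
import torch.nn as nn

class PyramidFeatureExtractor(nn.Module):
    """N cascaded feature extraction modules: the first N-1 modules each have a
    feature extraction unit (convolution) followed by a down-sampling unit
    (pooling); the Nth module has only a feature extraction unit."""
    def __init__(self, n_levels: int = 3, in_ch: int = 1, feat_ch: int = 32):
        super().__init__()
        self.units = nn.ModuleList()
        self.pools = nn.ModuleList()
        ch = in_ch + 1  # image block plus one auxiliary (depth/disparity) channel
        for level in range(n_levels):
            self.units.append(nn.Conv2d(ch, feat_ch, kernel_size=3, padding=1))
            if level < n_levels - 1:
                self.pools.append(nn.MaxPool2d(2))
            ch = feat_ch

    def forward(self, block: torch.Tensor, aux: torch.Tensor) -> list:
        """block: (B,1,H,W) image block; aux: (B,1,H,W) first/second auxiliary
        information. Returns the per-level feature maps Fd1..Fdn (or Fi1..Fin)."""
        x = torch.cat([block, aux], dim=1)
        feats = []
        for level, unit in enumerate(self.units):
            x = torch.relu(unit(x))
            feats.append(x)                # feature map output by this level
            if level < len(self.pools):
                x = self.pools[level](x)   # down-sampled before the next level
        return feats
```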
  • some or all of the feature extraction modules in the feature extraction network can receive the first auxiliary information/second auxiliary information, which can serve as reference information or supervision information: the reference information establishes a mapping relationship between the extracted features and the auxiliary information, and the supervision information can be used to supervise the training of the feature extraction units, so that the trained feature extraction network can accurately extract image blocks with different depth information and thereby obtain accurate feature maps.
  • for example, in FIG. 8a only the first-level feature extraction module (specifically, the feature extraction unit 1) receives the first auxiliary information/second auxiliary information (such as depth information or disparity information).
  • all modules or some modules of the feature extraction module at each level can receive the first auxiliary information/second auxiliary information (such as depth information or disparity information), so as to be applicable to scenes with a high-precision depth variation range , which is conducive to obtaining high-quality reconstructed texture blocks after loop filtering.
  • please refer to the schematic diagram of another feature extraction network shown in Figure 8b. As shown in Figure 8b, when the feature extraction network extracts the feature maps corresponding to the first image block and the second image block, each feature extraction unit receives the first auxiliary information, so as to better extract the feature maps of image blocks with high-precision depth information.
  • the feature extraction network shown in FIG. 8a or FIG. 8b can also receive the second auxiliary information to process the first image sub-block and the second image sub-block; the specific processing flow is the same as in the example of receiving the first auxiliary information and processing the first image block and the second image block, and is not described in detail here.
  • the feature maps output by each feature extraction unit may be sub-feature maps, for example, the feature maps Fd1-Fdn are collectively called the first sub-feature map, and the feature maps Fi1-Fin are collectively called the second sub-feature map.
  • the first sub-feature maps corresponding to all the first image sub-blocks of the first image block can be combined/spliced into the first feature map, and the second sub-feature maps corresponding to all the second image sub-blocks of the second image block can be spliced into the second feature map, after which the first preset processing and the second preset processing are performed based on the first feature map and the second feature map; in another embodiment, the first preset processing and the second preset processing can also be performed directly on the first sub-feature map and the second sub-feature map.
  • the processing of sub-feature maps is described as follows:
  • the warped second sub-feature map and the first feature map are subjected to feature fusion processing to obtain a fused feature map.
  • the second sub-feature map may be directly warped without determining the first feature map and the second feature map.
  • S704. Determine or generate a processing result corresponding to the first image block according to the first feature map and the second feature map.
  • the processing result is used to generate a reconstructed image or a decoded image corresponding to the first image block.
  • optionally, a first preset processing parameter is determined based on the first feature map and the second feature map; or the first preset processing parameter is determined based on the first feature map, the second feature map, and the first auxiliary information. Based on the first preset processing model, the first preset processing is performed on the second feature map to obtain the target second feature map; the first preset processing model includes a first processing model determined according to the first preset processing parameter.
  • optionally, the first preset processing model includes the first processing model and a second processing model, and performing the first preset processing on the second feature map based on the first preset processing model to obtain the target second feature map includes:
  • the first preset processing is warping processing
  • the target second feature map is a warped second feature map
  • the first preset processing parameters are warping parameters
  • the first preset processing model is a warping model.
  • through the first preset processing, feature maps of different viewpoints can be mapped to each other; here the second feature map of the second viewpoint is mapped to the first viewpoint, which can make the warped second feature map similar to the first feature map of the first viewpoint in object shape, size, and other attributes.
  • the first preset processing in step (1) is warping processing, which may include the following: determining warping parameters based on the first feature map and the second feature map; or determining warping parameters based on the first feature map, the second feature map, and the first auxiliary information; or determining warping parameters based on the first feature map, the second feature map, and the second auxiliary information; and performing warping processing on the second feature map based on the warping model to obtain the warped second feature map, where the warping model includes a first processing model determined according to the warping parameters.
  • the feature maps output by the feature extraction units in the feature extraction modules of each level in the feature extraction network are the first feature map and the second feature map, such as the feature maps Fd1-Fdn shown in the aforementioned Figure 8a (corresponding to first feature map) and feature maps Fi1-Fin (corresponding to the second feature map).
  • the second feature map of each layer can be subjected to the first preset processing (such as warping processing) to obtain the target second feature map (such as the warped second feature map).
  • the following takes the first feature map Fdx and the second feature map Fix output by the feature extraction unit x in the feature extraction module of the x-th (x∈[1~n]) layer as an example to explain the principle of warping the second feature map Fix:
  • the distortion parameter determination module receives the first feature map Fdx from the first viewpoint and the second feature map Fix from the second viewpoint, and outputs the distortion parameters.
  • the width of the first feature map Fdx of the first viewpoint is Wdx, the height is Hdx, and the number of channels is Cdx;
  • the width of the second feature map Fix of the second viewpoint is Wix, the height is Hix, and the number of channels is Cix;
  • the warping parameter determination module is built based on a neural network.
  • the distortion parameter determination module can be implemented by using a fully connected layer or a convolutional layer. It should be noted that the distortion parameter determination module also includes a regression layer, which is used to generate distortion parameters.
  • a neural network-based distortion parameter determination module can be constructed through a neural network learning algorithm, and the distortion processing module can establish a mapping relationship between input variables (including the first feature map and the second feature map) and the distortion parameters.
  • a training sample is established, and the training sample includes input and output.
  • the input includes a first feature map and a second feature map, and the output includes a distortion parameter.
  • the warping parameters include warping parameters labeled for the second feature map for different warping types such as cropping, translation, rotation, scaling, and skewing. Then, the training samples are used to perform forward propagation calculations to obtain the input and output of the neurons in each layer.
  • the neural network-based distortion parameter determination module receives the first feature map and the second feature map, the trained neural network included in the distortion parameter determination module can accurately determine the distortion parameters. It should be noted that the distortion parameter determining module may also receive the first sub-feature map of the first viewpoint and the second sub-feature map of the second viewpoint, and determine the distortion parameter.
  • in this way, the correspondence between the target pixel coordinates in the grid (that is, the pixel coordinates of the warped feature map) and the corresponding pixel coordinates in the second feature map can be determined, and this correspondence is used to determine the warping parameters.
  • the distortion parameters include at least one of the following: parameters related to affine transformation and parameters related to projective transformation. It should be noted that, when the pyramid layered extraction process is adopted, the distortion parameters may be different for the first feature map and the second feature map output by different layers.
  • a warping model can be obtained according to the determined warping parameters, and the warping model can reflect the mapping relationship between the second feature map Fix and the corresponding pixel coordinates of the warped second feature map Fiwx.
  • the distortion model includes a first processing model and a second processing model
  • the first processing model is a distortion model determined according to the distortion parameters
  • the second processing model includes target pixel coordinates
  • warping the second feature map based on the warping model to obtain the warped second feature map may include: determining sampling point coordinates in the second feature map according to the first processing model and the second processing model; determining a target pixel value corresponding to the sampling point coordinates according to the second feature map and the sampling kernel function; and generating the warped second feature map according to the target pixel value corresponding to the sampling point coordinates.
  • the first processing model determined by the distortion parameter includes any one of an affine transformation matrix, a projection transformation matrix, and a combination of the affine transformation matrix and the projection transformation matrix;
  • the pixel grid model G is a predefined grid model.
  • the pixel grid model G may be determined according to features and/or auxiliary information in the first feature map. Determining the pixel grid model G according to the features and/or auxiliary information in the first feature map can make the setting of the pixel grid model more flexible.
  • the sampling point coordinates in the second feature map can be obtained.
  • the first processing model is an affine transformation matrix
  • the sampling point coordinates defined in the second feature map Fix are (xis, yis)
  • the pixel-wise affine transformation is:
  • a constant 1 is added to the pixel coordinates to form homogeneous coordinates; through homogeneous coordinates, some common warping transformations can be represented.
  • a standardized coordinate system can also be used.
  • the value of the target pixel coordinates (xit, yit) of the grid in the output feature map (ie, the warped second feature map) can be limited within the range of -1 to 1
  • the values of the sampling point coordinates (xis, yis) defined in the second feature map are limited within the range of -1 to 1. In this way, subsequent sampling and transformation can be applied in a standardized coordinate system.
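  • For readability, the pixel-wise affine relation described above can be written in the standard spatial-transformer form shown below, where the target pixel coordinates of the warped feature map are expressed in homogeneous coordinates and mapped to sampling coordinates in the second feature map Fix; the symbols θ11..θ23 are illustrative names for the entries of the affine transformation matrix determined by the warping parameters, and with normalized coordinates all of xit, yit, xis, yis lie in [-1, 1].

```latex
\begin{pmatrix} x_{is} \\ y_{is} \end{pmatrix}
=
\begin{pmatrix}
\theta_{11} & \theta_{12} & \theta_{13} \\
\theta_{21} & \theta_{22} & \theta_{23}
\end{pmatrix}
\begin{pmatrix} x_{it} \\ y_{it} \\ 1 \end{pmatrix}
```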
  • the input second feature map can be cropped, translated, rotated, scaled and skewed to form an output feature map (ie, the warped second feature map).
  • the warp model includes a plurality of warp sub-models, and each warp sub-model is assigned a corresponding weight.
  • the above-mentioned principle of warping the second feature map based on the warping model to obtain the sampling point coordinates can be regarded as part of the sampling process for the second feature map; the warped second feature map Fiwx can then be generated according to the sampling result (i.e., the sampling point coordinates).
  • the sampling kernel function can be applied at the sampling point coordinates (xis, yis) defined in the second feature map to obtain the pixel value of the corresponding pixel at the target pixel coordinates of the output feature map.
  • the finally warped second feature map includes the target pixel value corresponding to the sampling point coordinates.
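  • The sampling step just described can be sketched with PyTorch's grid-sampling utilities, assuming an affine first processing model, normalized coordinates in [-1, 1], and a bilinear sampling kernel (the concrete kernel is not fixed by the description above); the function name and tensor shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def warp_second_feature_map(fix: torch.Tensor, theta: torch.Tensor) -> torch.Tensor:
    """fix:   (B, C, H, W) second feature map Fix of the second viewpoint
    theta: (B, 2, 3)    affine matrix built from the warping parameters
    Returns the warped second feature map Fiwx with the same size as fix."""
    # Second processing model: a regular grid of target pixel coordinates.
    grid = F.affine_grid(theta, size=list(fix.shape), align_corners=False)
    # Apply the sampling kernel at the sampling-point coordinates to obtain
    # the target pixel values of the warped feature map.
    return F.grid_sample(fix, grid, mode='bilinear',
                         padding_mode='border', align_corners=False)
```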
  • a preset processing module is specifically a distortion processing module, including a distortion parameter determination module, a distortion model determination module, and a sampling module.
  • the functions of each module are as follows: the distortion parameter determination module has been introduced above, and will not be repeated here.
  • the distortion model determination module can receive the distortion parameters output by the distortion parameter determination module, and then output the distortion model.
  • for example, when the warping parameters are related to affine transformation, the parameters of the warping model include an affine transformation matrix.
  • the sampling module is used to distort and sample the second feature map in combination with the distortion model, including obtaining the coordinates of the sampling points, and using the sampling kernel function to calculate the sampling points in the second feature map The target pixel value at the coordinates, and then output the distorted second feature map Fiwx.
  • the difference between the warp processing modules shown in Fig. 9a and Fig. 9b lies in that the input received is different.
  • the warp processing module shown in Fig. 9a receives the first feature map and the second feature map to determine the warp model.
  • in addition, the warping processing module shown in FIG. 9b can also receive auxiliary information (including the first auxiliary information/second auxiliary information, such as depth information or disparity information) to determine the warping model.
  • the warp parameter determination module receives the first feature map Fdx from the first viewpoint, the second feature map Fix from the second viewpoint, and auxiliary information, and outputs the warp parameters.
  • the width of the first feature map Fdx of the first viewpoint is Wdx, the height is Hdx, and the number of channels is Cdx;
  • the width of the second feature map Fix of the second viewpoint is Wix, the height is Hix, and the number of channels is Cix;
  • the warping parameter determination module is built based on a neural network.
  • the distortion parameter determination module can be implemented by using a fully connected layer or a convolutional layer. It should be noted that the distortion parameter determination module also includes a regression layer, which is used to generate distortion parameters.
  • through a neural network learning algorithm, a neural-network-based warping parameter determination module can be constructed, and the warping processing module can establish a mapping relationship from the input variables (including the first feature map, the second feature map, and the first auxiliary information) to the warping parameters.
  • the training sample includes input and output, and optionally, the input includes a first feature map, a second feature map, and first auxiliary information
  • the output includes warp parameters, where the warp parameters include warp parameters labeled for the second feature map of different warp types (eg cropping, translation, rotation, scaling and skewing).
  • the trained neural network included in the distortion parameter determination module can accurately determine the distortion parameters.
  • because the auxiliary information (for example, the first auxiliary information or the second auxiliary information) includes depth information or disparity information, the correspondence between the target pixel coordinates in the grid (that is, the pixel coordinates of the warped feature map) and the corresponding pixel coordinates in the second feature map can be determined, and this correspondence is used to determine the warping parameters. It can be seen from the foregoing that, in the feature extraction stage, a mapping relationship can be established between the features of the first image block and the second image block and the depth information and/or disparity information; performing warping processing with reference to the depth information or disparity information can therefore make the warped second feature map match the first feature map of the first viewpoint more accurately.
  • the above-mentioned warping processing module processes the first feature map and the second feature map output by each feature extraction unit in the feature extraction network in the same way, that is, the first feature map is used to perform warping processing on the second feature map to obtain the warped second feature map.
  • the first preset processing module shown in Figure 10 is proposed based on the structural schematic diagram of the feature extraction network shown in Figure 8a; as shown in Figure 10, it includes N warping processing modules and N feature extraction modules.
  • the feature extraction network processes the first image block and the second image block to obtain the corresponding first feature map and second feature map.
  • the feature maps Fdx (x∈[1,n]) and Fix output by each feature extraction unit are both processed by the corresponding warping processing module to obtain the warped second feature map Fiwx, so that finally N warped second feature maps are obtained.
  • the feature extraction module and the distortion processing module with the same label belong to the same level of processing, for example, the feature extraction module 1 and the distortion processing module 1 belong to the first level, and the feature extraction module 2 and the distortion processing module 2 belong to the second level.
  • N levels are included to form a pyramid hierarchical structure, and each level has a feature extraction module and a distortion processing module.
  • a feature fusion network may be used to perform second preset processing on the first feature map and the target second feature map to obtain the target feature map.
  • the first preset processing is warping processing
  • the second preset processing is feature fusion processing
  • the target second feature map is a warped second feature map
  • the first preset processing parameters are warping parameters, the first preset processing model is the warping model, and the target feature map is the fused feature map.
  • the first feature map here can be a feature map obtained by pyramid layered extraction, corresponding to the output of the feature extraction unit in any feature extraction module of the feature extraction network; the warped second feature map is the result obtained after the corresponding warping processing module warps the second feature map output by the feature extraction unit of that layer (for the warped second feature map Fiwi, i is an integer greater than or equal to 1).
  • the feature fusion network here is used to fuse the warped second feature map output by the warping processing module and the first feature map output by the feature extraction network.
  • the feature fusion network corresponds to the feature extraction network and can likewise be divided into N levels; each level includes a feature fusion module and/or an up-sampling module, with N feature fusion modules and M (that is, N-1) up-sampling modules in total, and correspondingly N warping processing modules perform warping processing on the second feature maps.
  • the output of each feature fusion module is input to the up-sampling module of the previous level, that is, the output of the Nth feature fusion module is the input of the (N-1)th up-sampling module, so that the feature map output by the first-level feature fusion module in the pyramid hierarchical structure is the fused feature map used in the final filtering.
  • the warping processing module N receives and processes the first feature map Fdn and the second feature map Fin output by the feature extraction module N and outputs the warped second feature map Fiwn; then the feature fusion module N performs feature fusion on the first feature map Fdn and the warped Fiwn to obtain the fused feature map Fdfn, and the fused feature map Fdfn is then input to the up-sampling module of the (N-1)th level, that is, the up-sampling module (N-1), for up-sampling processing.
  • the upsampling module (N-1) performs upsampling processing on the fusion feature map Fdfn to obtain the upsampled feature map Fun, and the upsampled feature map Fun is output to the feature fusion of the N-1 level Module N-1, while the distortion processing module (N-1) receives and processes the first feature map Fd(n-1) and the second feature map Fi(n-1) output by the feature extraction module (N-1), output Warped second feature map Fiw(n-1).
  • the feature fusion module (N-1) performs feature fusion on the first feature map Fd(n-1), the distorted feature map Fiw(n-1) and the upsampled feature map Fun to obtain the fused feature map Fdf(n-1).
  • the fused feature map Fdf(n-1) will be input to the up-sampling module (N-2) of the N-2th level for up-sampling processing.
  • the up-sampling module (N-2) of the (N-2)th level, that is, the up-sampling module (M-1), up-samples the fused feature map Fdf(n-1) to obtain the up-sampled feature map Fu(n-1), and outputs the up-sampled feature map Fu(n-1) to the feature fusion module (N-2) of the (N-2)th level; the feature fusion module (N-2) performs feature fusion on the warped second feature map Fiw(n-2) output by the warping processing module (N-2), the first feature map Fd(n-2) output by the feature extraction module (N-2), and the up-sampled feature map Fu(n-1) to obtain the fused feature map Fdf(n-2).
  • the distortion processing module 1 receives and processes the first feature map Fd1 and the second feature map Fi1 output by the feature extraction module 1, and outputs the second feature map Fiw1 after distortion.
  • the feature fusion module 1 performs feature fusion on the first feature map Fd1, the warped second feature map Fiw1, and the up-sampled feature map Fu2 from the second level to obtain the fused feature map Fdf.
  • the fused feature map Fdf can finally be used to determine the processing result of the first image block.
  • the size of the feature map can be enlarged by upsampling the fused feature map, and finally a fused feature map with the same size as the input feature map is obtained, and the fused feature map is filtered to obtain a higher quality reconstructed image.
  • each level from the first level to the (N-1)th level includes a feature extraction module, a warping processing module, a feature fusion module, and an up-sampling module.
  • the Nth level includes a feature extraction module, a warping processing module, and a feature fusion module.
  • the data processed by the feature fusion module at each level is the fused feature map.
  • the processing logic corresponding to the feature extraction module and the distortion processing module in the N levels is from top to bottom, that is, from the first level to the Nth level, and the processing of the feature fusion module and the upsampling module is from bottom to top, that is, from From the Nth level to the first level, the pyramid processing model formed can realize the accurate extraction of feature maps.
  • the number N of levels or the number of modules can be set as required, or can be set according to empirical values, and there is no limitation here.
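  • The bottom-up fusion and up-sampling flow described above can be sketched as follows; channel-wise addition is used as the fusion operator (the "add" option discussed next) and bilinear interpolation as the up-sampling, both of which are illustrative choices, and the function name is hypothetical.

```python
import torch
import torch.nn.functional as F

def pyramid_fuse(fd: list, fiw: list) -> torch.Tensor:
    """fd:  [Fd1 ... Fdn]   first feature maps per level (level 1 = largest size)
    fiw: [Fiw1 ... Fiwn] warped second feature maps per level
    Returns the fused feature map Fdf of the first level."""
    fused = fd[-1] + fiw[-1]                       # feature fusion module N
    for level in range(len(fd) - 2, -1, -1):       # levels N-1 ... 1
        up = F.interpolate(fused, size=fd[level].shape[-2:],
                           mode='bilinear', align_corners=False)  # up-sampling module
        fused = fd[level] + fiw[level] + up        # feature fusion module of this level
    return fused                                    # fused feature map Fdf
```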
  • each feature fusion module performs feature fusion on the first feature map and the warped second feature map. One optional way to obtain the fused feature map Fdf is to add the first feature map and the warped second feature map on corresponding channels, with the number of channels unchanged (i.e., the add operation); another is to input the warped second feature map Fiw and the first feature map Fd into a connection layer (concatenate) and output the fused features through the connection operation (i.e., the concat operation). For example, each output channel of the connection layer is:
  • output channel = Σ_{i=1..C} (Xi * Ki) + Σ_{i=1..C} (Yi * K(i+C)), where * represents convolution, C represents the number of channels, Xi represents the i-th channel of the first feature map, Yi represents the i-th channel of the second feature map, Ki represents the convolution kernel corresponding to the first feature map, and K(i+C) represents the convolution kernel corresponding to the second feature map.
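  • As a sketch of the concat option, the connection layer above can be realized with channel concatenation followed by a convolution whose kernels play the roles of Ki and K(i+C); the kernel size and the class name are assumptions.

```python
import torch
import torch.nn as nn

class ConcatFusion(nn.Module):
    """Connection layer: concatenate the first feature map X and the warped
    second feature map Y (each with C channels) and convolve, so that each
    output channel sums Xi * Ki terms plus Yi * K(i+C) terms."""
    def __init__(self, channels: int):
        super().__init__()
        self.connect = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        return self.connect(torch.cat([x, y], dim=1))

# The alternative 'add' option keeps the channel count and simply computes x + y.
```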
  • FIG. 12a is a detailed structural schematic diagram of the fusion module; for example, processing the first image block according to the second image block and the auxiliary information may include fine matching, feature extraction, and warping processing. In another embodiment, fine matching may be omitted, corresponding to the structural schematic diagram shown in FIG. 12b.
  • the processing logic of the fusion module is called the third preset processing, and the fusion module is correspondingly called the third preset processing module.
  • filtering may be performed on the target feature map to obtain a filtered target feature map; and the processing result corresponding to the first image block is determined according to the filtered target feature map.
  • the target feature map may be filtered by using a target filtering processing model to obtain a filtered target feature map.
  • the target filtering processing model includes at least one processing unit, and the processing unit includes one or both of a first processing unit and a second processing unit.
  • the step of filtering the target feature map by using the target filtering processing model to obtain the filtered target feature map includes: performing down-sampling processing on at least one target feature map processed by the first processing unit to obtain a down-sampled target feature map; performing up-sampling processing on the down-sampled target feature map to obtain a target fusion feature map; and using the second processing unit to process the target fusion feature map to obtain the filtered target feature map.
  • the target feature map is a fusion feature map.
  • the processing result is used to generate a reconstructed image or a decoded image corresponding to the first image block.
  • when applied to the multi-view encoder, the processing result is used to generate the reconstructed image corresponding to the first image block; when applied to the multi-view decoder, the processing result is used to generate the decoded image corresponding to the first image block.
  • the optional implementation of step (3) includes: performing filtering processing on the fused feature map to obtain a filtered fused feature map, and determining the processing result of the first image block according to the filtered fused feature map.
  • the processing result includes the first image block filtered by the first viewpoint.
  • the processing result obtained here may be the filtered reconstructed texture block of the current view-dependent frame. Subsequently, the processing result may also undergo other filtering processing (for example, ALF), and the reconstructed image or the decoded image is further synthesized according to the result of that filtering processing.
  • the filtering processing of the fused feature map may be: using the target filtering processing model to filter the fused feature map to obtain the filtered fused feature map; optionally, the target filtering processing model includes a target candidate model selected from a plurality of candidate models according to the rate-distortion cost, and each of the plurality of candidate models has a mapping relationship with a quantization parameter.
  • the target candidate model included in the target filtering processing model may be a neural network model, and the neural network model is set in a neural network-based filtering processing module.
  • the structure of the neural network-based filtering processing module may be as shown in FIG. 13 , including at least one convolutional layer and at least one residual unit.
  • the fused feature map Fdf (that is, the fused feature map) is fed into the convolutional layer 1, and after D residual units and a convolutional layer 2, the first image block filtered by the first viewpoint is output.
  • D is an integer greater than or equal to 1
  • the neural network-based filtering processing module corresponds to a processing module in a neural network-based loop filter (eg, DRNLF).
  • each neural-network-based filtering processing module has a plurality of candidate models, each candidate model corresponds to a different quantization parameter, and the quantization parameter is obtained from a quantization parameter map (QP map); the quantization parameter map is a matrix filled with multiple quantization parameters.
  • multiple candidate models can be trained for different quantization parameters, and the best candidate model can be associated with the quantization parameters.
  • the target candidate model with the lowest rate-distortion cost can be selected from multiple candidate models.
  • the fusion feature map is filtered by the target candidate model.
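  • The selection of the target candidate model can be sketched as follows; the dictionary keyed by quantization parameter and the rd_cost_fn callback are placeholders for the encoder's actual candidate-model storage and rate-distortion evaluation.

```python
def select_target_candidate_model(candidate_models: dict, fused_feature_map,
                                  rd_cost_fn):
    """Pick, from the candidate models (each associated with a quantization
    parameter), the one whose filtered output has the lowest rate-distortion
    cost; that model becomes the target candidate model."""
    best_model, best_cost = None, float('inf')
    for qp, model in candidate_models.items():
        filtered = model(fused_feature_map)
        cost = rd_cost_fn(filtered)
        if cost < best_cost:
            best_model, best_cost = model, cost
    return best_model
```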
  • the target filtering processing model includes at least one processing unit, and the processing unit includes one or both of a first processing unit and a second processing unit; Performing filtering processing on the fusion feature map to obtain a filtered fusion feature map, including: performing down-sampling processing on at least one fusion feature map processed by the first processing unit to obtain a down-sampled fusion feature map; The sampled fusion feature map is subjected to upsampling processing to obtain a target fusion feature map; the target fusion feature map is processed by the second processing unit to obtain a filtered fusion feature map.
  • the first processing unit included in the target filtering processing model is a convolution unit (or convolutional layer), and the second processing unit is a residual unit.
  • for example, the target filtering processing model is shown in Figure 13: after convolutional layer 1, the data passes through the residual units, and the residual data output by at least one residual unit is subjected to scaling processing a (for example, down-sampling processing, or division by a scaling factor); preferably, the residual data output by at least one of the residual units 1 to D can be scaled to the range of 0 to 1. After the convolutional layer 2 receives the residual data output by the residual unit D, the residual data of the residual unit D is subjected to scaling processing b (for example, up-sampling processing, or multiplication by the scaling factor corresponding to scaling processing a), and then the convolutional layer 2 maps and synthesizes the residual data after scaling b together with the fused feature map Fdf to obtain the filtered first image block.
  • in this way, the amount of residual data to be processed is limited to a certain range, which can greatly reduce the computational complexity of the neural-network loop filter used for multi-view coding and improve the efficiency of the filtering processing.
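  • A minimal PyTorch sketch of a Figure-13 style filtering module is given below: convolutional layer 1, D residual units, and convolutional layer 2 that synthesizes the re-scaled residual data with the fused feature map Fdf. The channel counts, the scaling factor, and the single output channel are illustrative assumptions; in this floating-point sketch scaling a and scaling b cancel exactly, whereas in a reduced-precision implementation they bound the dynamic range of the intermediate residual data.

```python
import torch
import torch.nn as nn

class ResidualUnit(nn.Module):
    """One residual unit of the neural-network-based filtering module."""
    def __init__(self, ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.body(x)

class NNLoopFilter(nn.Module):
    def __init__(self, in_ch: int = 32, feat_ch: int = 32, d: int = 4,
                 scale: float = 0.1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, feat_ch, 3, padding=1)
        self.res_units = nn.Sequential(*[ResidualUnit(feat_ch) for _ in range(d)])
        self.conv2 = nn.Conv2d(feat_ch + in_ch, 1, 3, padding=1)
        self.scale = scale

    def forward(self, fdf: torch.Tensor) -> torch.Tensor:
        x = self.conv1(fdf)                  # convolutional layer 1
        res = self.res_units(x)              # residual units 1..D
        res = res * self.scale               # scaling processing a
        res = res / self.scale               # scaling processing b (inverse of a)
        # Convolutional layer 2 maps the residual data and the fused feature
        # map Fdf to the filtered first image block.
        return self.conv2(torch.cat([res, fdf], dim=1))
```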
  • in the image processing scheme of this embodiment, information is integrated before filtering: the depth information or disparity information is used to find, from the reference frame, the second image block matching the first image block, so that the most suitable second image block can be determined accurately and finely; by filtering the first image block with the help of the matched second image block, an enhanced image block with clearer texture and edges can be obtained, thereby reducing video compression distortion and improving video compression quality.
  • pyramid hierarchical processing is used to construct feature pyramids of different scales, combined with down-sampling processing, which reduces the amount of calculation and more comprehensively describes the information of image blocks from different viewpoints.
  • the fusion of the first feature map and the distorted second feature map combines the feature information of different viewpoints, better restores the feature information of the first viewpoint, and improves the quality of multi-view video compression coding.
  • FIG. 14 is a schematic flowchart of an image processing method according to a third embodiment.
  • the execution subject in this embodiment may be a computer device or a cluster composed of multiple computer devices.
  • the computer device may be an intelligent terminal (such as the aforementioned mobile terminal 100), or may be a server.
  • this embodiment is described taking an intelligent terminal as the execution subject as an example.
  • the first reconstructed image block may also be acquired.
  • the first reconstructed image block and the second reconstructed image block correspond to the same or different reconstructed images.
  • the first reconstructed image block and the second reconstructed image block correspond to different reconstructed images.
  • Different reconstructed images here refer to reconstructed frames from different viewpoints at the same moment.
  • the first reconstructed image block matches the second reconstructed image block; for example, the first reconstructed image block is the current reconstructed texture block of the current view-dependent frame, and the second reconstructed image block is the matching texture block in the independent-viewpoint reference image, whose depth information or disparity information is similar to that of the current reconstructed texture block.
  • the reconstructed texture block can be any one of CTU, slice, tile, and sub-image
  • the matching texture block can be any one of CTU, slice, tile, and sub-image.
  • the first reconstructed image block may be the aforementioned first image block corresponding to the first viewpoint
  • the second reconstructed image block may be an image block in the aforementioned reference image corresponding to the second viewpoint, such as the second image block.
  • the reconstructed image corresponding to the second reconstructed image block corresponds to the reference image of the second viewpoint.
  • the method for obtaining the second reconstructed image block may be: according to the attribute information of the first reconstructed image block, the second reconstructed image block is determined from the reference image block corresponding to the first reconstructed image block.
  • the attribute information may be auxiliary information, including but not limited to depth information or disparity information.
  • the attribute information may correspond to the aforementioned first auxiliary information or second auxiliary information.
  • the attribute information of the first reconstructed image block is auxiliary information
  • the attribute information of the first reconstructed image block includes at least one of the following: inter prediction information of the first reconstructed image block, depth information of the first reconstructed image block, and disparity information of the first reconstructed image block.
  • the attribute information of the first reconstructed image block may correspond to the aforementioned first auxiliary information of the first image block corresponding to the first viewpoint.
  • the depth information or disparity information of the first reconstructed image block is determined from the corresponding depth image, and the depth information may include any one of, or various combinations of, depth feature information, statistical information based on depth values, the depth slices themselves, and preprocessed depth slices.
  • the attribute information may also include image segmentation information, quantization parameter information, etc., which are not limited here.
  • the attribute information of the second reconstructed image block may include inter prediction information of the second reconstructed image block, depth information of the second reconstructed image block, and disparity information of the second reconstructed image block.
  • Depth information or disparity information of the second reconstructed image block is determined from the corresponding depth image. That is, the attribute information can come from the first reconstructed image block or the second reconstructed image block.
  • when the first reconstructed image block is filtered according to the attribute information of the second reconstructed image block and the first reconstructed image block to obtain the filtered first reconstructed image block, reference can be made to the process, introduced in the aforementioned second embodiment, of processing the first image block corresponding to the first viewpoint according to the first auxiliary information of the second image block and the first image block and obtaining the processing result; that is, the first image block and the second image block
  • correspond in turn to the first reconstructed image block and the second reconstructed image block, and the processing result obtained here is the filtered first reconstructed image block. Details are not repeated here.
  • filtering may also be performed on the first reconstructed image block according to the second reconstructed image block and the attribute information of the second reconstructed image block, or according to the attribute information of the second reconstructed image block and the attribute information of the first reconstructed image block, or according to other attribute information (not listed here).
  • other reconstructed image blocks may also be obtained, and combined with more information, the first reconstructed image block is filtered, so as to improve the quality of the filtered first reconstructed image block. That is, the more detailed implementation steps of S1402 may also include:
  • the first reconstructed image block and the third reconstructed image block belong to image blocks coded at different times of the same viewpoint; for example, the first reconstructed image block is the current reconstructed texture block of the current frame of the dependent viewpoint, and the third reconstructed image block is the texture block corresponding to the reference frame of the dependent viewpoint.
  • the image corresponding to the third reconstructed image block (or in which it is located) and the image corresponding to the first reconstructed image block are images of the same viewpoint at different times; here, the image corresponding to the third reconstructed image block is referred to as the reference reconstructed image.
  • the reference reconstructed image is an encoded reconstructed image, which belongs to the first viewpoint and can be read from the image buffer, and the third reconstructed image block can be obtained from the reference reconstructed image of the first viewpoint according to the inter-frame prediction
  • information between the image where the first reconstructed image block is located and the reference reconstructed image.
  • the first reconstructed image block and the second reconstructed image block are image blocks of different viewpoints at the same time
  • the first reconstructed image block and the third reconstructed image block are image blocks of the same viewpoint and different time points
  • the third reconstructed image block and the second reconstructed image block are image blocks of different viewpoints at different times.
  • the information that is beneficial to the filtering of the first image block is effectively used, thereby effectively improving the filtering quality of the first reconstructed image block, and further reducing the distortion of the reconstructed image corresponding to the first reconstructed image block.
  • the attribute information here may include inter-frame prediction information, depth information or disparity information, quantization parameter information, and the like. At different stages, corresponding attribute information may be used adaptively, so that the first reconstructed image block performs better filtering with reference to different attribute information.
  • step b may include:
  • according to the depth information or disparity information of the first reconstructed image block, perform a third preset process on the first reconstructed image block and the second reconstructed image block to obtain a first target feature map; according to the inter-frame prediction information of the first reconstructed image block, perform a third preset process on the third reconstructed image block and the first target feature map to obtain a second target feature map; perform filtering processing on the second target feature map to obtain the filtered first reconstructed image block.
  • the third preset processing is fusion processing
  • the first target feature map is a first fusion feature map
  • the second target feature map is a second fusion feature map.
  • the fusion processing of the first reconstructed image block and the second reconstructed image block specifically refers to performing a series of processing according to the feature map corresponding to the first reconstructed image block and the feature map corresponding to the second reconstructed image block, and the obtained feature map is called the first
  • fused feature map; by referring to the depth information or the disparity information, the first fused feature map can describe the information of the first reconstructed image block more comprehensively.
  • the third reconstructed image block and the first fused feature map can be fused, specifically by processing the feature map corresponding to the third reconstructed image block together with the first fused feature map, so as to obtain the second fused feature map.
  • a filtered second fused feature map is obtained, and the filtered second fused feature map is used to determine the filtered first reconstructed image block.
  • the above content can be realized by using a neural network-based loop filter; the corresponding functional modules are integrated in the neural network-based loop filter, and each functional module operates according to the above content to improve the filtering quality of the first reconstructed image block.
  • FIG. 15 is a schematic structural diagram of a neural network-based loop filter provided in this embodiment.
  • the structural schematic diagram includes a fusion module 1 and a fusion module 2, and a neural network-based filtering processing module.
  • the neural network-based filtering processing module may include the filtering processing of one or more filters (such as DBF, SAO, ALF, DRNLF) in the loop filtering processing.
  • the fusion module 1 and the fusion module 2 may include the same functional units, for example, the functional units of fine matching, feature extraction, warping processing and feature fusion in the aforementioned fusion module shown in FIG. 12a, or
  • the functional units of feature extraction, warping processing and feature fusion shown in FIG. 12b.
  • the fusion module 1 and the fusion module 2 can also be different; for example, the fusion module 1 includes the functional units of fine matching, feature extraction, warping processing and feature fusion, and the fusion module 2 includes the functional units of feature extraction, warping processing and feature fusion; as another example, fusion module 1 includes the functional units of feature extraction, warping processing and feature fusion, and fusion module 2 includes the functional units of fine matching, feature extraction, warping processing and feature fusion. It should be noted that the specific processing logic corresponding to the above functional units is the same as that described in the second embodiment, and the only difference lies in the corresponding inputs and outputs.
  • the fusion module 1 is used to perform fusion processing on the first reconstructed image block and the second reconstructed image block according to the depth information or disparity information
  • the fusion module 2 is used to perform fusion processing on the third reconstructed image block and the first fusion feature map according to the inter-frame prediction information.
  • fine matching can be performed according to the depth information or disparity information, and then the feature map corresponding to the reconstructed image block is extracted, and after fusion processing, the first fusion feature map is obtained.
  • the fusion module 2 receives the inter-frame prediction information, extracts the feature map corresponding to the third reconstructed texture block, processes the feature map and fuses it with the first fusion feature map to obtain the second fusion feature map; because feature maps from different sources are fused,
  • the second fused feature map can express the features of the first reconstructed image block more accurately.
  • the second fused and reconstructed feature map is subjected to filtering processing based on a neural network to obtain a filtered first reconstructed image block.
  • the first reconstructed image block is the current reconstructed texture block of the current view-dependent frame
  • the second reconstructed image block is the reconstructed texture block corresponding to the independent viewpoint reference frame
  • the third reconstructed image block is the reconstructed texture block corresponding to the viewpoint-dependent reference frame.
  • the final filtered result is the currently reconstructed texture block after the filtering of the current frame depending on the viewpoint.
  • the step of performing the third preset processing on the first reconstructed image block and the second reconstructed image block is: determining, according to the depth information or disparity information of the first reconstructed image block, the first reconstructed feature map corresponding to the first reconstructed image block and the second reconstructed feature map corresponding to the second reconstructed image block; performing the first preset processing on the second reconstructed feature map according to the first reconstructed feature map to obtain the second reconstructed feature map after the first preset processing; performing the second preset processing according to the second reconstructed feature map after the first preset processing and the first reconstructed feature map to obtain the first target feature map.
  • the third preset processing is fusion processing, including first preset processing and second preset processing, the first preset processing is warping processing, and the second preset processing is feature fusion processing , the second reconstructed feature map after the first preset processing is a warped second reconstructed feature map, and the first target feature map is a first fused feature map.
  • the third preset processing also includes feature extraction processing.
  • the specific processing method of the third preset processing involved later in this embodiment can also be realized by using the above steps, for example, the third preset processing of the first reconstructed image block and the third reconstructed image block, which will not be elaborated here.
  • the fusion processing referred to by the third preset processing and the feature fusion processing referred to by the second preset processing are different processing logics.
  • the processing logic of determining the feature maps corresponding to the reconstructed image blocks, warping the second reconstructed feature map, and fusing the reconstructed feature maps is the same as the processing logic, in the aforementioned second embodiment, of determining the first feature map and the second feature map, warping the second feature map, and fusing it with the first feature map.
  • the feature extraction network shown in Figure 8a or Figure 8b can be used to extract the feature maps of the reconstructed image blocks
  • the warping processing module shown in Figure 9a or Figure 9b can be used to obtain the warped second reconstructed feature map, and the feature fusion network shown in Figure 11
  • is used for the fusion processing.
  • for the corresponding processing principle, please refer to the description in the second embodiment, and simply substitute the processing objects and processing results with the relevant content of this embodiment.
  • the following is a brief introduction of the corresponding content:
  • feature extraction processing may be performed on the first reconstructed image block and the second reconstructed image block based on the feature extraction network and the depth information or disparity information of the first reconstructed image block, to obtain a first reconstructed feature map corresponding to the first reconstructed image block and a second reconstructed feature map corresponding to the second reconstructed image block.
  • the first reconstructed image block is a slice
  • the second reconstructed image block corresponds to a slice
  • the first reconstructed feature map and the second reconstructed feature map can be determined after fine matching, namely: acquiring the first reconstructed image sub-block of the first reconstructed image block and the second reconstructed image sub-block of the second reconstructed image block, where the attribute information of the second reconstructed image sub-block matches the attribute information of the first reconstructed image
  • sub-block; based on the feature extraction network and the attribute information of the first reconstructed image sub-block, performing feature extraction processing on the first reconstructed image sub-block and the second reconstructed image sub-block to obtain the first reconstructed feature map and the second reconstructed feature map.
  • the matching of the attribute information corresponding to the first reconstructed image sub-block and the second reconstructed image sub-block means that the similarity of the attribute information of the two image sub-blocks is the largest, for example, that the depth information or disparity information has the largest similarity; different attribute information includes at least one of the following: the attribute information of the first reconstructed image block is different from the attribute information of the first reconstructed image sub-block, for example, the depth information of the first reconstructed image sub-block is depth feature information while the depth
  • information of the first reconstructed image block is statistical information based on depth values; the accuracy of the attribute information of the first reconstructed image sub-block is greater than the accuracy of the attribute information of the first reconstructed image block, for example, the depth information of the first reconstructed image block is n pieces of depth feature information, the attribute information of the first reconstructed image sub-block is m pieces of depth feature information, and m is greater than n.
  • the first reconstructed image sub-block and the second reconstructed image sub-block are coding tree blocks or extended coding tree blocks; the extended coding tree blocks are obtained after edge extension of the coding tree blocks, and the size of the extended coding tree block is larger than that of the coding tree block.
  • the input image block thus contains the adjacent pixels of the coding unit, which can effectively reduce the blocking artifacts caused by image block division in the filtering stage.
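  • a minimal sketch of such an edge extension is shown below, assuming replication of the border pixels as the extension rule (the embodiment does not fix a particular rule); the block size and margin are arbitrary example values.

```python
import torch
import torch.nn.functional as F

def extend_ctu(ctu: torch.Tensor, margin: int = 8) -> torch.Tensor:
    """Edge-extend a coding tree block (N, C, H, W) so the filter also sees
    the neighbouring pixels of the coding unit; 'replicate' repeats the border
    pixels, one plausible way to realise the edge extension."""
    return F.pad(ctu, (margin, margin, margin, margin), mode='replicate')

ctu = torch.randn(1, 3, 128, 128)      # a 128x128 coding tree block
ext = extend_ctu(ctu, margin=8)        # 144x144 extended coding tree block
assert ext.shape[-2:] == (144, 144)
```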
  • the feature extraction network includes N cascaded feature extraction modules, where N is an integer greater than or equal to 1, and each feature extraction module in the first N-1 feature extraction modules includes a series feature extraction unit and a downsampling unit, the Nth feature extraction module includes a feature extraction unit.
  • the structure shown in Figure 8a or Figure 8b can be adopted, the difference being that the first feature extraction module in the N cascaded feature extraction modules is used to process the first reconstructed image block and the second reconstructed image block, or the first reconstructed image sub-block and the second reconstructed image sub-block.
  • the feature map obtained by processing the corresponding reconstructed image block or sub-block of the reconstructed image by the feature extraction unit is called the reconstructed feature map, including the first reconstructed feature map and the second reconstructed feature map.
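  • the cascade described above can be sketched as follows; the convolutional layers standing in for the feature extraction units and the strided convolution standing in for the downsampling unit are assumptions of this sketch, chosen only to show how the N modules are chained and how one feature map per scale is produced.

```python
import torch
import torch.nn as nn

class FeatureExtractionUnit(nn.Module):
    """Stand-in feature extraction unit (two convolutions)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True))

    def forward(self, x):
        return self.body(x)

class FeatureExtractionNetwork(nn.Module):
    """N cascaded feature extraction modules: the first N-1 each contain a
    feature extraction unit followed by a downsampling unit; the N-th contains
    only a feature extraction unit.  Returns one feature map per module,
    i.e. a feature pyramid of different scales."""
    def __init__(self, n_modules=3, in_ch=3, ch=64):
        super().__init__()
        self.units = nn.ModuleList()
        self.downs = nn.ModuleList()
        for i in range(n_modules):
            self.units.append(FeatureExtractionUnit(in_ch if i == 0 else ch, ch))
            if i < n_modules - 1:
                # a strided convolution is used here as the downsampling unit
                self.downs.append(nn.Conv2d(ch, ch, 3, stride=2, padding=1))

    def forward(self, x):
        feats = []
        for i, unit in enumerate(self.units):
            x = unit(x)
            feats.append(x)                 # reconstructed feature map at this scale
            if i < len(self.downs):
                x = self.downs[i](x)        # downsample for the next module
        return feats

net = FeatureExtractionNetwork(n_modules=3)
pyramid = net(torch.randn(1, 3, 128, 128))   # three maps: 128x128, 64x64, 32x32
```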
  • the attribute information can be used as supervisory information or reference information, including depth information or disparity information, which is similar to the first auxiliary information or the second auxiliary information and will not be described again here.
  • the feature extraction network may also receive inter-frame prediction information.
  • the first preset processing on the second reconstructed feature map is namely: determining a first preset processing parameter based on the first reconstructed feature map and the second reconstructed feature map; or, determining a first preset processing parameter based on the first reconstructed feature map, the second reconstructed feature map and the attribute information; and performing the first preset processing on the second reconstructed feature map based on the first preset processing model to obtain
  • the second reconstructed feature map after the first preset processing, where the first preset processing model includes the first processing model determined according to the first preset processing parameters.
  • the first preset processing is warping processing
  • the first preset processing model is a warping model
  • the first preset processing parameter is a warping parameter
  • the first preset processing based on the first preset processing model is as follows: determining the sampling point coordinates of the second reconstructed feature map according to the first processing model and the second processing model (optionally, the second processing model includes the target pixel coordinates); determining the target pixel value corresponding to the sampling point coordinates according to the second reconstructed feature map and the sampling kernel function; and generating the second reconstructed feature map after the first preset processing according to the target pixel value corresponding to the sampling point coordinates. Refer to the aforementioned first preset processing of the second feature map, and substitute the second reconstructed feature map here into the corresponding content to obtain the corresponding result, which will not be repeated here.
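  • a minimal sketch of this warping step is given below, with an offset field standing in for the sampling point coordinates produced by the first and second processing models and bilinear interpolation standing in for the sampling kernel function; these substitutions are assumptions of the sketch, not the models defined by the embodiment.

```python
import torch
import torch.nn.functional as F

def warp_feature_map(feat: torch.Tensor, offsets: torch.Tensor) -> torch.Tensor:
    """Warp a (N, C, H, W) feature map.

    `offsets` (N, 2, H, W) plays the role of the sampling point coordinates
    produced by the processing models (an assumption of this sketch); the
    bilinear mode acts as the sampling kernel function that turns sampling
    coordinates into target pixel values.
    """
    n, _, h, w = feat.shape
    # base pixel grid (target pixel coordinates)
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing='ij')
    base = torch.stack((xs, ys), dim=-1).float()             # (H, W, 2), x then y
    base = base.unsqueeze(0).expand(n, -1, -1, -1).to(feat)
    # sampling point coordinates = target pixel coordinates + predicted offsets
    coords = base + offsets.permute(0, 2, 3, 1)
    # normalise to [-1, 1] as required by grid_sample
    x_norm = 2.0 * coords[..., 0] / max(w - 1, 1) - 1.0
    y_norm = 2.0 * coords[..., 1] / max(h - 1, 1) - 1.0
    grid = torch.stack((x_norm, y_norm), dim=-1)
    # bilinear sampling kernel: interpolate the feature map at the coordinates
    return F.grid_sample(feat, grid, mode='bilinear',
                         padding_mode='border', align_corners=True)

warped = warp_feature_map(torch.randn(1, 64, 32, 32), torch.zeros(1, 2, 32, 32))
```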
  • the second preset processing is performed on it and the first reconstructed feature map, including: using a feature fusion network to perform the second preset processing on the first reconstructed feature map and the second reconstructed feature map after the first
  • preset processing, to obtain a first fused reconstructed feature map.
  • the second preset processing is feature fusion processing.
  • the feature fusion network here can adopt the feature fusion network shown in the aforementioned Figure 11.
  • the first reconstructed feature map obtained by pyramid layering processing and the second reconstructed feature map after the first preset processing are fused, and finally the first fused and reconstructed feature map is output by the feature fusion module.
  • the specific processing flow will not be repeated here.
  • the implementation manner of performing filtering processing on the second target feature map may be: performing filtering processing on the second target feature map using a target filtering processing model to obtain a filtered second target feature map; and generating the filtered first reconstructed image block according to the filtered second target feature map; optionally, the target filtering processing model includes a target candidate model selected from multiple candidate models according to the rate-distortion cost, where each candidate model in the multiple candidate models has a mapping relationship with a quantization parameter.
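  • the candidate-model selection can be sketched as below; the mean squared error used as the distortion term and the placeholder rate term are assumptions of this sketch, since the actual rate-distortion cost depends on the codec's signalling.

```python
import torch
import torch.nn as nn

def select_target_model(candidates, feature_map, reference, lam=0.01):
    """Select the target candidate model by rate-distortion cost J = D + lambda * R.

    `candidates` maps a quantization parameter to a candidate filtering model
    (the mapping relationship mentioned above); the rate term here is a crude
    placeholder, since the real signalling cost is measured by the entropy coder.
    """
    best_qp, best_cost = None, float('inf')
    for idx, (qp, model) in enumerate(candidates.items()):
        with torch.no_grad():
            filtered = model(feature_map)
        distortion = torch.mean((filtered - reference) ** 2).item()   # MSE as the distortion D
        rate = (idx + 1).bit_length()                                  # placeholder for the rate R
        cost = distortion + lam * rate
        if cost < best_cost:
            best_cost, best_qp = cost, qp
    return candidates[best_qp], best_qp

# toy usage with identity "filters" keyed by quantization parameter
models = {22: nn.Identity(), 27: nn.Identity(), 32: nn.Identity()}
fm = torch.randn(1, 64, 16, 16)
target_model, chosen_qp = select_target_model(models, fm, fm)
```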
  • the target filtering processing model includes at least one processing unit, and the processing unit includes one or both of a first processing unit and a second processing unit; performing filtering processing on the second target feature map to obtain a filtered second target feature map includes: performing downsampling processing on at least one second target feature map processed by the first processing unit to obtain a downsampled second target feature map; performing upsampling processing on the downsampled second target feature map to obtain a target fused reconstructed feature map; and using the second processing unit to process the target fused reconstructed feature map to obtain the filtered second target feature map.
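  • a minimal sketch of such a filtering processing model, with a first processing unit, downsampling, upsampling back to the original resolution, and a second processing unit, is shown below; the concrete layers and the downsampling factor are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TargetFilteringModel(nn.Module):
    """Sketch of the filtering processing described above: a first processing
    unit, downsampling of its output, upsampling back to the original size,
    and a second processing unit applied to the resulting fused feature map."""
    def __init__(self, ch=64):
        super().__init__()
        self.first_unit = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1),
                                        nn.ReLU(inplace=True))
        self.second_unit = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, target_feature_map):
        x = self.first_unit(target_feature_map)
        # downsampling reduces the amount of data processed at this stage
        down = F.interpolate(x, scale_factor=0.5, mode='bilinear', align_corners=False)
        # upsampling restores the original resolution (target fused feature map)
        up = F.interpolate(down, size=x.shape[-2:], mode='bilinear', align_corners=False)
        return self.second_unit(up)          # filtered target feature map

filtered = TargetFilteringModel()(torch.randn(1, 64, 64, 64))
```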
  • step b may include:
  • according to the depth information or disparity information of the first reconstructed image block, perform a third preset process on the first reconstructed image block and the second reconstructed image block to obtain a first target feature map; filter the first target feature map to obtain a filtered first target feature map; according to the inter-frame prediction information of the first reconstructed image block, perform a third preset process on the filtered first target feature map and the third reconstructed image block to obtain a second target feature map; and perform filtering processing on the second target feature map to obtain a filtered first reconstructed image block.
  • the third preset processing is fusion processing, including first preset processing and second preset processing, the first preset processing is warping processing, and the second preset processing is feature fusion processing , the first target feature map is a first fusion feature map, and the second target feature map is a second fusion feature map.
  • the third preset processing also includes feature extraction processing.
  • the fusion processing of the first reconstructed image block and the second reconstructed image block has been introduced above and will not be repeated here; the fusion processing of the filtered first fusion feature map and the third reconstructed image block specifically processes the filtered first fusion feature map and the feature map corresponding to the third reconstructed image block to obtain the second fusion feature map, in the same way as the fusion of the first reconstructed image block and the second reconstructed image block.
  • for the corresponding content, reference can also be made to the fusion of the first
  • reconstructed image block and the second reconstructed image block, which will not be explained again here; the filtering processing of the first fusion feature map and the second fusion feature map can be realized by using a neural network-based filtering processing module as shown in Figure 13, and the corresponding processing methods have also been introduced above, so they will not be repeated here.
  • FIG. 16 is a structural schematic diagram of another neural network-based loop filter provided by the embodiment of the present application, including two fusion modules and two neural network-based filter processing modules, wherein the fusion module 1 and the fusion module 2 may include the same functional units, or may include different functional units.
  • Both the neural network-based filtering processing module 1 and the neural network-based filtering processing module 2 can adopt the content shown in FIG. 13 .
  • the difference from Figure 15 is that, when the filtering process is performed according to the structure of the neural network-based loop filter shown in Figure 16, the first fusion feature map obtained by the fusion module 1 is first filtered by a neural network-based filtering processing module and then input into the fusion module 2 to obtain the second fusion feature map, and the second fusion feature map is then filtered by another neural network-based filtering processing module.
  • with two fusion modules and two neural network-based filtering processing modules, the frame information of a different viewpoint at the same time can be referenced during the first fusion processing, and the feature information of the original image can be effectively maintained through the neural network-based filtering processing; the subsequent fusion processing and neural network-based filtering
  • processing can further refer to the information of the reference frame of the same viewpoint, fully fuse various kinds of information related to the first reconstructed image block, and further reduce the distortion of the reconstructed image corresponding to the first reconstructed image block.
  • the neural network-based loop filters shown in FIG. 15 to FIG. above are not limited in the number of fusion modules and neural network-based filtering processing modules they include, and may also adopt other combinations, such as connecting a fusion module and a neural network-based filtering processing module in series on the basis of FIG. 16, which is not limited here.
  • the more detailed implementation steps of the above step b may include: according to the depth information or disparity information of the first reconstructed image block, performing the third preset processing on the first reconstructed image block and the second reconstructed image
  • block to obtain the first target feature map; according to the inter-frame prediction information of the third reconstructed image block and the first reconstructed image block, performing the third preset processing on the first reconstructed image block and the third
  • reconstructed image block to obtain a second target feature map; and determining a filtered first reconstructed image block according to the first target feature map and the second target feature map.
  • the third preset processing is fusion processing, including first preset processing and second preset processing, the first preset processing is warping processing, and the second preset processing is feature fusion processing , the first target feature map is a first fusion feature map, and the second target feature map is a second fusion feature map. It should be noted that the third preset processing may also include feature extraction processing.
  • fusion processing is performed on the first reconstructed image block and the second reconstructed image block, specifically, a series of processing is performed according to the feature map corresponding to the first reconstructed image block and the feature map corresponding to the second reconstructed image block.
  • fine matching can be performed through the depth information or disparity information, and a reconstructed image sub-block matching a reconstructed image sub-block of the first reconstructed image block can be found from the second reconstructed image block; then the depth information or disparity information is referenced to perform feature extraction on the respective reconstructed image sub-blocks to obtain corresponding feature maps.
  • fine matching may also not be performed, and the features of the reconstructed image blocks may be extracted directly to obtain corresponding feature maps; then the depth information or disparity information is referenced to warp the second reconstructed feature map corresponding to the second reconstructed image block, which is then fused with the first reconstructed feature map corresponding to the first reconstructed image block to obtain a first fused feature map.
  • the first reconstructed image block and the third reconstructed image block can be fused according to the inter-frame prediction information to obtain a second fused feature map.
  • the feature map of the image block or image sub-block can be extracted by adopting the pyramid layered processing method.
  • the detailed process please refer to the above-mentioned introduction, which will not be repeated here.
  • the step of determining the filtered first reconstructed image block according to the first target feature map and the second target feature map may include: performing filtering processing on the first target feature map and the second target feature map to obtain a filtered first target feature map and a filtered second target feature map; performing the third preset processing according to the filtered first target feature map and the filtered second target feature map to obtain a target fused reconstructed image block; and using the target fused reconstructed image block as the filtered first reconstructed image block.
  • the third preset processing is fusion processing, including first preset processing and second preset processing, the first preset processing is warping processing, and the second preset processing is feature fusion processing , the first target feature map is a first fusion feature map, the second target feature map is a second fusion feature map, the filtered first target feature map is a filtered first fusion feature map, and The filtered second target feature map is the filtered second fusion feature map.
  • the third preset processing also includes feature extraction processing.
  • the filtering processing here may include filtering the first fused feature map and the second fused feature map using different neural network models respectively; after the first fused feature map and the second fused feature map are both filtered, they are fused, namely: the filtered second fused feature map is warped and then fused with the filtered first fused feature map, or the filtered fused feature maps are directly fused, and the target fused reconstructed image block, that is, the filtered first reconstructed image block, is obtained according to the fused feature map.
  • Fig. 17 is a structural schematic diagram of another neural network-based loop filter provided by the embodiment of the application, including fusion module 1, fusion module 2 and fusion module 3, as well as neural network-based filtering processing module 1 and neural network-based filtering processing module 2.
  • the internal structure of each fusion module can be the same or different.
  • fusion module 3 includes three functional units: feature extraction, warping processing, and feature fusion, while fusion module 1 and fusion module 2 each include four functional units: fine matching, feature extraction, warping processing, and feature fusion.
  • the fusion module 1 is used to process the first reconstructed image block and the second reconstructed image block according to the depth information or disparity information to obtain the first fusion feature map; the first reconstructed image block can be the current reconstructed texture block of the current frame of the dependent viewpoint, and the second reconstructed image block may be a matching texture block corresponding to the independent viewpoint reference frame.
  • the fusion module 2 is used to process the first reconstructed image block and the third reconstructed image block according to the inter-frame prediction information to obtain a second fusion feature map.
  • the third reconstructed image block may be a texture block corresponding to a viewpoint-dependent reference frame.
  • the fusion module 3 can perform fusion processing according to the first fusion feature map and the second fusion feature map, for example, perform fusion processing on the first fusion feature map and the warped second fusion feature map, and obtain the filtered first reconstructed image block according to the fused feature map, for example, the filtered current reconstructed texture block of the current frame of the dependent viewpoint.
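  • the parallel structure of Fig. 17 can be sketched roughly as follows; the stand-in fusion and filtering modules below only show how the three fusion modules and the two neural network-based filtering processing modules are connected, and their internal layers, channel counts and inputs are assumptions of this sketch (in the embodiment, fusion module 1 additionally uses depth or disparity information and fusion module 2 uses inter-frame prediction information).

```python
import torch
import torch.nn as nn

class SimpleFusion(nn.Module):
    """Stand-in fusion module: concatenate two inputs and mix with convolutions."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1),
                                  nn.ReLU(inplace=True),
                                  nn.Conv2d(out_ch, out_ch, 3, padding=1))

    def forward(self, a, b):
        return self.body(torch.cat([a, b], dim=1))

class ParallelLoopFilter(nn.Module):
    """Sketch of a Fig. 17 style structure: fusion module 1 (different viewpoint,
    same time) and fusion module 2 (same viewpoint, earlier time) run in parallel,
    each result is filtered by a neural-network module, and fusion module 3
    combines them into the filtered first reconstructed image block."""
    def __init__(self, ch=32):
        super().__init__()
        self.fusion1 = SimpleFusion(6, ch)          # block1 + block2 (3 + 3 channels)
        self.fusion2 = SimpleFusion(6, ch)          # block1 + block3
        self.filter1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.filter2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.fusion3 = SimpleFusion(2 * ch, ch)
        self.out = nn.Conv2d(ch, 3, 3, padding=1)   # back to an image block

    def forward(self, block1, block2, block3):
        f1 = self.filter1(self.fusion1(block1, block2))   # depth/disparity guided in the embodiment
        f2 = self.filter2(self.fusion2(block1, block3))   # inter-frame prediction guided in the embodiment
        return self.out(self.fusion3(f1, f2))

b = torch.randn(1, 3, 64, 64)
filtered_block = ParallelLoopFilter()(b, b, b)
```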
  • the first reconstructed image block is processed with reference to different attribute information, feature information of the second reconstructed image block, and feature information of the third reconstructed image block through the parallel fusion modules and the neural network-based filtering processing modules to obtain the filtered first reconstructed image block, so that auxiliary information useful for filtering can be obtained from images of different viewpoints and from encoded images of the same viewpoint, thereby improving the quality of the reconstructed image where the first reconstructed image block is located and reducing video distortion.
  • the coded reconstructed image blocks (including reconstructed image blocks at different times of the same viewpoint and reconstructed image blocks of different viewpoints at the same time) can be combined with the attribute information of the first reconstructed image
  • block (including depth information or disparity information and inter-frame prediction information) to perform filtering processing on the reconstructed image block currently being encoded, and
  • the filtering processing may include feature extraction processing, warping processing, feature fusion processing, etc.
  • Different numbers of fusion modules and neural network-based filtering processing modules can be combined, so that through the combination of multiple fusion and filtering processes, the useful information in other reconstructed image blocks can be fully referenced and other relevant feature information can be integrated, so that
  • the quality of the filtered first reconstructed image block is further effectively improved, and the distortion of the reconstructed image is reduced.
  • FIG. 18 is a schematic structural diagram of an image processing device according to a fourth embodiment.
  • the image processing device may be a computer program (including program code) running on a server.
  • the image processing device is application software; the device can be used to execute the corresponding steps in the method provided by the embodiments of the present application.
  • the image processing apparatus 1800 includes: an acquisition module 1801 and a processing module 1802 .
  • The obtaining module 1801 is configured to obtain first auxiliary information
  • the obtaining module 1801 is further configured to obtain a first image block corresponding to the first viewpoint, and/or obtain a reference image corresponding to the second viewpoint.
  • the processing module 1802 is configured to process the first image block corresponding to the first viewpoint according to the reference image corresponding to the second viewpoint and the first auxiliary information, and to determine or generate a processing result; the processing result can be used to obtain a reconstructed image or a decoded image corresponding to the first image block; the reference image is an image corresponding to the second viewpoint; and the second viewpoint is different from the first viewpoint.
  • the processing module 1802 is specifically configured to: determine or generate a processing result corresponding to the first image block according to the second image block of the reference image and the first auxiliary information.
  • the processing module 1802 is specifically configured to: determine the second image block from the reference image according to the first auxiliary information; determine the first feature map corresponding to the first image block and the second feature map corresponding to the second image block; and determine or generate a processing result corresponding to the first image block according to the first feature map and the second feature map.
  • the processing module 1802 is specifically configured to: perform first preset processing on the second feature map according to the first feature map to obtain a target second feature map; performing a second preset process on the target second feature map to obtain a target feature map; determining or generating a processing result corresponding to the first image block according to the target feature map.
  • the processing module 1802 is specifically configured to: acquire first auxiliary information of the first image block, where the first auxiliary information includes depth information and the depth information is determined based on the depth image; obtain the similarity between the first auxiliary information of each image block in the reference image and the first auxiliary information of the first image block; and determine the image block with the largest similarity in the reference image as the second image block matching the first image block.
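  • a simple sketch of this matching step is given below, using the difference of mean depth values as the similarity measure and a fixed block grid; both choices are assumptions of the sketch, since the embodiment only requires that the image block with the largest similarity of auxiliary information be selected.

```python
import torch

def find_matching_block(first_depth: torch.Tensor, ref_depth: torch.Tensor,
                        block: int = 64) -> tuple:
    """Scan the reference-view depth image block by block and return the
    position whose depth statistics are most similar to those of the first
    image block.  Similarity here is the negative absolute difference of mean
    depth values, a simple stand-in for the similarity measure in the text."""
    target_mean = first_depth.float().mean()
    best_pos, best_sim = None, float('-inf')
    h, w = ref_depth.shape[-2:]
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            cand = ref_depth[..., y:y + block, x:x + block].float()
            sim = -(cand.mean() - target_mean).abs().item()
            if sim > best_sim:
                best_sim, best_pos = sim, (y, x)
    return best_pos, best_sim

pos, sim = find_matching_block(torch.rand(64, 64), torch.rand(256, 256))
```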
  • the first auxiliary information includes depth information or disparity information; the depth information is at least one of the following: depth feature information, statistical information based on depth values, depth slices, and preprocessed depth slices.
  • the size of the second image block is the same as that of the first image block; when the first image block is a slice or a coding tree block, the second image block correspondingly is a slice or a coding tree block; when the second image block is a slice, the second image block is composed of multiple coding tree units.
  • the processing module 1802 is specifically configured to: perform feature extraction processing on the first image block and the second image block based on the feature extraction network and the first auxiliary information, to obtain a first feature map corresponding to the first image block and a second feature map corresponding to the second image block.
  • the first image block is a slice, and the second image block corresponds to a slice;
  • the processing module 1802 is specifically configured to: acquire the first image sub-block of the first image block and the second image sub-block of the second image block, where the second auxiliary information of the second image sub-block matches the second auxiliary information of the first image sub-block; perform feature extraction processing on the first image sub-block and the second image sub-block based on the feature extraction network and the second auxiliary information to obtain a first sub-feature map of the first image sub-block and a second sub-feature map of the second image sub-block; determine or generate the first feature map corresponding to the first image block through the first sub-feature map, and determine or generate the second feature map corresponding to the second image block through the second sub-feature map;
  • the second auxiliary information is different from the first auxiliary information.
  • the feature extraction network includes N cascaded feature extraction modules, where N is an integer greater than or equal to 1; each feature extraction module in the first N-1 feature extraction modules includes a feature extraction unit and a downsampling unit in series, and the Nth feature extraction module includes a feature extraction unit; the first feature extraction module in the N cascaded feature extraction modules is used to process the first image block and the second image block, or the first image sub-block and the second image sub-block; each feature extraction module other than the first feature extraction module in the N cascaded feature extraction modules is used to process the output of the previous feature extraction module; for each feature extraction module, the input of the downsampling unit is connected to the output of the feature extraction unit, and the output of the downsampling unit is connected to the input of the feature extraction unit in the next feature extraction module; optionally, the first auxiliary information or the second auxiliary information is used as supervisory information of at least one of the N cascaded feature extraction modules.
  • the first image sub-block and the second image sub-block are coding tree blocks or extended coding tree blocks; the extended coding tree blocks are obtained after edge extension of the coding tree blocks, and the size of the extended coding tree block is larger than the size of the coding tree block.
  • the processing module 1802 is specifically configured to: determine a first preset processing parameter based on the first feature map and the second feature map; or, determine a first preset processing parameter based on the first feature map, the second feature map and the first auxiliary information; and perform first preset processing on the second feature map based on the first preset processing model to obtain a target second feature map; the first preset processing model includes a first processing model determined according to the first preset processing parameters.
  • the first preset processing model includes the first processing model and the second processing model
  • the processing module 1802 is specifically configured to: determine the sampling point coordinates in the second feature map according to the first processing model and the second processing model, where, optionally, the second processing model includes target pixel coordinates; determine the target pixel value corresponding to the sampling point coordinates according to the second feature map and the sampling kernel function; and generate a target second feature map according to the target pixel value corresponding to the sampling point coordinates.
  • the first preset processing is warping processing
  • the second preset processing is feature fusion processing
  • the target second feature map is a warped second feature map
  • the first preset processing parameters are warping parameters
  • the first preset processing model is the warping model, and the target feature map is the fusion feature map.
  • the second feature map output by the feature extraction modules is subjected to the first preset processing.
  • the processing module 1802 is specifically configured to: filter the target feature map to obtain a filtered target feature map; and determine the processing result according to the filtered target feature map.
  • the processing module 1802 is specifically configured to: use the target filtering processing model to filter the target feature map to obtain the filtered target feature map; optionally, the target filtering processing model includes a target candidate model selected from a plurality of candidate models according to the rate-distortion cost, where each candidate model in the plurality of candidate models has a mapping relationship with a quantization parameter.
  • the target filtering processing model includes at least one processing unit, and the processing unit includes one or both of a first processing unit and a second processing unit; the processing module 1802 is specifically configured to: perform downsampling processing on at least one target feature map processed by the first processing unit to obtain a downsampled target feature map; perform upsampling processing on the downsampled target feature map to obtain a target fusion feature map; and use the second processing unit to process the target fusion feature map to obtain a filtered target feature map.
  • the above-mentioned image processing device can also be used to implement the steps of the following methods:
  • the obtaining module 1801 is also used to obtain the first reconstructed image block and the second reconstructed image block;
  • the processing module 1802 is further configured to filter the first reconstructed image block according to the second reconstructed image block and the first reconstructed image block and/or the attribute information of the second reconstructed image block, to obtain a filtered first reconstructed image block; optionally, the first reconstructed image block and the second reconstructed image block correspond to different or identical reconstructed images.
  • the attribute information of the first reconstructed image block includes at least one of the following: inter prediction information of the first reconstructed image block, depth information of the first reconstructed image block, and disparity information of the first reconstructed image block.
  • the processing module 1802 is specifically configured to: acquire a third reconstructed image block, where the image corresponding to the third reconstructed image block is a reference reconstructed image of the image corresponding to the first reconstructed image block; and, according to the attribute information of the third reconstructed image block, the second reconstructed image block, and the first reconstructed image block, filter the first reconstructed image block to obtain a filtered first reconstructed image block.
  • the processing module 1802 is specifically configured to: perform a third preset process on the first reconstructed image block and the second reconstructed image block according to the depth information or disparity information of the first reconstructed image block to obtain a first target feature map; perform a third preset process on the third reconstructed image block and the first target feature map according to the inter-frame prediction information of the first reconstructed image block to obtain a second target feature map; and perform filtering processing on the second target feature map to obtain a filtered first reconstructed image block.
  • the processing module 1802 is specifically configured to: perform a third preset process on the first reconstructed image block and the second reconstructed image block according to the depth information or disparity information of the first reconstructed image block to obtain a first target feature map; filter the first target feature map to obtain a filtered first target feature map; perform a third preset process on the filtered first target feature map and the third reconstructed image block according to the inter-frame prediction information of the first reconstructed image block to obtain a second target feature map; and filter the second target feature map to obtain a filtered first reconstructed image block.
  • the processing module 1802 is specifically configured to: perform a third preset process on the first reconstructed image block and the second reconstructed image block according to the depth information or disparity information of the first reconstructed image block to obtain a first target feature map; perform a third preset process on the first reconstructed image block and the third reconstructed image block according to the inter-frame prediction information of the third reconstructed image block and the first reconstructed image block to obtain a second target feature map; and determine a filtered first reconstructed image block according to the first target feature map and the second target feature map.
  • the processing module 1802 is specifically configured to: filter the first target feature map and the second target feature map to obtain a filtered first target feature map and a filtered second target feature map; perform a third preset process according to the filtered first target feature map and the filtered second target feature map to obtain a target fused reconstructed image block; and use the target fused reconstructed image block as the filtered first reconstructed image block.
  • the processing module 1802 is specifically configured to: determine, according to the depth information or disparity information of the first reconstructed image block, the first reconstructed feature map corresponding to the first reconstructed image block and the second reconstructed feature map corresponding to the second reconstructed image block; perform a first preset process on the second reconstructed feature map according to the first reconstructed feature map to obtain a second reconstructed feature map after the first preset processing; and perform a second preset process according to the second reconstructed feature map after the first preset processing and the first reconstructed feature map to obtain a first target feature map.
  • the processing module 1802 is specifically configured to: based on the feature extraction network and the depth information or disparity information of the first reconstructed image block, perform feature extraction processing on the first reconstructed image block and the second reconstructed image block
  • to obtain a first reconstructed feature map corresponding to the first reconstructed image block and a second reconstructed feature map corresponding to the second reconstructed image block.
  • the first reconstructed image block is a slice, and the second reconstructed image block corresponds to a slice; the processing module 1802 is specifically configured to: acquire a first reconstructed image sub-block of the first reconstructed image block and a second reconstructed image sub-block of the second reconstructed image block, where the attribute information of the second reconstructed image sub-block matches the attribute information of the first reconstructed image sub-block; based on the feature extraction network and the attribute information of the first reconstructed image sub-block, perform feature extraction processing on the first reconstructed image sub-block and the second reconstructed image sub-block to obtain the first reconstructed feature map corresponding to the first reconstructed image block and the second reconstructed feature map corresponding to the second reconstructed image block; optionally, the attribute information includes depth information or disparity information, and the attribute information of the first reconstructed image block is different from the attribute information of the first reconstructed image sub-block.
  • the feature extraction network includes N cascaded feature extraction modules, where N is an integer greater than or equal to 1; each feature extraction module in the first N-1 feature extraction modules includes a feature extraction unit and a downsampling unit in series, and the Nth feature extraction module includes a feature extraction unit; the first feature extraction module in the N cascaded feature extraction modules is used to process the first reconstructed image block and the second reconstructed image block, or the first reconstructed image sub-block and the second reconstructed image sub-block; each feature extraction module other than the first feature extraction module in the N cascaded feature extraction modules is used to process the output of the previous feature extraction module; for each feature extraction module, the input of the downsampling unit is connected to the output of the feature extraction unit, and the output of the downsampling unit is connected to the input of the feature
  • extraction unit in the next feature extraction module; optionally, the attribute information is used as the supervision information of at least one feature extraction module in the N cascaded feature extraction modules.
  • the first reconstructed image sub-block and the second reconstructed image sub-block are coding tree blocks or extended coding tree blocks; the extended coding tree blocks are obtained after edge extension of the coding tree blocks, and the size of the extended coding tree block is larger than that of the coding tree block.
  • the processing module 1802 is specifically configured to: determine a first preset processing parameter based on the first reconstructed feature map and the second reconstructed feature map; or, determine a first preset processing parameter based on the first reconstructed feature map, the second reconstructed feature map and the attribute information; and perform the first preset processing on the second reconstructed feature map based on the first preset processing model to obtain the second reconstructed feature map after the first preset processing, where the first preset processing model includes a first processing model determined according to the first preset processing parameters.
  • the processing module 1802 is specifically configured to: determine the sampling point coordinates of the second reconstructed feature map according to the first processing model and the second processing model, where, optionally, the second processing model includes the target pixel coordinates; determine the target pixel value corresponding to the sampling point coordinates according to the second reconstructed feature map and the sampling kernel function; and generate the second reconstructed feature map after the first preset processing according to the target pixel value corresponding to the sampling point coordinates.
  • the first preset processing is warping processing
  • the second preset processing is feature fusion processing
  • the target second feature map is a warped second feature map
  • the first preset processing parameters are warping parameters
  • the first preset processing model is the warping model, and the target feature map is the fusion feature map.
  • the processing module 1802 is specifically configured to: perform filtering processing on the second target feature map using the target filtering processing model to obtain a filtered second target feature map; generate a filtered first reconstructed image block according to the filtered second target feature map; optionally, the target filtering processing model includes a target candidate model selected from multiple candidate models according to the rate-distortion cost, and each candidate model in the multiple candidate models has a mapping relationship with a quantization parameter.
  • the target filtering processing model includes at least one processing unit, and the processing unit includes one or both of a first processing unit and a second processing unit; the processing module 1802 is specifically configured to: perform downsampling processing on at least one second target feature map processed by the first processing unit to obtain a downsampled second target feature map; perform upsampling processing on the downsampled second target feature map to obtain a target fused reconstructed feature map; and use the second processing unit to process the target fused reconstructed feature map to obtain a filtered second target feature map.
  • the embodiment of the present application also provides an image processing method, the method including the following steps:
  • step S20 includes at least one of the following:
  • Filtering is performed on the first reconstructed image block according to the second reconstructed image block to obtain a filtered first reconstructed image block.
  • Filtering is performed on the first reconstructed image block according to the attribute information of the first reconstructed image block to obtain a filtered first reconstructed image block.
  • Filtering is performed on the first reconstructed image block according to the attribute information of the second reconstructed image block to obtain a filtered first reconstructed image block.
  • Filtering is performed on the first reconstructed image block according to the attribute information of the second reconstructed image block and the first reconstructed image block to obtain a filtered first reconstructed image block.
  • the step S10 further includes: acquiring the first reconstructed image block.
  • the first reconstructed image block and the second reconstructed image block correspond to the same or different reconstructed images.
  • step S20 includes the following steps:
  • S201 Acquire a third reconstructed image block, where an image corresponding to the third reconstructed image block is a reference reconstructed image of an image corresponding to the first reconstructed image block;
  • S202 Filter the first reconstructed image block according to at least one of the attribute information of the third reconstructed image block, the second reconstructed image block, and the first reconstructed image block, so as to obtain a filtered first reconstructed image block.
  • the step S202 includes at least one of the following:
  • According to the depth information or disparity information of the first reconstructed image block, perform a third preset process on the first reconstructed image block and the second reconstructed image block to obtain a first target feature map; according to the inter-frame prediction information of the first reconstructed image block, perform a third preset process on the third reconstructed image block and the first target feature map to obtain a second target feature map; and perform filtering processing on the second target feature map to obtain a filtered first reconstructed image block;
  • According to the depth information or disparity information of the first reconstructed image block, perform a third preset process on the first reconstructed image block and the second reconstructed image block to obtain a first target feature map; perform filtering processing on the first target feature map to obtain a filtered first target feature map; according to the inter-frame prediction information of the first reconstructed image block, perform a third preset process on the filtered first target feature map and the third reconstructed image block to obtain a second target feature map; and perform filtering processing on the second target feature map to obtain a filtered first reconstructed image block;
  • According to the depth information or disparity information of the first reconstructed image block, perform a third preset process on the first reconstructed image block and the second reconstructed image block to obtain a first target feature map; according to the inter-frame prediction information of the third reconstructed image block and the first reconstructed image block, perform a third preset process on the first reconstructed image block and the third reconstructed image block to obtain a second target feature map; and determine a filtered first reconstructed image block according to the first target feature map and the second target feature map.
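  • By way of illustration only, the following is a minimal runnable sketch of the first of the three options above, assuming PyTorch; the GuidedFusion module, the channel counts, and the use of a single-channel map to stand in for the inter-frame prediction information are illustrative assumptions rather than the disclosed network:
```python
# Hypothetical sketch: depth/disparity-guided cross-view fusion, then
# inter-prediction-guided temporal fusion, then a final filtering stage.
import torch
import torch.nn as nn

class GuidedFusion(nn.Module):
    """Third preset process: fuse two inputs under a guidance signal."""
    def __init__(self, ch=32):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(ch, 1, 3, padding=1))

    def forward(self, a, b, guidance):
        return self.net(torch.cat([a, b, guidance], dim=1))

def filter_first_block(rec1, rec2, rec3, depth, inter_pred,
                       cross_fuse, temporal_fuse, filt):
    first_target = cross_fuse(rec1, rec2, depth)                   # depth/disparity guided
    second_target = temporal_fuse(first_target, rec3, inter_pred)  # inter-prediction guided
    return filt(second_target)                                     # filtered first reconstructed block

if __name__ == "__main__":
    blk = lambda: torch.rand(1, 1, 64, 64)   # stand-ins for luma blocks / guidance maps
    out = filter_first_block(blk(), blk(), blk(), blk(), blk(),
                             GuidedFusion(), GuidedFusion(),
                             nn.Conv2d(1, 1, 3, padding=1))
    print(out.shape)   # torch.Size([1, 1, 64, 64])
```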
  • determining the filtered first reconstructed image block according to the first target feature map and the second target feature map includes:
  • performing filtering processing on the first target feature map and the second target feature map to obtain a filtered first target feature map and a filtered second target feature map; performing a third preset process according to the filtered first target feature map and the filtered second target feature map to obtain a target fused reconstructed image block; and using the target fused reconstructed image block as the filtered first reconstructed image block.
  • performing a third preset process on the first reconstructed image block and the second reconstructed image block according to the depth information or disparity information of the first reconstructed image block to obtain a first target feature map includes:
  • determining a first reconstructed feature map corresponding to the first reconstructed image block and a second reconstructed feature map corresponding to the second reconstructed image block includes:
  • the first reconstructed image block is a slice, and the second reconstructed image block is correspondingly a slice; determining, according to the depth information or disparity information of the first reconstructed image block, the first reconstructed feature map corresponding to the first reconstructed image block and the second reconstructed feature map corresponding to the second reconstructed image block includes:
  • the feature extraction network includes N cascaded feature extraction modules, where N is an integer greater than or equal to 1; each of the first N-1 feature extraction modules includes a feature extraction unit and a down-sampling unit connected in series, and the Nth feature extraction module includes a feature extraction unit.
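  • A minimal sketch of such an N-stage cascaded feature extractor is given below, assuming PyTorch; treating each feature extraction unit as a single convolution, each down-sampling unit as average pooling, and stacking the block with its depth map as input channels are illustrative assumptions:
```python
import torch
import torch.nn as nn

class FeatureExtractionModule(nn.Module):
    def __init__(self, in_ch, out_ch, last=False):
        super().__init__()
        self.extract = nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU())
        self.down = None if last else nn.AvgPool2d(2)   # the last module has no down-sampling unit

    def forward(self, x):
        feat = self.extract(x)                  # feature map exposed at this pyramid level
        nxt = feat if self.down is None else self.down(feat)
        return feat, nxt

class PyramidExtractor(nn.Module):
    def __init__(self, n=3, in_ch=2, ch=16):
        super().__init__()
        mods = [FeatureExtractionModule(in_ch if i == 0 else ch, ch, last=(i == n - 1))
                for i in range(n)]
        self.stages = nn.ModuleList(mods)

    def forward(self, x):
        feats = []
        for stage in self.stages:
            feat, x = stage(x)
            feats.append(feat)                  # one feature map per level (e.g. Fd1..Fdn)
        return feats

if __name__ == "__main__":
    # the reconstructed block and its depth slice stacked as two input channels is one
    # simple, assumed way of letting depth information guide the extraction
    block, depth = torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64)
    levels = PyramidExtractor()(torch.cat([block, depth], dim=1))
    print([tuple(f.shape) for f in levels])     # 64x64, 32x32 and 16x16 feature maps
```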
  • performing a first preset process on the second reconstructed feature map according to the first reconstructed feature map to obtain a second reconstructed feature map after the first preset process includes:
  • the first preset processing is performed on the second reconstructed feature map based on the first preset processing model to obtain the second reconstructed feature map after the first preset processing, where the first preset processing model includes a first processing model determined according to the first preset processing parameters.
  • the first preset processing model includes the first processing model and a second processing model
  • performing the first preset processing on the second reconstructed feature map based on the first preset processing model to obtain the second reconstructed feature map after the first preset processing includes:
  • determining the sampling point coordinates of the second reconstructed feature map according to the first processing model and the second processing model; determining the target pixel value corresponding to each sampling point coordinate according to the second reconstructed feature map and a sampling kernel function; and generating the second reconstructed feature map after the first preset processing according to the target pixel values corresponding to the sampling point coordinates.
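  • The sampling step can be illustrated with the following hedged sketch, assuming PyTorch, an affine matrix as the first processing model, a regular target pixel grid as the second processing model, and a bilinear sampling kernel with normalized coordinates in [-1, 1]:
```python
import torch
import torch.nn.functional as F

def warp_feature_map(feature_map, affine_theta):
    """feature_map: (N, C, H, W); affine_theta: (N, 2, 3) affine first processing model."""
    # target pixel grid (second processing model) mapped to sampling-point coordinates
    grid = F.affine_grid(affine_theta, feature_map.shape, align_corners=False)
    # bilinear sampling kernel evaluated at the sampling-point coordinates
    return F.grid_sample(feature_map, grid, mode="bilinear",
                         padding_mode="border", align_corners=False)

if __name__ == "__main__":
    fmap = torch.rand(1, 16, 32, 32)
    # identity plus a small horizontal shift, standing in for a disparity-like warp
    theta = torch.tensor([[[1.0, 0.0, 0.1],
                           [0.0, 1.0, 0.0]]])
    print(warp_feature_map(fmap, theta).shape)   # torch.Size([1, 16, 32, 32])
```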
  • the filtering the second target feature map to obtain the filtered first reconstructed image block includes:
  • a filtered first reconstructed image block is generated according to the filtered second target feature map.
  • the target filtering processing model includes at least one processing unit, and the processing unit includes one or both of a first processing unit and a second processing unit;
  • filtering the second target feature map to obtain the filtered second target feature map includes:
  • down-sampling the second target feature map processed by at least one first processing unit to obtain a down-sampled second target feature map; up-sampling the down-sampled second target feature map to obtain a target fused reconstructed feature map; and processing the target fused reconstructed feature map with a second processing unit to obtain a filtered second target feature map.
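  • A minimal sketch of this down-sample/up-sample filtering structure is shown below, assuming PyTorch; the concrete processing units (a convolution block, a strided convolution, and a transposed convolution) are illustrative assumptions:
```python
import torch
import torch.nn as nn

class TargetFilter(nn.Module):
    def __init__(self, ch=16):
        super().__init__()
        self.first_unit = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        self.down = nn.Conv2d(ch, ch, 3, stride=2, padding=1)          # down-sampling
        self.up = nn.ConvTranspose2d(ch, ch, 4, stride=2, padding=1)   # up-sampling
        self.second_unit = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, target_feature_map):
        x = self.first_unit(target_feature_map)
        fused = self.up(self.down(x))        # target fused reconstructed feature map
        return self.second_unit(fused)       # filtered second target feature map

if __name__ == "__main__":
    print(TargetFilter()(torch.rand(1, 16, 64, 64)).shape)   # torch.Size([1, 16, 64, 64])
```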
  • An embodiment of the present application further provides a smart terminal, the smart terminal includes a memory and a processor, and an image processing program is stored in the memory, and when the image processing program is executed by the processor, the steps of the image processing method in any of the foregoing embodiments are implemented.
  • the smart terminal may be the mobile terminal 100 shown in FIG. 1 .
  • the mobile terminal described in the embodiments of the present application may execute the description of the method in any of the foregoing embodiments, and may also execute the description of the image processing apparatus in the foregoing corresponding embodiments, which will not be repeated here.
  • the description of the beneficial effect of adopting the same method will not be repeated here.
  • the processor 110 of the mobile terminal 100 shown in FIG. 1 can be used to call the image processing program stored in the memory 109 to perform the following operations:
  • the first image block corresponding to the first viewpoint is processed according to the reference image corresponding to the second viewpoint and the first auxiliary information.
  • the determined or generated processing result may be used to obtain a reconstructed image or a decoded image corresponding to the first image block; and the second viewpoint is different from the first viewpoint.
  • the processor 110 is specifically configured to: further acquire a first image block corresponding to the first viewpoint, and/or a reference image corresponding to the second viewpoint.
  • the processor 110 is specifically configured to: determine or generate a processing result corresponding to the first image block according to the second image block of the reference image and the first auxiliary information.
  • the processor 110 is specifically configured to: determine the second image block from the reference image according to the first auxiliary information; determine a first feature map corresponding to the first image block and a second feature map corresponding to the second image block; and determine or generate the processing result corresponding to the first image block according to the first feature map and the second feature map.
  • the processor 110 is specifically configured to: perform first preset processing on the second feature map according to the first feature map to obtain a target second feature map; performing a second preset process on the target second feature map to obtain a target feature map; determining or generating the processing result corresponding to the first image block according to the target feature map.
  • the processor 110 is specifically configured to: acquire first auxiliary information of the first image block, where the first auxiliary information includes depth information and the depth information is determined according to the depth image corresponding to the first image block; obtain the similarity between the first auxiliary information of each image block in the reference image and the first auxiliary information of the first image block; and determine the image block with the largest similarity in the reference image as the second image block matching the first image block.
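  • The matching rule above can be illustrated as follows; using the mean and standard deviation of the co-located depth slice as the statistical information based on depth values, and an exhaustive grid search over candidate positions, are illustrative assumptions:
```python
import numpy as np

def depth_stats(depth_block):
    return np.array([depth_block.mean(), depth_block.std()])

def find_matching_block(ref_depth, cur_depth_block, block_h, block_w, step=8):
    """Return the top-left corner in ref_depth whose depth statistics are most similar."""
    target = depth_stats(cur_depth_block)
    best, best_pos = np.inf, (0, 0)
    H, W = ref_depth.shape
    for y in range(0, H - block_h + 1, step):
        for x in range(0, W - block_w + 1, step):
            cand = depth_stats(ref_depth[y:y + block_h, x:x + block_w])
            dist = np.linalg.norm(cand - target)   # smaller distance = larger similarity
            if dist < best:
                best, best_pos = dist, (y, x)
    return best_pos

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.random((256, 256))
    cur = ref[64:128, 96:160].copy()              # a block actually present in the reference
    print(find_matching_block(ref, cur, 64, 64))  # expected to report (64, 96)
```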
  • the first auxiliary information includes depth information or disparity information; the depth information is at least one of the following: depth feature information, statistical information based on depth values, depth slices, and preprocessed depth slices.
  • the size of the second image block is the same as that of the first image block; when the first image block is a slice or a coding tree block, the second image block is correspondingly a slice or a coding tree block; when the second image block is a slice, the second image block is composed of multiple coding tree units.
  • the processor 110 is specifically configured to: perform feature extraction processing on the first image block and the second image block based on the feature extraction network and the first auxiliary information to obtain a first feature map corresponding to the first image block and a second feature map corresponding to the second image block.
  • the first image block is a slice, and the second image block corresponds to a slice;
  • the processor 110 is specifically configured to: acquire a first image sub-block of the first image block and a second image sub-block of the second image block, where the second auxiliary information of the second image sub-block matches the second auxiliary information of the first image sub-block; perform feature extraction processing on the first image sub-block and the second image sub-block based on the feature extraction network and the second auxiliary information to obtain a first sub-feature map of the first image sub-block and a second sub-feature map of the second image sub-block; and determine or generate the first feature map corresponding to the first image block through the first sub-feature maps, and determine or generate the second feature map corresponding to the second image block through the second sub-feature maps;
  • the second auxiliary information is different from the first auxiliary information.
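  • A sketch of this sub-block based path is given below, assuming a 64x64 coding-tree-block size and simple grid stitching of the sub-feature maps; the stand-in extract_sub_feature function is a hypothetical placeholder for the feature extraction network:
```python
import numpy as np

def split_into_ctbs(slice_img, ctb=64):
    H, W = slice_img.shape
    return {(y, x): slice_img[y:y + ctb, x:x + ctb]
            for y in range(0, H, ctb) for x in range(0, W, ctb)}

def extract_sub_feature(sub_block):
    # hypothetical stand-in for running the feature extraction network on one sub-block
    return sub_block - sub_block.mean()

def stitch(sub_feats, shape):
    out = np.zeros(shape, dtype=np.float32)
    for (y, x), feat in sub_feats.items():
        out[y:y + feat.shape[0], x:x + feat.shape[1]] = feat
    return out

if __name__ == "__main__":
    first_slice = np.random.rand(128, 192).astype(np.float32)
    sub_feats = {pos: extract_sub_feature(b) for pos, b in split_into_ctbs(first_slice).items()}
    first_feature_map = stitch(sub_feats, first_slice.shape)   # per-sub-block maps stitched back
    print(first_feature_map.shape)                             # (128, 192)
```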
  • the feature extraction network includes N cascaded feature extraction modules, where N is an integer greater than or equal to 1; each of the first N-1 feature extraction modules includes a feature extraction unit and a down-sampling unit connected in series, and the Nth feature extraction module includes a feature extraction unit; the first feature extraction module in the N cascaded feature extraction modules is used to process the first image block and the second image block, or the first image sub-block and the second image sub-block; each feature extraction module other than the first one in the N cascaded feature extraction modules is used to process the output of the previous feature extraction module; for each feature extraction module, the input of the down-sampling unit is connected to the output of the feature extraction unit, and the output of the down-sampling unit is connected to the input of the feature extraction unit in the next feature extraction module; optionally, the first auxiliary information or the second auxiliary information is used as supervisory information for at least one of the N cascaded feature extraction modules.
  • the first image sub-block and the second image sub-block are coding tree blocks or extended coding tree blocks; an extended coding tree block is obtained by extending the edges of a coding tree block, and the size of the extended coding tree block is larger than the size of the coding tree block.
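  • By way of illustration, an extended coding tree block can be obtained as follows; the 8-pixel extension margin and edge replication at picture boundaries are illustrative assumptions:
```python
import numpy as np

def extended_ctb(recon_picture, y, x, ctb=64, margin=8):
    """Grow the CTB at (y, x) by a margin filled from neighbouring reconstructed pixels."""
    # pad the whole picture by edge replication so boundary CTBs can also be extended
    padded = np.pad(recon_picture, margin, mode="edge")
    y0, x0 = y + margin, x + margin          # CTB origin in padded coordinates
    return padded[y0 - margin:y0 + ctb + margin, x0 - margin:x0 + ctb + margin]

if __name__ == "__main__":
    pic = np.arange(256 * 256, dtype=np.float32).reshape(256, 256)
    ext = extended_ctb(pic, 0, 0)            # top-left CTB
    print(ext.shape)                          # (80, 80), i.e. larger than the 64x64 CTB
```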
  • the processor 110 is specifically configured to: determine a first preset processing parameter based on the first feature map and the second feature map; or determine a first preset processing parameter based on the first feature map, the second feature map, and the first auxiliary information; and perform the first preset processing on the second feature map based on a first preset processing model to obtain a target second feature map, where the first preset processing model includes a first processing model determined according to the first preset processing parameter.
  • the first preset processing model includes the first processing model and a second processing model.
  • the processor 110 is specifically configured to: determine the sampling point coordinates in the second feature map according to the first processing model and the second processing model, where, optionally, the second processing model includes target pixel coordinates; determine the target pixel value corresponding to each sampling point coordinate according to the second feature map and a sampling kernel function; and generate the target second feature map according to the target pixel values corresponding to the sampling point coordinates.
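  • Determining the first preset processing parameters from the two feature maps can be sketched as a small regressor, as below; the layer sizes and the choice of a six-parameter affine model (a projective model would regress eight parameters instead) are illustrative assumptions, and the regressor is initialised to the identity transform so that an untrained model leaves the feature map unchanged:
```python
import torch
import torch.nn as nn

class WarpParamRegressor(nn.Module):
    def __init__(self, ch=16):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(2 * ch, ch, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.regress = nn.Linear(ch, 6)            # regression layer producing the warp parameters
        nn.init.zeros_(self.regress.weight)        # identity initialisation
        self.regress.bias.data = torch.tensor([1., 0., 0., 0., 1., 0.])

    def forward(self, first_feat, second_feat):
        x = self.features(torch.cat([first_feat, second_feat], dim=1)).flatten(1)
        return self.regress(x).view(-1, 2, 3)      # affine matrix usable with the sampling sketch above

if __name__ == "__main__":
    f1, f2 = torch.rand(1, 16, 32, 32), torch.rand(1, 16, 32, 32)
    print(WarpParamRegressor()(f1, f2))            # identity-initialised 2x3 affine parameters
```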
  • the first preset processing is warping processing
  • the second preset processing is feature fusion processing
  • the target second feature map is a warped second feature map
  • the first preset processing parameters are warping parameters
  • the first preset processing model is the warping model, and the target feature map is the fused feature map.
  • the second feature map output by each of the feature extraction modules is subjected to the first preset processing.
  • the processor 110 is specifically configured to: perform filtering processing on the target feature map to obtain a filtered target feature map; and determine the processing result corresponding to the first image block according to the filtered target feature map.
  • the processor 110 is specifically configured to: use a target filtering model to filter the target feature map to obtain a filtered target feature map; optionally, the target filtering model includes a target candidate model selected from a plurality of candidate models according to the rate-distortion cost, where each candidate model in the plurality of candidate models has a mapping relationship with a quantization parameter.
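  • Selecting the target filtering model by rate-distortion cost can be sketched as follows; modelling the candidates as simple smoothing filters, using mean squared error as the distortion term, and assuming a fixed side-information rate for signalling the choice are illustrative assumptions:
```python
import numpy as np

def rd_cost(distortion, rate, lam):
    return distortion + lam * rate

def select_filter_model(block, reference, candidates, lam, signal_bits=2):
    """candidates: list of callables mapping a block to a filtered block; returns the best index."""
    best_idx, best_cost = None, np.inf
    for idx, model in enumerate(candidates):
        filtered = model(block)
        distortion = np.mean((filtered - reference) ** 2)   # distortion against the original
        cost = rd_cost(distortion, signal_bits, lam)
        if cost < best_cost:
            best_idx, best_cost = idx, cost
    return best_idx

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    original = rng.random((64, 64))
    noisy = original + 0.05 * rng.standard_normal((64, 64))
    # hypothetical candidates, e.g. one per quantisation-parameter range
    candidates = [
        lambda b: b,                                                        # no filtering
        lambda b: (b + np.roll(b, 1, axis=0) + np.roll(b, -1, axis=0)) / 3.0,  # vertical smoothing
        lambda b: (b + np.roll(b, 1, axis=1) + np.roll(b, -1, axis=1)) / 3.0,  # horizontal smoothing
    ]
    print(select_filter_model(noisy, original, candidates, lam=0.1))
```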
  • the target filtering processing model includes at least one processing unit, and the processing unit includes one or both of a first processing unit and a second processing unit; the processor 110 is specifically configured to: down-sample the target feature map processed by at least one first processing unit to obtain a down-sampled target feature map; up-sample the down-sampled target feature map to obtain a target fusion feature map; and process the target fusion feature map with a second processing unit to obtain a filtered target feature map.
  • the processor 110 of the mobile terminal 100 shown in FIG. 1 can also be used to call the image processing program stored in the memory 109 to perform the following operations: acquiring a second reconstructed image block; filtering the first reconstructed image block according to at least one of the second reconstructed image block, attribute information of the first reconstructed image block, and attribute information of the second reconstructed image block, to obtain a filtered first reconstructed image block; optionally, the first reconstructed image block and the second reconstructed image block correspond to the same or different reconstructed images.
  • the attribute information of the first reconstructed image block includes at least one of the following: inter prediction information of the first reconstructed image block, depth information of the first reconstructed image block, and disparity information of the first reconstructed image block.
  • the processor 110 is specifically configured to: acquire a third reconstructed image block, where the image corresponding to the third reconstructed image block is a reference reconstructed image of the image corresponding to the first reconstructed image block; and filter the first reconstructed image block according to the attribute information of the third reconstructed image block, the second reconstructed image block, and the first reconstructed image block to obtain a filtered first reconstructed image block.
  • the processor 110 is specifically configured to: perform a third preset process on the first reconstructed image block and the second reconstructed image block according to the depth information or disparity information of the first reconstructed image block to obtain a first target feature map; perform a third preset process on the third reconstructed image block and the first target feature map according to the inter-frame prediction information of the first reconstructed image block to obtain a second target feature map; and perform filtering processing on the second target feature map to obtain a filtered first reconstructed image block.
  • the processor 110 is specifically configured to: perform a third preset process on the first reconstructed image block and the second reconstructed image block according to the depth information or disparity information of the first reconstructed image block to obtain a first target feature map; filter the first target feature map to obtain a filtered first target feature map; perform a third preset process on the filtered first target feature map and the third reconstructed image block according to the inter-frame prediction information of the first reconstructed image block to obtain a second target feature map; and filter the second target feature map to obtain a filtered first reconstructed image block.
  • the processor 110 is specifically configured to: perform a third preset process on the first reconstructed image block and the second reconstructed image block according to the depth information or disparity information of the first reconstructed image block to obtain a first target feature map; perform a third preset process on the first reconstructed image block and the third reconstructed image block according to the inter-frame prediction information of the third reconstructed image block and the first reconstructed image block to obtain a second target feature map; and determine a filtered first reconstructed image block according to the first target feature map and the second target feature map.
  • the processor 110 is specifically configured to: perform filtering processing on the first target feature map and the second target feature map to obtain a filtered first target feature map and a filtered second target feature map; perform a third preset process according to the filtered first target feature map and the filtered second target feature map to obtain a target fused reconstructed image block; and use the target fused reconstructed image block as the filtered first reconstructed image block.
  • the processor 110 is specifically configured to: determine, according to the depth information or disparity information of the first reconstructed image block, a first reconstructed feature map corresponding to the first reconstructed image block and a second reconstructed feature map corresponding to the second reconstructed image block; perform a first preset process on the second reconstructed feature map according to the first reconstructed feature map to obtain a second reconstructed feature map after the first preset processing; and perform a second preset process according to the second reconstructed feature map after the first preset processing and the first reconstructed feature map to obtain a first target feature map.
  • the processor 110 is specifically configured to: perform feature extraction processing on the first reconstructed image block and the second reconstructed image block based on the feature extraction network and the depth information or disparity information of the first reconstructed image block, to obtain a first reconstructed feature map corresponding to the first reconstructed image block and a second reconstructed feature map corresponding to the second reconstructed image block.
  • the first reconstructed image block is a slice, and the second reconstructed image block is correspondingly a slice; the processor 110 is specifically configured to: acquire a first reconstructed image sub-block of the first reconstructed image block and a second reconstructed image sub-block of the second reconstructed image block, where the attribute information of the second reconstructed image sub-block matches the attribute information of the first reconstructed image sub-block; and perform feature extraction processing on the first reconstructed image sub-block and the second reconstructed image sub-block based on the feature extraction network and the attribute information of the first reconstructed image sub-block, to obtain the first reconstructed feature map corresponding to the first reconstructed image block and the second reconstructed feature map corresponding to the second reconstructed image block; optionally, the attribute information includes depth information or disparity information, and the attribute information of the first reconstructed image block is different from the attribute information of the first reconstructed image sub-block.
  • the feature extraction network includes N cascaded feature extraction modules, where N is an integer greater than or equal to 1; each of the first N-1 feature extraction modules includes a feature extraction unit and a down-sampling unit connected in series, and the Nth feature extraction module includes a feature extraction unit; the first feature extraction module in the N cascaded feature extraction modules is used to process the first reconstructed image block and the second reconstructed image block, or the first reconstructed image sub-block and the second reconstructed image sub-block; each feature extraction module other than the first one in the N cascaded feature extraction modules is used to process the output of the previous feature extraction module; for each feature extraction module, the input of the down-sampling unit is connected to the output of the feature extraction unit, and the output of the down-sampling unit is connected to the input of the feature extraction unit in the next feature extraction module; optionally, the attribute information is used as supervision information for at least one feature extraction module in the N cascaded feature extraction modules.
  • the first reconstructed image sub-block and the second reconstructed image sub-block are coding tree blocks or extended coding tree blocks; an extended coding tree block is obtained by extending the edges of a coding tree block, and the size of the extended coding tree block is larger than that of the coding tree block.
  • the processor 110 is specifically configured to: determine a first preset processing parameter based on the first reconstructed feature map and the second reconstructed feature map; or determine a first preset processing parameter based on the first reconstructed feature map, the second reconstructed feature map, and the attribute information; and perform the first preset processing on the second reconstructed feature map based on a first preset processing model to obtain the second reconstructed feature map after the first preset processing, where the first preset processing model includes a first processing model determined according to the first preset processing parameter.
  • the processor 110 is specifically configured to: determine the sampling point coordinates of the second reconstructed feature map according to the first processing model and a second processing model, where, optionally, the second processing model includes target pixel coordinates; determine the target pixel value corresponding to each sampling point coordinate according to the second reconstructed feature map and a sampling kernel function; and generate the second reconstructed feature map after the first preset processing according to the target pixel values corresponding to the sampling point coordinates.
  • the first preset processing is warping processing
  • the second preset processing is feature fusion processing
  • the target second feature map is a warped second feature map
  • the first preset processing parameters are warping parameters
  • the first preset processing model is the warping model, and the target feature map is the fused feature map.
  • the processor 110 is specifically configured to: use a feature fusion network to perform a second preset process on the first reconstructed feature map and the second reconstructed feature map after the first preset processing, to obtain the first target feature map;
  • the output of the jth upsampling module is connected to the input of the jth feature fusion module;
  • the i-th first preset processing module is used to perform the first preset processing on the second feature map output by the i-th feature extraction module.
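  • A minimal sketch of such a feature fusion network is given below, assuming PyTorch; bilinear interpolation for the up-sampling modules, a concatenation-plus-convolution fusion module, and the channel counts are illustrative assumptions, and the per-level warping is assumed to have been applied already (e.g. with the sampling sketch above):
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionLevel(nn.Module):
    def __init__(self, ch=16):
        super().__init__()
        self.mix = nn.Sequential(nn.Conv2d(3 * ch, ch, 3, padding=1), nn.ReLU())

    def forward(self, first_feat, warped_second_feat, coarser):
        if coarser is None:
            coarser = torch.zeros_like(first_feat)
        else:
            # j-th up-sampling module feeding the j-th feature fusion module
            coarser = F.interpolate(coarser, size=first_feat.shape[-2:],
                                    mode="bilinear", align_corners=False)
        return self.mix(torch.cat([first_feat, warped_second_feat, coarser], dim=1))

def fuse_pyramid(first_feats, warped_second_feats, levels):
    """Feature maps ordered fine-to-coarse; returns the full-resolution fused map."""
    fused = None
    for f1, f2w, level in zip(reversed(first_feats), reversed(warped_second_feats),
                              reversed(levels)):
        fused = level(f1, f2w, fused)        # coarse-to-fine fusion
    return fused

if __name__ == "__main__":
    shapes = [(1, 16, 64, 64), (1, 16, 32, 32), (1, 16, 16, 16)]
    f1s = [torch.rand(s) for s in shapes]
    f2s = [torch.rand(s) for s in shapes]    # assumed already warped per level
    levels = [FusionLevel() for _ in shapes]
    print(fuse_pyramid(f1s, f2s, levels).shape)   # torch.Size([1, 16, 64, 64])
```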
  • the processor 110 is specifically configured to: perform filtering processing on the second target feature map using a target filtering processing model to obtain a filtered second target feature map; and generate a filtered first reconstructed image block according to the filtered second target feature map; optionally, the target filtering processing model includes a target candidate model selected from multiple candidate models according to the rate-distortion cost, where each of the multiple candidate models has a mapping relationship with a quantization parameter.
  • the target filtering processing model includes at least one processing unit, and the processing unit includes one or both of a first processing unit and a second processing unit; the processor 110 is specifically configured to: down-sample the second target feature map processed by at least one first processing unit to obtain a down-sampled second target feature map; up-sample the down-sampled second target feature map to obtain a target fused reconstructed feature map; and process the target fused reconstructed feature map with a second processing unit to obtain a filtered second target feature map.
  • the mobile terminal described in the embodiments of the present application may execute the description of the method in any of the foregoing embodiments, and may also execute the description of the image processing apparatus in the foregoing corresponding embodiments, which will not be repeated here. In addition, the description of the beneficial effect of using the same method will not be repeated.
  • An embodiment of the present application further provides a computer-readable storage medium, on which an image processing program is stored, and when the image processing program is executed by a processor, the steps of the image processing method in any of the foregoing embodiments are implemented.
  • the embodiments of the smart terminal and the computer-readable storage medium provided in this application may contain all the technical features of any of the above-mentioned image processing method embodiments, and details are not repeated here.
  • An embodiment of the present application further provides a computer program product, the computer program product includes computer program code, and when the computer program code is run on the computer, the computer is made to execute the methods in the above various possible implementation manners.
  • the embodiment of the present application also provides a chip, including a memory and a processor.
  • the memory is used to store a computer program
  • the processor is used to call and run the computer program from the memory, so that the device installed with the chip executes the methods in the above various possible implementation manners.
  • Units in the device in the embodiment of the present application may be combined, divided and deleted according to actual needs.
  • the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation.
  • the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product; the computer software product is stored in one of the above storage media (such as ROM/RAM, magnetic disk, or optical disc) and includes several instructions to make a terminal device (which may be a mobile phone, computer, server, controlled terminal, or network device, etc.) execute the method of each embodiment of the present application.
  • all or part of them may be implemented by software, hardware, firmware or any combination thereof.
  • When implemented using software, it may be implemented in whole or in part in the form of a computer program product.
  • a computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on the computer, the processes or functions according to the embodiments of the present application will be generated in whole or in part.
  • the computer can be a general purpose computer, special purpose computer, a computer network, or other programmable apparatus.
  • Computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (such as coaxial cable, optical fiber, or digital subscriber line) or wireless (such as infrared, radio, or microwave) means.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server, a data center, etc. integrated with one or more available media.
  • Usable media may be magnetic media (e.g., floppy disk, hard disk, magnetic tape), optical media (e.g., DVD), or semiconductor media (e.g., Solid State Disk (SSD)), among others.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present application provides an image processing method, a smart terminal, and a storage medium. The image processing method includes the following steps: acquiring first auxiliary information; and processing a first image block corresponding to a first viewpoint according to a reference image corresponding to a second viewpoint and the first auxiliary information. With this solution, the information of image blocks from different viewpoints can be fully utilized to reduce the distortion of reconstructed or decoded images, thereby effectively improving the coding quality of multi-view video.

Description

图像处理方法、智能终端及存储介质
本申请要求于2022年1月12日提交中国专利局、申请号为202210029380.6、发明名称为“图像处理方法、智能终端及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在申请中。
技术领域
本申请涉及计算机技术领域,具体涉及一种图像处理方法、智能终端及存储介质。
背景技术
不同于单一视点的视频,多视点视频通过多个摄像头从不同视角拍摄同一场景,能够向观众提供丰富的动态场景和真实的感官体验。随着视频压缩技术的发展,面向多视点视频的视频编码技术的研究也在逐步深入。目前,在视频编码标准HEVC(High Efficiency Video Coding,高效率视频编码)的基础之上提出的3D-HEVC编码技术,可以高效压缩多视点视频和其对应的深度数据。
然而,在构思及实现本申请过程中,发明人发现至少存在如下问题:在多视点视频编码技术中,环路滤波处理(例如基于神经网络的环路滤波处理)阶段为了降低重建帧的失真度,通常会利用同一时刻不同视点的参考帧对重建帧进行增强处理,产生的增强帧用于后续的编码流程中,但是由于滤波过程中没有充分利用相关信息,导致重建帧的图像块和参考帧的图像块之间不能很好地匹配,影响多视点视频编码质量。
前面的叙述在于提供一般的背景信息,并不一定构成现有技术。
技术解决方案
本申请的主要目的在于提供一种图像处理方法、智能终端及存储介质,可以充分利用不同视点的图像块的信息,减少重建图像或解码图像的失真,进而有效提高多视点视频的编码质量。
本申请提供一种图像处理方法,包括:
获取第一辅助信息;
根据对应于第二视点的参考图像和所述第一辅助信息,对对应于第一视点的第一图像块进行处理。
本申请提供另一种图像处理方法,包括:
获取第二重建图像块;
根据所述第二重建图像块、第一重建图像块的属性信息和所述第二重建图像块的属性信息中的至少一种,对所述第一重建图像块进行滤波,以得到滤波后的第一重建图像块。
可选地,所述根据所述第二重建图像块、第一重建图像块的属性信息和所述第二重建图像块的属性信息中的至少一种,对所述第一重建图像块进行滤波,以得到滤波后的第一重建图像块,包括以下至少一种:
根据所述第二重建图像块,对第一重建图像块进行滤波,以得到滤波后的第一重建图像块。
根据第一重建图像块的属性信息,对第一重建图像块进行滤波,以得到滤波后的第一重建图像块。
根据所述第二重建图像块的属性信息,对第一重建图像块进行滤波,以得到滤波后的第一重建图像块。
根据所述第二重建图像块和第一重建图像块的属性信息,对第一重建图像块进行滤波,以得到滤波后的第一重建图像块。
根据所述第二重建图像块和所述第二重建图像块的属性信息,对第一重建图像块进行滤波,以得到滤波后的第一重建图像块。
根据第一重建图像块的属性信息和所述第二重建图像块的属性信息,对第一重建图像块进行滤波,以得到滤波后的第一重建图像块。
根据所述第二重建图像块、第一重建图像块的属性信息和所述第二重建图像块的属性信息,对第一重建图像块进行滤波,以得到滤波后的第一重建图像块。
本申请提供一种图像处理装置,包括:
获取模块,用于获取第一辅助信息;
处理模块,用于根据对应于第二视点的参考图像和所述第一辅助信息,对对应于第一视点的第一图像块进行处理。
本申请提供另一种图像处理装置,包括:
获取模块,用于获取第二重建图像块;
处理模块,用于根据所述第二重建图像块、第一重建图像块的属性信息和所述第二重建图像块的属性信息中的至少一种,对所述第一重建图像块进行滤波,以得到滤波后的第一重建图像块。
本申请还提供一种智能终端,包括:存储器、处理器,其中,所述存储器上存储有图像处理程序,所述图像处理程序被所述处理器执行时实现如上述方法的步骤。
本申请还提供一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,所述计算机程序被处理器执行时实现如上述方法的步骤。
如上所述,本申请的图像处理方法,包括步骤:获取第一辅助信息;根据对应于第二视点的参考图像和第一辅助信息,对对应于第一视点的第一图像块进行处理。通过上述技术方案,可以利用辅助信息和不同于当前正在编码的视点的图像块对当前正在编码的视点的图像块进行处理。其得到的处理结果有助于确定当前正在编码的视点的图像块的重建图像或解码图像,降低视频编码失真,提升视频编码质量,进而提升用户体验。
附图说明
此处的附图被并入说明书中并构成本说明书的一部分,示出了符合本申请的实施例,并与说明书一起用于解释本申请的原理。为了更清楚地说明本申请实施例的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,对于本领域普通技术人员而言,在不付出创造性劳动性的前提下,还可以根据这些附图获得其他的附图。
图1为实现本申请各个实施例的一种智能终端的硬件结构示意图;
图2为本申请实施例提供的一种通信网络系统架构图;
图3是本申请实施例提供的一种多视点视频编码器的结构示意图;
图4是本申请实施例提供的一种多视点视频解码器的结构示意图;
图5是根据第一实施例示出的一种图像处理方法的流程示意图;
图6是根据第一实施例示出的基于神经网络的环路滤波器的结构示意图;
图7是根据第二实施例示出的一种图像处理方法的流程示意图;
图8a是根据第二实施例示出的一种特征提取网络的结构示意图;
图8b是根据第二实施例示出的另一种特征提取网络的结构示意图
图9a是根据第二实施例示出的一种第一预设处理模块的结构示意图;
图9b是根据第二实施例示出的另一种第一预设处理模块的结构示意图;
图10是根据第二实施例示出的一种结合特征提取网络和第一预设处理模块的结构示意图;
图11是根据第二实施例示出的一种包括特征融合网络的结构示意图;
图12a是根据第二实施例示出的一种第三预设处理模块的结构示意图;
图12b是根据第二实施例示出的另一种第三预设处理模块的结构示意图;
图13是根据第二实施例示出的一种基于神经网络的滤波处理模块的结构示意图;
图14是根据第三实施例示出的一种图像处理方法的流程示意图;
图15是根据第三实施例示出的一种基于神经网络的环路滤波器的结构示意图;
图16是根据第三实施例示出的另一种基于神经网络的环路滤波器的结构示意图;
图17是根据第三实施例示出的又一种基于神经网络的环路滤波器的结构示意图;
图18是根据第四实施例示出的一种图像处理装置的结构示意图。
本申请目的的实现、功能特点及优点将结合实施例,参照附图做进一步说明。通过上述附图,已示出本申请明确的实施例,后文中将有更详细的描述。这些附图和文字描述并不是为了通过任何方式限制本申请构思的范围,而是通过参考特定实施例为本领域技术人员说明本申请的概念。
本申请的实施方式
这里将详细地对示例性实施例进行说明,其示例表示在附图中。下面的描述涉及附图时,除非另有表示,不同附图中的相同数字表示相同或相似的要素。以下示例性实施例中所描述的实施方式并不代表与本申请相一致的所有实施方式。相反,它们仅是与如所附权利要求书中所详述的、本申请的一些方面相一致的装置和方法的例子。
需要说明的是,在本文中,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者装置不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者装置所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括该要素的过程、方法、物品或者装置中还存在另外的相同要素,此外,本申请不同实施例中具有同样命名的部件、特征、要素可能具有相同含义,也可能具有不同含义,其具体含义需以其在该具体实施例中的解释或者进一步结合该具体实施例中上下文进行确定。
应当理解,尽管在本文可能采用术语第一、第二、第三等来描述各种信息,但这些信息不应限于这些术语。这些术语仅用来将同一类型的信息彼此区分开。例如,在不脱离本文范围的情况下,第一信息也可以被称为第二信息,类似地,第二信息也可以被称为第一信息。取决于语境,如在此所使用的词语"如果"可以被解释成为"在……时"或"当……时"或"响应于确定"。再者,如同在本文中所使用的,单数形式“一”、“一个”和“该”旨在也包括复数形式,除非上下文中有相反的指示。应当进一步理解,术语“包含”、“包括”表明存在所述的特征、步骤、操作、元件、组件、项目、种类、和/或组,但不排除一个或多个其他特征、步骤、操作、元件、组件、项目、种类、和/或组的存在、出现或添加。本申请使用的术语“或”、“和/或”、“包括以下至少一个”等可被解释为包括性的,或意味着任一个或任何组合。例如,“包括以下至少一个:A、B、C”意味着“以下任一个:A;B;C;A和B;A和C;B和C;A和B和C”,再如,“A、B或C”或者“A、B和/或C”意味着“以下任一个:A;B;C;A和B;A和C;B和C;A和B和C”。仅当元件、功能、步骤或操作的组合在某些方式下内在地互相排斥时,才会出现该定义的例外。·
应该理解的是,虽然本申请实施例中的流程图中的各个步骤按照箭头的指示依次显示,但是这些步骤并不是必然按照箭头指示的顺序依次执行。除非本文中有明确的说明,这些步骤的执行并没有严格的顺序限制,其可以以其他的顺序执行。而且,图中的至少一部分步骤可以包括多个子步骤或者多个阶段,这些子步骤或者阶段并不必然是在同一时刻执行完成,而是可以在不同的时刻执行,其执行顺序也不必然是依次进行,而是可以与其他步骤或者其他步骤的子步骤或者阶段的至少一部分轮流或者交替地执行。
取决于语境,如在此所使用的词语“如果”、“若”可以被解释成为“在……时”或“当……时”或“响应于确定”或“响应于检测”。类似地,取决于语境,短语“如果确定”或“如果检测(陈述的条件或事件)”可以被解释成为“当确定时”或“响应于确定”或“当检测(陈述的条件或事件)时”或“响应于检测(陈述的条件或事件)”。
需要说明的是,在本文中,采用了诸如S501、S502等步骤代号,其目的是为了更清楚简要地表述相应内容,不构成顺序上的实质性限制,本领域技术人员在具体实施时,可能会先执行S502后执行S501等,但这些均应在本申请的保护范围之内。
应当理解,此处所描述的具体实施例仅仅用以解释本申请,并不用于限定本申请。
在后续的描述中,使用用于表示元件的诸如“模块”、“部件”或者“单元”的后缀仅为了有利于本申请的说明,其本身没有特定的意义。因此,“模块”、“部件”或者“单元”可以混合地使用。
本申请中提及的通信设备,可以是终端设备(如移动终端,具体如手机),也可以是网络设备(如基站),具体所指,需要结合上下文加以明确。
可选地,终端设备可以以各种形式来实施。例如,本申请中描述的终端设备可以包括诸如手机、平板电脑、笔记本电脑、掌上电脑、个人数字助理(Personal Digital Assistant,PDA)、便捷式媒体播放器(Portable Media Player,PMP)、导航装置、可穿戴设备、智能手环、计步器等智能终端,以及诸如数字TV、台式计算机等固定终端。
后续描述中将以移动终端为例进行说明,本领域技术人员将理解的是,除了特别用于移动目的的元件之外,根据本申请的实施方式的构造也能够应用于固定类型的终端。
请参阅图1,其为实现本申请各个实施例的一种移动终端的硬件结构示意图,该移动终端100可以包括:RF(Radio Frequency,射频)单元101、WiFi模块102、音频输出单元103、A/V(音频/视频)输入单元104、传感器105、显示单元106、用户输入单元107、接口单元108、存储器109、处理器110、以及电源111等部件。本领域技术人员可以理解,图1中示出的移动终端结构并不构成对移动终端的限定,移动终端可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件布置。
下面结合图1对移动终端的各个部件进行具体的介绍:
射频单元101可用于收发信息或通话过程中,信号的接收和发送,具体的,将基站的下行信息接收后,给处理器110处理;另外,将上行的数据发送给基站。通常,射频单元101包括但不限于天线、至少一个放大器、收发信机、耦合器、低噪声放大器、双工器等。此外,射频单元101还可以通过无线通信与网络和其他设备通信。上述无线通信可以使用任一通信标准或协议,包括但不限于GSM(Global System of Mobile communication,全球移动通讯系统)、GPRS(General Packet Radio Service,通用分组无线服务)、CDMA2000(Code Division Multiple Access 2000,码分多址2000)、WCDMA(Wideband Code Division Multiple Access,宽带码分多址)、TD-SCDMA(Time Division-Synchronous Code Division Multiple Access,时分同步码分多址)、FDD-LTE(Frequency Division Duplexing-Long Term Evolution,频分双工长期演进)、TDD-LTE(Time Division Duplexing-Long Term Evolution,分时双工长期演进)和5G等。
WiFi属于短距离无线传输技术,移动终端通过WiFi模块102可以帮助用户收发电子邮件、浏览网页和访问流式媒体等,它为用户提供了无线的宽带互联网访问。虽然图1示出了WiFi模块102,但是可以理解的是,其并不属于移动终端的必须构成,完全可以根据需要在不改变发明的本质的范围内而省略。
音频输出单元103可以在移动终端100处于呼叫信号接收模式、通话模式、记录模式、语音识别模式、广播接收模式等等模式下时,将射频单元101或WiFi模块102接收的或者在存储器109中存储的音频数据转换成音频信号并且输出为声音。而且,音频输出单元103还可以提供与移动终端100执行的特定功能相关的音频输出(例如,呼叫信号接收声音、消息接收声音等等)。音频输出单元103可以包括扬声器、蜂鸣器等等。
A/V输入单元104用于接收音频或视频信号。A/V输入单元104可以包括图形处理器(Graphics Processing Unit,GPU)1041和麦克风1042,图形处理器1041对在视频捕获模式或图像捕获模式中由图像捕获装置(如摄像头)获得的静态图片或视频的图像数据进行处理。处理后的图像帧可以显示在显示单元106上。经图形处理器1041处理后的图像帧可以存储在存储器109(或其它存储介质)中或者经由射频单元101或WiFi模块102进行发送。麦克风1042可以在电话通话模式、记录模式、语音识别模式等等运行模式中经由麦克风1042接收声音(音频数据),并且能够将这样的声音处理为音频数据。处理后的音频(语音)数据可以在电话通话模式的情况下转换为可经由射频单元101发送到移动通信基站的格式输出。麦克风1042可以实施各种类型的噪声消除(或抑制)算法以消除(或抑制)在接收和发送音频信号的过程中产生的噪声或者干扰。
移动终端100还包括至少一种传感器105,比如光传感器、运动传感器以及其他传感器。可选地,光传感器包括环境光传感器及接近传感器,可选地,环境光传感器可根据环境光线的明暗来调节显示面板1061的亮度,接近传感器可在移动终端100移动到耳边时,关闭显示面板1061和/或背光。作为运动传感器的一种,加速计传感器可检测各个方向上(一般为三轴)加速度的大小,静止时可检测出重力的大小及方向,可用于识别手机姿态的应用(比如横竖屏切换、相关游戏、磁力计姿态校准)、振动识别相关功能(比如计步器、敲击)等;至于手机还可配置的指纹传感器、压力传感器、虹膜传感器、分子传感器、陀螺仪、气压计、湿度计、温度计、红外线传感器等其他传感器,在此不再赘述。
显示单元106用于显示由用户输入的信息或提供给用户的信息。显示单元106可包括显示面板1061,可以采用液晶显示器(Liquid Crystal Display,LCD)、有机发光二极管(Organic Light-Emitting Diode,OLED)等形式来配置显示面板1061。
用户输入单元107可用于接收输入的数字或字符信息,以及产生与移动终端的用户设置以及功能控制有关的键信号输入。可选地,用户输入单元107可包括触控面板1071以及其他输入设备1072。触控面板1071,也称为触摸屏,可收集用户在其上或附近的触摸操作(比如用户使用手指、触笔等任何适合的物体或附件在触控面板1071上或在触控面板1071附近的操作),并根据预先设定的程式驱动相应的连接装置。触控面板1071可包括触摸检测装置和触摸控制器两个部分。可选地,触摸检测装置检测用户的触摸方位,并检测触摸操作带来的信号,将信号传送给触摸控制器;触摸控制器从触摸检测装置上接收触摸信息,并将它转换成触点坐标,再送给处理器110,并能接收处理器110发来的命令并加以执行。此外,可以采用电阻式、电容式、红外线以及表面声波等多种类型实现触控面板1071。除了触控面板1071,用户输入单元107还可以包括其他输入设备1072。可选地,其他输入设备1072可以包括但不限于物理键盘、功能键(比如音量控制按键、开关按键等)、轨迹球、鼠标、操作杆等中的一种或多种,具体此处不做限定。
可选地,触控面板1071可覆盖显示面板1061,当触控面板1071检测到在其上或附近的触摸操作后,传送给处理器110以确定触摸事件的类型,随后处理器110根据触摸事件的类型在显示面板1061上提供相应的视觉输出。虽然在图1中,触控面板1071与显示面板1061是作为两个独立的部件来实现移动终端的输入和输出功能,但是在某些实施例中,可以将触控面板1071与显示面板1061集成而实现移动终端的输入和输出功能,具体此处不做限定。
接口单元108用作至少一个外部装置与移动终端100连接可以通过的接口。例如,外部装置可以包括有线或无线头戴式耳机端口、外部电源(或电池充电器)端口、有线或无线数据端口、存储卡端口、用于连接具有识别模块的装置的端口、音频输入/输出(I/O)端口、视频I/O端口、耳机端口等等。接口单元108可以用于接收来自外部装置的输入(例如,数据信息、电力等等)并且将接收到的输入传输到移动终端100内的一个或多个元件或者可以用于在移动终端100和外部装置之间传输数据。
存储器109可用于存储软件程序以及各种数据。存储器109可主要包括存储程序区和存储数据区,可选地,存储程序区可存储操作系统、至少一个功能所需的应用程序(比如声音播放功能、图像播放功能等)等;存储数据区可存储根据手机的使用所创建的数据(比如音频数据、电话本等)等。此外,存储器109可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件、闪存器件、或其他易失性固态存储器件。
处理器110是移动终端的控制中心,利用各种接口和线路连接整个移动终端的各个部分,通过运行或执行存储在存储器109内的软件程序和/或模块,以及调用存储在存储器109内的数据,执行移动终端的各种功能和处理数据,从而对移动终端进行整体监控。处理器110可包括一个或多个处理单元;优选的,处理器110可集成应用处理器和调制解调处理器,可选地,应用处理器主要处理操作系统、用户界面和应用程序等,调制解调处理器主要处理无线通信。可以理解的是,上述调制解调处理器也可以不集成到处理器110中。
移动终端100还可以包括给各个部件供电的电源111(比如电池),优选的,电源111可以通过 电源管理系统与处理器110逻辑相连,从而通过电源管理系统实现管理充电、放电、以及功耗管理等功能。
尽管图1未示出,移动终端100还可以包括蓝牙模块等,在此不再赘述。
为了便于理解本申请实施例,下面对本申请的移动终端所基于的通信网络系统进行描述。
请参阅图2,图2为本申请实施例提供的一种通信网络系统架构图,该通信网络系统为通用移动通信技术的LTE系统,该LTE系统包括依次通讯连接的UE(User Equipment,用户设备)201,E-UTRAN(Evolved UMTS Terrestrial Radio Access Network,演进式UMTS陆地无线接入网)202,EPC(Evolved Packet Core,演进式分组核心网)203和运营商的IP业务204。
可选地,UE201可以是上述终端100,此处不再赘述。
E-UTRAN202包括eNodeB2021和其它eNodeB2022等。可选地,eNodeB2021可以通过回程(backhaul)(例如X2接口)与其它eNodeB2022连接,eNodeB2021连接到EPC203,eNodeB2021可以提供UE201到EPC203的接入。
EPC203可以包括MME(Mobility Management Entity,移动性管理实体)2031,HSS(Home Subscriber Server,归属用户服务器)2032,其它MME2033,SGW(Serving Gate Way,服务网关)2034,PGW(PDN Gate Way,分组数据网络网关)2035和PCRF(Policy and Charging Rules Function,政策和资费功能实体)2036等。可选地,MME2031是处理UE201和EPC203之间信令的控制节点,提供承载和连接管理。HSS2032用于提供一些寄存器来管理诸如归属位置寄存器(图中未示)之类的功能,并且保存有一些有关服务特征、数据速率等用户专用的信息。所有用户数据都可以通过SGW2034进行发送,PGW2035可以提供UE 201的IP地址分配以及其它功能,PCRF2036是业务数据流和IP承载资源的策略与计费控制策略决策点,它为策略与计费执行功能单元(图中未示)选择及提供可用的策略和计费控制决策。
IP业务204可以包括因特网、内联网、IMS(IP Multimedia Subsystem,IP多媒体子系统)或其它IP业务等。
虽然上述以LTE系统为例进行了介绍,但本领域技术人员应当知晓,本申请不仅仅适用于LTE系统,也可以适用于其他无线通信系统,例如GSM、CDMA2000、WCDMA、TD-SCDMA、5G以及未来新的网络系统(如5G)等,此处不做限定。
基于上述移动终端硬件结构以及通信网络系统,提出本申请各个实施例。
为便于理解,下面先对本申请实施例可能涉及到的专业术语进行解释。
1)多视点视频
由多个摄像头(可以是同一设备的多个摄像头,或者不同设备的一个或多个摄像头)组成的阵列在同一时刻从不同视角对同一场景进行拍摄得到的,是一种有效的三维(three-dimensional,3D)视频表示方法,能够更加生动地再现场景,提供立体感和交互功能。对多视点视频进行压缩编码和解压解码的过程称为多视点视频编码和多视点视频解码。在3D-HEVC中,视点可以被分成了两类:独立视点(如下述2)的介绍)与非独立视点(如下述3)的介绍)。
2)独立视点
独立视点也可称为基础视点,该视点的编码是独立的,不依赖于其他视点。也即独立视点的视频图像可以不依赖于其他视点而利用传统的视频编码器(例如HEVC视频编码器)进行编码,对应的比特流可以单独提取出来形成二维比特流,从而恢复二维视频。
3)非独立视点
也可称为依赖视点,该视点的编码通常是利用已编码的独立视点的信息来预测当前编码视点的信息,从而降低视点间冗余,提高编码效率。
4)视点合成预测(View Synthesis Prediction,VSP)
一种三维视频序列的预测编码技术,用于自其他视点预测当前视点的图像。与帧间预测的主要区别在于:视点合成预测产生的预测图像是一个由不同于当前编码(或解码)视点的一个已编码(或已解码)视点的重建图像和重建深度生成的视点合成图像,而帧间预测产生的预测图像是当前编码(或解码)视点另一时刻的重建图像。
5)深度图(depth image)
即深度图像,也被称为距离影像(range image),是指将从图像采集器到场景中各点的距离(深度)作为像素值的图像,根据深度图像能够直接反映了场景中物体的可见表面的几何形状。深度图由于能够记录场景中物体距离摄像头的距离,可以用以测量、三维重建、虚拟视点合成等。对于深度图的获取可以是可以利用双目相机拍摄同一场景的左、右两幅视点图像,运用(双目)立体匹配算法获取视差图,进而获取深度图。
6)编码树单元(Coding Tree Unit,CTU)
依次编码成HEVC比特流的编码逻辑单元,通常包括三个块,即两个色度块和一个亮度块,这样的一个块叫做CTB(Coding Tree Block,编码树块),除此之外,CTU还包括相关的语法元素。
可选地,术语“重建”和“解码”可以互换使用,术语“图像”、“图片”和“帧”可以互换使用。通常但并非必须,术语“重建”在编码器侧使用,而“解码”在解码器侧使用。
基于上述内容,下面对用于编码多视点视频的多视点编码器进行介绍。请参见图3,是本申请实施例提供的一种多视点编码器的结构示意图。以存在视点V0~V1的多视点视频来举例说明,可选地,V0是独立视点,V1是依赖视点,每个视点的纹理图像与相应的深度图像相关联。本领域技术人员可知,可以利用独立视点的纹理图像的重建纹理块和对应的深度图像的重建深度块来生成依赖视点的纹理图像的预测纹理块。此外,可以利用独立视点的重建深度块来生成依赖视点的预测深度块。利用多视点编码器对独立视点和依赖视点的编解码处理如下。
(1.1)利用多视点编码器300a对独立视点V0的编码处理说明如下:
在接收独立视点V0的输入视频数据之后,将原始图像块(包括纹理图像的纹理图像块和深度图像的深度图像块)减去通过帧内预测和/或帧间预测得到的预测块(包括纹理图像的纹理预测块和深度图像的深度预测块),得到残差块(包括纹理图像的纹理残差块和深度图像的深度残差块)。之后,对残差块进行变换和量化处理,再由熵编码器进行编码,形成已编码的比特流。除此之外,残差块会进行逆量化和逆变换处理,并与通过帧内预测和/或帧间预测得到的预测块相加得到重建块。由于变换和量化的原因,重建块与输入帧(输入视频数据的图像)中的图像块之间存在失真。因此,需要对重建块进行环路滤波处理。例如,基于神经网络的环路滤波处理。此外,环路滤波处理也可以包括DBF(Deblocking  Filter,去块效应滤波)、SAO(Sample-Adaptive Offset,采样自适应补偿)、ALF(Adaptive Loop Filter,自适应环路滤波)中的至少一个(图3未示出);基于神经网络的环路滤波处理还可以增加基于神经网络的滤波器,该神经网络可以是超分辨率神经网络、基于密集残差卷积神经网络、一般的卷积神经网络等等,在此不做限制。例如由DBF、DRNLF(Dense Residual Convolutional Neural Network based In-Loop Filter,基于密集残差卷积神经网络的环路滤波器)、SAO、ALF构成基于神经网络的环路滤波处理(图3未示出)。经过环路滤波处理的重建块会进一步合成重建图像并被存储于图像缓冲器中,以用于后续图像块的预测处理。
(1.2)利用多视点编码器300a对依赖视点V1的编码处理说明如下:
在接收依赖视点V1的输入视频数据之后,将原始图像块(包括纹理图像的纹理图像块和深度图像的深度图像块)减去通过帧内预测和/或帧间预测得到的预测块(包括纹理图像的纹理预测块和深度图像的深度预测块),得到残差块(包括纹理图像的纹理残差块和深度图像的深度残差块)。之后,对残差块进行变换和量化处理,再由熵编码器进行编码,形成已编码的比特流。除此之外,残差块会进行逆量化和逆变换处理,并与通过帧内预测和/或帧间预测得到的预测块相加得到重建块。由于变换和量化的原因,重建块与输入帧中的图像块之间存在失真。因此,需要对重建块进行环路滤波处理,例如,基于神经网络的环路滤波处理,该基于神经网络的环路滤波处理还可以包括DBF、SAO、ALF中的至少一个(图3未示出),还可以增加基于神经网络的滤波器进一步提升滤波图像质量,例如DRNLF。经过环路滤波处理的重建块会进一步合成重建图像并被存储与图像缓冲器中,以用于后续图像块的预测处理。
此外,依赖视点V1的图像还可进行视点合成预测。具体来说,可以从图像缓冲器中读取依赖视点V1对应的独立视点V0的图像块,包括纹理图像的纹理图像块和深度图像的深度图像块。进一步,依据对应的独立视点V0的深度图像块可以生成对应的依赖视点V1的深度图像的预测深度图像块,以及依据对应的独立视点V0的纹理图像的纹理图像块和深度图像的深度图像块,可以生成依赖视点V1的纹理图像的预测纹理块。接下来,将与视点合成预测相关的控制数据(即图3中的预测数据包括的用于指示解码端和编码端保持相同的预测方式的控制数据)和其他相关数据(例如滤波器控制数据)进行熵编码,并在已编码的比特流中进行传输。
(2.1)对于多视点视频的解码处理,可以视为多视点视频的编码处理的逆过程,结合图4示出的多视点视频解码器300b对独立视点V0和依赖视点V1的解码处理说明如下:
对独立视点V0和依赖视点V1的解码处理会经历如下过程:视频解码器对接收到的已编码的比特流(例如独立视点V0的比特流或依赖视点V1的比特流)进行熵解码,得到预测数据、编码端指示的滤波器控制数据以及量化后的变换系数;之后,量化后的变换系数经过逆量化和逆变换得到残差块,该残差块和预测数据经过多种预测方式(例如,包括帧内预测、帧间预测、视点合成预测)中的一种处理后输出的预测块进行求和处理,再根据滤波器控制数据对环路滤波处理的指示,采用和多视频编码器相同的滤波方式对解码图像块进行滤波处理,滤波后的解码图像块会进一步合成解码图像,将解码图像缓存到解码图像缓冲器,用于后续图像块的预测处理,同时输出解码的视频数据。
在此需要注意的是,依赖视点V1的比特流在多视点视频解码器中进行解码时,依赖视点V1解码得到的预测参数可以包括用于指示解码器使用视点合成预测的控制数据,多视点视频解码器根据该控制数据的指示采用视频合成预测方式得到预测块,例如依据对应的独立视点V0的深度图像块可以生成对应的依赖视点V1的深度图像的预测深度图像块,以及依据对应的独立视点V0的纹理图像的纹理图像块和深度图像的深度图像块,可以生成依赖视点V1的纹理图像的预测纹理块,然后将预测深度图像块、预测纹理块、与它们对应的残差块进行求和等一系列处理得到各自的解码图像。
基于上述对多视频编码器和多视频解码器的介绍,下面结合附图,对本申请实施例提供的图像处理方法进行阐述。
第一实施例
请参见图5,图5是根据第一实施例示出的一种图像处理方法的流程示意图,该实施例中的执行主体可以是一个计算机设备或者是多个计算机设备构成的集群,该计算机设备可以是智能终端(如前述移动终端100),也可以是服务器,此处,以本实施例中的执行主体为智能终端为例进行说明。
S501,获取第一辅助信息。
在一个实施例中,所述第一辅助信息包括深度信息或视差信息,所述深度信息包括以下至少一种:深度特征信息、基于深度值的统计信息、深度切片、预处理后的深度切片、深度特征信息和基于深度值的统计信息的组合信息。
对于视差信息和深度信息存在如下关系:由于视差和三维空间上的点到投影中心平面的距离成反比,因此只要知道场景中某点的视差信息,就可以知道该点的深度信息。深度信息或视差信息可以自对应的深度图像中确定,深度特征信息可以是关于深度的点特征、线特征、面特征、感兴趣区域的深度轮廓信息中的任一种或多种;基于深度值的统计信息可以为对应深度切片的深度值的统计信息。基于深度值的统计信息可以用于计算第一视点的深度切片和第二视点的深度切片之间的相似度;深度切片是指深度图中和纹理切片对应的切片区域;预处理的深度切片例如是经过量化处理后的深度切片。可选地,深度信息可以用矩阵表示,矩阵的尺寸和对应纹理切片相关联,示例性地,对于感兴趣深度区域或者关于深度的特定面特征标记为1,对于其他区域标记为0。如此有助于提取感兴趣深度区域或关于深度的特定面特征对应的纹理区域的特征,并进一步对这些特征进行环路滤波处理,以提高重建图像或解码图像的品质。
第一辅助信息可以是来自第一视点和/或第二视点的辅助信息,第一视点可以是依赖视点,第二视点可以是独立视点。
S502,根据对应于第二视点的参考图像和所述第一辅助信息,对对应于第一视点的第一图像块进行处理。
在一个实施例中,还包括:获取所述对应于第一视点的第一图像块;和/或,获取所述对应于第一视点的第一图像块。可选地,所述第二视点不同于所述第一视点。可选地,参考图像和第一图像块所在的图像属于同一时刻不同视点的图像。
在一个实施例中,对应前述介绍的多视点视频编码器和多视点视频解码器,此处的第一视点可以为依赖视点,第二视点可以为独立视点;第一图像块为输入基于神经网络的环路滤波处理之前的重建块,可选地,重建块为重建纹理图像块(也可简称重建纹理块),重建纹理块可以是CTU、切片(slice)、方块(tile)、子图像中的任一种;第一图像块所在的图像可以称为依赖视点的当前帧,当前帧可以为 当前纹理帧F1,当前纹理帧F1为重建图像。可选地,参考图像是从图像缓冲器(或解码图像缓冲器)中获取的参考帧,该参考帧为第二视点对应的重建图像(或第二视点对应的解码图像),且参考图像先于第一图像块所在的图像完成编码。例如参考图像为独立视点的参考帧。
例如,第一图像块为依赖视点当前纹理帧F1的当前重建的纹理切片S1(或当前重建纹理切片S1),当前重建的纹理切片S1可以为帧内预测切片(I slice)或帧间预测切片(P slice),可选地,当前纹理帧F1没有完全重建完成,而独立视点的参考帧FR1(对应参考图像)已经重建完成,后续可以匹配参考帧中的纹理切片对当前重建的纹理切片S1进行处理。需要说明的是,在获取到的当前重建的纹理切片S1为通过帧内预测处理获得的重建纹理切片(即帧内预测切片I slice),或者为通过帧间预测处理获得的重建纹理切片(即帧间预测切片P slice),而非通过视点间预测(例如视点合成预测)得到的纹理切片时,当前重建纹理切片由于在重建过程中没有参考独立视点的纹理信息,因此,在环路滤波处理阶段可以通过融合来自独立视点的参考帧的纹理信息,增强环路滤波处理后的重建纹理切片的质量。然而本申请并非限于此,当前重建的纹理切片S1也可以为视点间预测处理获得的重建纹理切片。在获取到的当前重建纹理切片为通过视点间预测处理获得的重建纹理切片时,后续参考独立视点的纹理信息也可以进一步提高滤波处理后的重建纹理切片的质量。
上述获取到的第一视点的第一图像块的图像区域的尺寸小于第二视点的参考图像的图像区域的尺寸,这样可以从第二视点的参考图像这一较大的图像区域中确定与第一视点的第一图像块相匹配的第二图像块,从而提高匹配程度。第一视点和第二视点可以对应为依赖视点和独立视点,第一图像块和第二图像块均可以是重建纹理块,具体的匹配方式可以参见图7对应实施例介绍的内容,在此先不做详述。
可选地,可以根据所述参考图像的第二图像块和第一辅助信息,确定或生成对应于所述第一图像块的一处理结果。
在一个实施例中,可以从参考图像中获取第二图像块。对于第二图像块获取可以遵循以下预设规则,即:在第二视点的参考图像划分的较大的图像区域中确定与第一视点的第一图像块匹配的第二图像块。假设第一视点为依赖视点,第二视点为独立视点,第一图像块为重建纹理块,可以根据获取到的重建纹理块和参考帧确定第二图像块,或者,可以根据获取到的重建纹理块和参考帧中的图像区域确定第二图像块。更多地,由于重建纹理块为CTU、切片、方块、子图像中的任一种,根据上述预设规则,有如下确定第二图像块的情形:①当第一图像块为依赖视点的当前帧的重建纹理块,第二图像块从独立视点参考帧中的重建切片中确定(可选地,该重建纹理块的尺寸小于重建切片的尺寸);②当第一图像块为依赖视点的当前帧的重建纹理块且重建图像块为CTU时,可以获取独立视点的参考图像的切片(slice)、方块(tile)、子图像中的任一种以确定第二图像块;③当第一图像块为依赖视点的当前帧的重建纹理块且重建图像块为切片(slice)或方块(tile)时,可以获取独立视点的参考图像中的子图像,并从中确定第二图像块。
在一个实施例中,第一图像块和第二图像块之间的关系包括以下至少一种:所述第二图像块和所述第一图像块的尺寸相同,所述第一图像块和所述第二图像块的类型相同,当所述第二图像块为切片时,所述第二图像块由多个编码树单元构成。这里的尺寸相同是指图像块的图像区域大小相同,例如第一图像块和第二图像块的尺寸都是8×8,类型相同例如是当所述第一图像块为切片时,所述第二图像块对应为切片,当所述第一图像块为编码树块(CTU)时,所述第二图像块对应为编码树块(CTU)。
上述切片具体为纹理切片(或重建纹理切片),可选地,第二视点的纹理切片(例如独立视点的纹理切片SR)可以不是通常意义上包含在NAL(Network Abstraction Layer,网络提取层,在视频编标准H.264中负责将编码后的数据以网络要求的格式进行打包和传输)中的纹理切片,而是尺寸、形状和第一视点的纹理切片(例如依赖视点的纹理切片)相同,由多个CTU组成的图像区域。
可选地,可以根据第一图像块的第一辅助信息从参考图像中确定第二图像块。示例性地,第一图像块为当前重建纹理块,具体为当前重建的纹理切片S1(或纹理切片S1),第一视点为依赖视点,第一辅助信息为深度信息或视差信息,例如,依赖视点当前纹理帧F1中的纹理切片S1对应的深度信息可以是深度切片本身或经过预处理的深度切片,还可以是当前重建的纹理切片S1对应的深度切片的深度值的统计信息。当前重建的纹理切片S1对应的深度信息Ds1或视差信息Ds2自纹理切片S1对应的深度图像确定。根据当前重建的纹理切片S1对应的深度信息Ds1或视差信息Ds2可以确定在独立视点的参考纹理帧FR1中对应的参考纹理切片SR1,具体可以参见下述介绍。
可选地,步骤S502包括:
获取所述第一图像块的第一辅助信息,可选地,所述第一辅助信息包括深度信息,可选地,所述深度信息根据所述第一图像块对应的深度图像确定;
计算或获取所述参考图像中各个图像块的第一辅助信息和所述第一图像块的第一辅助信息的相似度;
将所述参考图像中所述相似度最大的图像块确定为与所述第一图像块匹配的第二图像块。
可选地,所述第一图像块的第一辅助信息包括深度信息或视差信息,所述深度信息或所述视差信息自所述第一图像块对应的深度图像确定。例如,当第一图像块为依赖视点的当前纹理帧F1的当前重建纹理切片S1时,第一辅助信息可以从当前重建纹理切片S1对应的深度切片中确定。
可选地,第二视点的参考图像中各个图像块与第一图像块类型相同,例如均为纹理切片;与第一图像块的第一辅助信息类似,参考图像中各个图像块的第一辅助信息也包括深度信息或视差信息,深度信息或视差信息自参考图像对应的深度图像确定。对于第二图像块与第一图像块的相似度,通过各自的第一辅助信息之间的相似度来衡量。例如,依据深度信息计算独立视点的纹理切片和依赖视点的纹理切片之间的相似度,深度信息包括以下至少一种:深度特征信息、基于深度值的统计信息、深度切片、预处理后的深度切片、深度特征信息和基于深度值的统计信息的组合信息。
可选地,通过确定与第一视点的第一图像块对应的第一辅助信息相似度最大的第二视点的参考图像中的图像块,可以在对应的第二视点的参考图像的各图像块中查找到与第一视点的第一图像块匹配的第二图像块,该第二图像块的第一辅助信息和第一图像块的第一辅助信息之间最相似。
所述处理结果用于获得所述第一图像块对应的重建图像或解码图像。需要说明的是,当本方案应用在多视点视频编码端时,处理结果用于获取第一图像块对应的重建图像;当本方案应用在多视点视频解码端时,处理结果用于获取第一图像块对应的解码图像。
在一个实施例中,处理结果包括滤波后的第一图像块,可以利用图6示出的基于神经网络的环路滤波处理器,根据第一辅助信息和参考图像的第二图像块对第一图像块进行处理,确定或生成处理结果。
如图6所示,基于神经网络的环路滤波器包括融合模块和基于神经网络的滤波处理模块,可选地, 融合模块可以接收第一辅助信息(例如深度信息或视差信息)、第一图像块(例如依赖视点的当前帧的当前重建纹理块)、参考图像(例如独立视点参考帧)进行处理,融合模块可以从参考图像中确定第二图像块(例如匹配纹理块),第一图像块和第二图像块在融合模块中经过一系列处理之后,再将融合模块处理得到的结果输入到基于神经网络的滤波处理模块中进行处理,得到滤波后的第一图像块。需要说明的是,基于神经网络的环路滤波器可以设置于图3所示的多视点编码器或图4所示的多视点解码器中,基于神经网络的滤波处理模块采用下述图13所示出的结构示意图。在一个实施例中,融合模块可以独立于基于神经网络的滤波处理模块存在,即设置为一个单独的功能模块。在另一个实施例中,融合模块与基于神经网络的滤波处理模块一起包含于图3或图4中的基于神经网络的环路滤波处理器中,用来确定或生成环路滤波处理的处理结果,后续根据该处理结果可以进一步合成重建图像或解码图像。在一实施例中,该处理结果也可以称为依赖视点的已环路滤波的纹理块,或依赖视点的已环路滤波的重建纹理块。
在一种实施方式中,融合模块接收依赖视点的纹理块、独立视点的参考帧以及深度信息或视差信息。在另一实施方式中,融合模块也可以接收依赖视点的当前帧的当前重建纹理块、独立视点参考帧中的重建纹理切片以及深度信息或视差信息。可选地,依赖视点的当前帧的重建纹理块可以是CTU、切片(slice)、方块(tile)、子图像中的一种,当融合模块接收依赖视点的当前帧的重建纹理块是CTU时,融合模块可以自图像缓冲器中接收来自独立视点的参考帧的切片、方块、或子图像中的一种,当融合模块接收依赖视点的当前帧的重建纹理块是切片(slice)或方块(tile)时,融合模块可以自图像缓冲器中接收来自独立视点的参考帧的子图像。
有关融合模块和基于神经网络的滤波处理模块的更详细的处理方式可见图7对应实施例阐述的内容,在此先不做详述。
综上所述,本申请实施例提供的图像处理方案可以应用于多视点视频编解码的场景中,通过参考第一辅助信息(包括深度信息或视差信息),将不同视点的图像块的信息利用起来,可以减小同一时刻不同视点的帧之间的视差影响,以实现不同视点的图像块较好的匹配效果,根据匹配度较高的第二图像块辅助第一图像块确定或生成对应的处理结果,能够降低处理结果的失真度,从而获取高质量的重建图像或解码图像。
第二实施例
请参见图7,图7是根据第二实施例示出的一种图像处理方法的流程示意图,该实施例中的执行主体可以是一个计算机设备或者是多个计算机设备构成的集群,该计算机设备可以是智能终端(如前述移动终端100),也可以是服务器,此处,以本实施例中的执行主体为智能终端为例进行说明。
S701,获取第一辅助信息、对应于第一视点的第一图像块以及对应于第二视点的参考图像。此步骤可以参见第一实施例中相关描述,在此不做赘述。
S702,根据所述第一辅助信息从所述参考图像中确定第二图像块。
当第一视点的第一图像块是CTU时,可以自图像缓冲器中获取来自第二视点的参考图像的切片、方块、或子图像中的一种中确定第二图像块,当第一视点的第一图像块为切片或方块时,可以自图像缓冲器中获取来自第二视点的参考图像的子图像中确定第二图像块。即此处从第二视点的参考图像中确定第二图像块的处理规则为:第二视点的图像区域的尺寸大于第一视点的第一图像块的图像区域的尺寸。
在一个实施例中,所述第二图像块和所述第一图像块的类型是相同的,例如第一图像块为重建纹理切片,从参考图像的子图像中确定的第二图像块也是重建纹理切片;所述第二图像块和所述第一图像块匹配,第二图像块也可称为匹配图像块,例如当第一图像块为重建纹理块时,第二图像块为匹配纹理块,具体的,重建纹理块为重建纹理切片时,可以将第二视点的参考图像的重建纹理切片与第一视点的重建纹理切片进行粗略匹配(slice to slice registration),得到的第二图像块为第二视点的参考图像中符合匹配条件的重建纹理切片(即匹配纹理切片)。
在一个实施例中,S702对应是将第一视点的第一图像块和第二视点的第二图像块进行粗略匹配,可选实现步骤如下①~③:
①获取所述第一图像块的第一辅助信息;②获取所述参考图像中各个图像块的第一辅助信息和所述第一图像块的第一辅助信息的相似度;③将所述参考图像中所述相似度最大的图像块确定为与所述第一图像块匹配的第二图像块。具体内容可以参见第一实施例中相同步骤的描述,这里不再赘述。
示例性地,所述第一图像块为依赖视点的当前纹理帧F1的当前重建纹理切片S1,所述参考图像为独立视点的参考帧,将独立视点的参考纹理帧FR1中的纹理切片SR和依赖视点的当前帧的当前重建纹理片S1进行粗略匹配:首先,获取到当前重建纹理切片S1对应的深度信息Ds1或视差信息Ds2,接着,可以在对应的独立视点的参考帧FR1中查找与当前重建纹理切片S1对应的深度信息Ds1或视差信息Ds2最相似的参考纹理切片SR1。例如,确定与依赖视点的纹理切片S1对应的深度信息的相似度最大的独立视点的纹理切片为参考纹理切片SR1,该参考纹理切片SR1和依赖视点的纹理切片S1相匹配。需要说明的是,独立视点的纹理切片SR可以不是通常意义上包含在NAL中的纹理切片,而是尺寸和形状与依赖视点中的纹理切片相同的、多个CTU组成的图像区域。
S703,确定所述第一图像块对应的第一特征图以及所述第二图像块对应的第二特征图。
第一图像块和第二图像块各自对应的特征图可以是直接对图像块进行特征提取得到的,也可以是第一图像块和第二图像块经过精细匹配后,对精细匹配输出的图像子块进行特征提取得到的。针对这两种不同的方式,可以参见下述的介绍。
方式1:基于特征提取网络和所述第一辅助信息对所述第一图像块和所述第二图像块进行特征提取处理,得到所述第一图像块对应的第一特征图以及所述第二图像块对应的第二特征图。
此处的第一辅助信息和前述粗略匹配使用的第一辅助信息是相同的,可以来自第一图像块或第二图像块;该第一辅助信息包括深度信息或视差信息,深度信息或视差信息可以通过矩阵来表示,或者也可以是深度图或视差图,或者是经过预处理(例如量化处理、归一化处理)的深度图或视差图来表示。第一辅助信息可以作为特征提取网络的参考信息,能够在对第一图像块和第二图像块进行特征提取处理时,使得提取的第一图像块和第二图像块的特征与深度信息以及/或者视差信息之间建立映射关系,以使得后续进行第一预设处理时能够更精确地确定第一预设处理模型和/或第一预设处理参数。在一实施例中,第一预设处理为扭曲处理,第一预设处理模型为扭曲模型,第一预设处理参数为扭曲参数。
在另一实施方式中,可以对需要监督的特定深度的纹理图像区域对应的深度信息设置为1,而对其他深度的纹理图像区域对应的深度信息设置为0。进一步,将此种方式得到的深度信息或视差信息作为第一辅助信息。可选地,第一辅助信息可以为特征提取网络的监督信息,能够使得仅提取第一图像块 和第二图像块中对应于需要监督的特定深度的纹理图像区域的特征而生成各自对应的特征图。如此,可以在计算资源或者传输带宽有限的情况下,优先处理或仅处理需要监督的特定深度的纹理图像区域。
特征提取网络包括神经网络,例如卷积神经网络、残差卷积神经网络、深度学习神经网络等中的任一种或多种的组合,通过训练好的神经网络的处理,可以提取图像块对应的特征图,特征图为多维(例如二维)矩阵。对应的特征提取单元可以为卷积层,下采样单元可以为池化层。
方式2:获取所述第一图像块的第一图像子块和所述第二图像块的第二图像子块;所述第二图像子块的第二辅助信息和所述第一图像子块的第二辅助信息匹配;基于特征提取网络和所述第二辅助信息对所述第一图像子块和所述第二图像子块进行特征提取处理,得到所述第一图像子块的第一子特征图和所述第二图像子块的第二子特征图;通过所述第一子特征图,确定或生成所述第一图像块对应的第一特征图,以及通过所述第二子特征图,确定或生成所述第二图像块对应的第二特征图;可选地,所述第二辅助信息与所述第一辅助信息不同。在一实施例中,将第一图像块的所有第一图像子块对应的第一子特征图组合/拼接成第一特征图,以及将第二图像块的所有第二图像子块对应的第二子特征图拼接成第二特征图。
此方式下,第二图像子块对应是在粗略匹配之后,将第一图像子块和第二图像块中的图像子块进行精细匹配得到的结果,第二图像子块也可以称为匹配图像子块,第一图像子块和第二图像子块的类型相同。示例性地,第一视点的第一图像块为重建纹理块,第二视点的第二图像块对应为重建纹理块(或称为匹配纹理块),第一图像子块为第一视点的重建纹理块中的重建纹理子块,第二图像子块为第二视点的重建纹理块中的重建纹理子块(或称为匹配纹理子块)。
第二图像子块和第一图像子块各自的第二辅助信息匹配是指不同图像子块的第二辅助信息之间的相似度最大。和粗略匹配的方式类似,精细匹配是利用第二辅助信息进行相似度计算的。第二辅助信息可以包括深度信息或视差信息,第二辅助信息为深度信息时,深度信息可以包括但不限于以下至少一种:
①深度特征信息,例如,关于深度的点特征、线特征、面特征、边界特征、感兴趣部分的深度轮廓信息;
②基于深度值的统计信息,例如,重建纹理编码树块CTBd对应的重建深度块的深度值的统计信息,此时深度信息可以用于计算第二视点的重建深度块和第一视点的重建深度块之间的相似度;
③深度特征信息和基于深度值的统计信息的组合;
④重建深度块本身或经过预处理的重建深度块。
需要说明的是,精细匹配使用的第二辅助信息和粗略匹配使用的第一辅助信息不同。精细匹配时使用的第二辅助信息的内容不同,或者精度更高。在一实施方式中,第一辅助信息和第二辅助信息都为深度信息,可以包括的内容不同,例如,粗略匹配时纹理切片对应的深度信息为深度特征信息,精细匹配时重建纹理子块对应的深度信息是基于深度值的统计信息;或者,粗略匹配时纹理切片对应的深度信息为深度特征信息,精细匹配时重建纹理子块对应的深度信息是深度特征信息和基于深度值的统计信息的组合。在另一实施方式中,第一辅助信息和第二辅助信息的精度不同,例如,粗略匹配时纹理切片对应的深度信息是n个深度特征信息,精细匹配时重建纹理子块对应的深度信息是m个深度特征信息,可选地,m大于n且为大于等于1的整数。如此,在粗略匹配时,可以利用一种深度信息(例如,低精度深度信息)对纹理切片进行匹配,而在精细匹配时,可以利用另一种深度信息(高精度深度信息)对重建纹理编码树块进行匹配,通过两次不同维度的匹配,可以保持较好的计算复杂度和匹配结果之间的平衡。
在一个可行的实施例中,所述第一图像子块和所述第二图像子块为相同类型的图像子块,例如,所述第一图像子块和所述第二图像子块为编码树块或扩展的编码树块。第一图像块为重建纹理块,重建纹理块可以是重建纹理切片,即第一图像块为重建纹理切片;第一图像子块为重建纹理子块,重建纹理子块可以是重建纹理编码树块CTB,即第一图像子块为重建纹理编码树块CTB,第二图像块和第二图像子块的类型与第一图像块以及第一图像子块对应。获取重建纹理子块的方式可以为:依据预定的处理顺序,对第一视点的重建纹理切片S1中的重建纹理编码树块CTBd与第二视点的重建纹理切片SR1中的重建纹理编码树块CTBi进行精细匹配(block to block registration)。例如,第一视点为依赖视点,第二视点为独立视点,可以依据光栅扫描的顺序,通过依赖视点的重建纹理切片中的重建纹理编码树块对应的深度信息或视差信息,确定在独立视点的参考纹理帧FR1中对应的参考纹理切片中的重建纹理编码树块CTBi。
可选地,可以依据深度信息计算第一视点的重建纹理编码树块和第二视点的重建纹理编码树块之间的相似度,即确定与第一视点的重建纹理编码树块CTBd对应的深度信息相似程度最大的第二视点的重建纹理编码树块CTBi,并将其作为和第一视点的重建纹理编码树块CTBd相匹配的重建纹理编码树块,此处,深度信息的相似程度可以利用关于相似概率的函数来表示。
重建纹理子块还可以是扩展的重建纹理编码树块CTBex,即第一图像子块和第二图像子块均为扩展的重建纹理编码树块,扩展的重建纹理编码树块CTBex是对重建纹理编码树块CTB的块边缘进行扩展的编码树块,扩展的重建编码树块CTBex包括重建纹理编码树块CTB。可选地,扩展区域可以利用与重建纹理编码树块相邻的其他重建纹理编码树块的像素进行填充,因此,扩展的重建纹理编码树块CTBex的尺寸比重建纹理编码树块的尺寸大。由于对于编码图像的分割是以编码树块为基础,重建的编码图像或解码图像会产生块效应,但是此处利用扩展的重建纹理编码树块进行环路滤波时,扩展区域是相邻的其他重建纹理编码树块的像素进行填充的,这样可以有效降低分割带来的块效应。因此,第一图像子块和第二图像子块为扩展的重建纹理编码树块,可以从图像划分的本质上减轻块效应,进而提高滤波效果和编解码的质量。
在方式2下,和第一辅助信息类似,第二辅助信息除了作为精细匹配的参考信息,使得提取的第一图像子块和第二图像子块的特征与深度信息以及/或者视差信息之间建立映射关系,还可以作为监督信息,使得特征提取网络能够仅提取第一图像子块和第二图像子块中对应于需要监督的特定深度的纹理图像区域的特征而生成各自对应的特征图。
上述两种方式都可以通过特征提取网络提取得到对应的特征图(包括第一特征图和第二特征图),区别仅在于特征提取网络接收的处理对象的不同:在精细匹配下,特征提取网络接收的是第一图像子块(例如重建纹理子块)、第二图像子块(例如匹配纹理子块)、第二辅助信息。在只有粗略匹配时,特征提取网络接收的是第一图像块(例如当前重建纹理块)、第二图像块(例如匹配纹理块)、第一辅助信息。接下来对特征提取网络所包括的详细内容以及处理原理进行介绍。
在一个实施例中,特征提取网络包括N个级联的特征提取模块,所述N为大于或等于1的整数,前N-1个特征提取模块中每个特征提取模块包括串联的特征提取单元和下采样单元,第N个特征提取模块包括特征提取单元;所述N个级联的特征提取模块中第一个特征提取模块,用于处理所述第一图像块和所述第二图像块,或者所述第一图像子块和所述第二图像子块;所述N个级联的特征提取模块中除所述第一个特征提取模块之外的各个特征提取模块,用于处理前一个特征提取模块的输出;针对所述每个特征提取模块,所述下采样单元的输入和所述特征提取单元的输出连接,所述下采样单元的输出和后一个特征提取模块中特征提取单元的输入连接;可选地,所述第一辅助信息或所述第二辅助信息作为所述N个级联的特征提取模块中至少一个特征提取模块的参考信息和/或监督信息。
基于上述对特征提取网络的介绍,请参见图8a,是本申请实施例提供的一种特征提取网络的结构示意图。下面将结合图8a对特征提取网络采用金字塔分层处理的方式提取第一图像块的特征进行如下说明,在此以特征提取网络接收第一辅助信息处理第一图像块和第二图像块为示例。
首先,第一图像块(例如重建纹理块)、第二图像块(例如匹配纹理块)以及第一辅助信息被输入至特征提取模块1中,通过特征提取单元1中的卷积操作,输出特征图Fd1和特征图Fi1。之后,下采样单元1对输出特征图Fd1和特征图Fi1,得到经过下采样的特征图Fdd1和经过下采样的特征图Fid1。特征提取模块1包括的特征提取单元1和下采样单元1的处理为第一层级的处理。
接下来,经过下采样的特征图Fdd1和经过下采样的特征图Fid1被输入至特征提取模块2,通过特征提取单元2中的卷积操作,输出特征图Fd2和特征图Fi2。之后,下采样单元2对输出的特征图Fd2和特征图Fi2进行下采样操作,得到经过下采样的特征图Fdd2和特征图Fid2。特征提取模块2包括的特征提取单元2和下采样单元2的处理为第二层级的处理。
同理,对于接下来的每一层级的特征提取模块的操作类似,直到第n-1层级的处理。对于第N层级的处理,经过下采样的特征图Fdd(n-1)和经过下采样的特征图Fid(n-1)被输入至特征提取模块N,通过卷积操作,输出特征图Fdn和特征图Fin。
此处将特征图Fd1~Fdn统称为第一特征图,特征图Fi1~Fin统称为第二特征图。在上述实施例中,每经过一个特征提取单元处理后再进行下采样单元的下采样处理,可以降低特征图的尺寸大小,通过串联的特征提取模块特征图的尺寸大小会逐渐缩小,进而使得特征图表达的语义更加抽象,这一过程也即金字塔特征提取。在金字塔特征提取处理中,每一层级的特征提取模块产生的特征图都是不同尺度的特征图,后续利用不同尺度的特征图进行扭曲、融合处理,可以丰富特征图的表达,进而使得最终融合得到的特征图更全面准确地描述第一视点的第一图像块的信息,从而更好地还原第一图像块对应的原始图像。
特征提取网络中的部分或全部特征提取模块可接收第一辅助信息/第二辅助信息,第一辅助信息/第二辅助信息可以作为参考信息或监督信息,参考信息可以使得提取得到的特征和辅助信息之间建立映射关系,监督信息可以用于对特征提取单元进行监督训练,这样可以使得训练后的特征提取网络精确地提取不同深度信息的图像块,进而获取得到准确的特征图。在上述示例中,可以仅是第一层级的特征提取模块(具体是特征提取单元1)接收第一辅助信息/第二辅助信息(例如深度信息或视差信息),这样计算相对简单,适用于深度信息较简单的场景。在其他示例中,还可以每一层级的特征提取模块的所有模块或部分模块接收第一辅助信息/第二辅助信息(例如深度信息或视差信息),从而适用于高精度的深度变化范围的场景,有利于得到高质量的环路滤波处理后的重建纹理块。此外,还可以是对于不同的场景,不同的时刻,在每一层级使能或禁能接收第一辅助信息/第二辅助信息(例如深度信息或视差信息),以使得可以自适应地控制计算的复杂度,满足不同应用的要求。请参见图8b示出的另一种特征提取网络的示意图,如图8b所示,在特征提取网络提取第一图像块和第二图像块对应的特征图时,各个特征提取单元均接收第一辅助信息,以更好地提取高精度的深度信息的图像块的特征图。
需要说明的是,如图8a或图8b所示出的特征提取网络还可以接收第二辅助信息处理第一图像子块和第二图像子块,具体处理流程和前述接收第一辅助信息处理第一图像块和第二图像块示例的内容相同,在此不做详述。各个特征提取单元输出的特征图可以为子特征图,例如特征图Fd1~Fdn统称为第一子特征图,特征图Fi1~Fin统称为第二子特征图。在一实施例中,可以将第一图像块的所有第一图像子块对应的第一子特征图组合/拼接成第一特征图,以及将第二图像块的所有第二图像子块对应的第二子特征图拼接成第二特征图,后续基于第一特征图和第二特征图进行第一预设处理和第二预设处理;在另一实施例中,也可以直接对第一子特征图和第二子特征图进行第一预设处理以及第二预设处理。
在一实施例中,以第一预设处理为扭曲处理,第二预设处理为特征融合处理为例,对子特征图的处理进行如下说明:
将第一图像块的所有第一图像子块对应的第一子特征图组合/拼接成第一特征图,以及将第二图像块的所有第二图像子块对应的第二子特征图拼接成第二特征图。接下来,将第二特征图进行扭曲处理。最后,将扭曲后的第二特征图和第一特征图进行特征融合处理,以得到融合后的特征图。需要说明的是,也可以先不用确定第一特征图和第二特征图,而直接将第二子特征图进行扭曲处理。在将第二子特征图进行扭曲处理,得到扭曲后的第二子特征图之后,可以将所有第二子特征图进行组合/拼接,从而确定或生成扭曲后的第二特征图。以及将第一子特征图进行组合/拼接,从而确定或生成第一特征图。最后,将扭曲后的第二子特征图和第一特征图进行特征融合处理,以得到融合后的特征图。再或者,也可以先不用确定第一特征图和第二特征图,而直接将第二子特征图进行扭曲处理。在将第二子特征图进行扭曲处理,得到扭曲后的第二子特征图之后,将扭曲后的第二子特征图和第一子特征图进行特征融合处理,以得到融合后的子特征图。最后,将融合后的子特征图进行组合/拼接处理,以得到融合后的特征图。
S704,根据所述第一特征图以及所述第二特征图,确定或生成对应于所述第一图像块的一处理结果。处理结果用于生成所述第一图像块对应的重建图像或解码图像。
在一个可行的实施例中,S704的具体实现步骤包括(1)~(3):
根据所述第一特征图对所述第二特征图进行第一预设处理,得到目标第二特征图。
在一实施例中,基于所述第一特征图和所述第二特征图确定第一预设处理参数;或者,基于所述第一特征图、所述第二特征图以及所述第一辅助信息确定第一预设处理参数;基于第一预设处理模型对所述第二特征图进行第一预设处理,得到目标第二特征图;所述第一预设处理模型包括根据所述第一预设处理参数确定的第一处理模型。
在另一实施例中,所述第一预设处理模型包括所述第一处理模型和第二处理模型,所述基于所述第一预设处理模型对所述第二特征图进行第一预设处理,得到目标第二特征图,包括:
根据所述第一处理模型和所述第二处理模型确定所述第二特征图中的采样点坐标;
根据所述第二特征图以及采样核函数,确定所述采样点坐标对应的目标像素值;
根据所述采样点坐标对应的目标像素值生成目标第二特征图。
可选地,第一预设处理为扭曲处理,目标第二特征图为扭曲后的第二特征图,第一预设处理参数为扭曲参数,第一预设处理模型为扭曲模型。
通过对第二视点的第二图像块对应的第二特征图进行扭曲处理,可以将不同视点的特征图相互映射,此处将第二视点的第二特征图映射到第一视点,可以使得扭曲后的第二特征图和第一视点的第一特征图中的对象形状、大小等属性相似。通过将扭曲后的第二特征图和第一特征图进行融合,从而在滤波阶段可以提高第一图像块对应的重建图像的质量,降低重建图像的失真,以更好地还原第一图像块对应的原始图像。
在一种实施方式中,步骤(1)中第一预设处理为扭曲处理,可以包括以下内容:基于所述第一特征图和所述第二特征图确定扭曲参数;或者,基于所述第一特征图、所述第二特征图以及所述第一辅助信息确定扭曲参数;或者,基于所述第一特征图、所述第二特征图以及所述第二辅助信息确定扭曲参数;基于扭曲模型对所述第二特征图进行扭曲处理,得到扭曲后的第二特征图;所述扭曲模型包括根据所述扭曲参数确定的第一处理模型。
可选地,特征提取网络中每层级的特征提取模块中的特征提取单元输出的特征图为第一特征图和第二特征图,例如前述图8a中示出的特征图Fd1~Fdn(对应为第一特征图)和特征图Fi1~Fin(对应为第二特征图)。可选地,对于采取金字塔分层提取得到的第二特征图来说,每一层的第二特征图都可以进行第一预设处理(例如扭曲处理),得到目标第二特征图(例如扭曲后的第二特征图)。
下面以第x(x∈[1~n])层的特征提取模块中的特征提取单元x输出的第一特征图Fdx和第二特征图Fix作为示例,对第二特征图Fix的扭曲处理原理进行如下说明:
通过扭曲参数确定模块接收来自第一视点的第一特征图Fdx和来自第二视点的第二特征图Fix,输出扭曲参数。可选地,第一视点的第一特征图Fdx的宽度为Wdx,高度为Hdx,以及通道数为Cdx;第二视点的第二特征图Fix的宽度为Wix,高度为Hix,以及通道数为Cix;扭曲参数确定模块是基于神经网络构建的。例如扭曲参数确定模块可以利用全连接层或卷积层来实现,需要说明的是,扭曲参数确定模块还包括一个回归层,该回归层用于产生扭曲参数。
可选地,通过神经网络学习算法,可以构建基于神经网络的扭曲参数确定模块,扭曲处理模块能够建立输入变量(包括第一特征图、第二特征图)到扭曲参数的映射关系。例如在训练基于神经网络的扭曲参数确定模块的过程中,首先,建立训练样本,训练样本包括输入和输出,可选地,输入包括第一特征图、第二特征图,输出包括扭曲参数,此处的扭曲参数包括对不同扭曲类型(例如裁剪、平移、旋转、缩放和倾斜)的第二特征图标记的扭曲参数。然后,利用训练样本进行正向传播计算,得到各层神经元的输入和输出。接着,计算神经网络输出的估计的扭曲参数和标记的扭曲参数之间的误差,通过调整各层网络的权重和偏移值,使得网络的误差平方和最小。最终,当误差达到预设精度时,将得到的各层网络的权重和偏移值作为最后取值,完成神经网络的训练。在预测阶段,当基于神经网络的扭曲参数确定模块接收第一特征图、第二特征图时,扭曲参数确定模块包括的训练后的神经网络能够准确地确定出扭曲参数。需要说明的是,上述扭曲参数确定模块还可以接收第一视点的第一子特征图和第二视点的第二子特征图,确定扭曲参数。
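下面给出扭曲参数确定模块的一个示意性PyTorch草图：用卷积层提取第一特征图与第二特征图的联合特征，再由回归层输出6个仿射变换参数（对应2×3仿射矩阵）；网络层数、通道数以及恒等变换初始化均为示例性假设，并非本申请方案的限定实现。

```python
import torch
import torch.nn as nn

class WarpParamNet(nn.Module):
    """扭曲参数确定模块示意：卷积层提取联合特征，回归层产生扭曲(仿射)参数。"""
    def __init__(self, ch_d, ch_i):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(ch_d + ch_i, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1))
        self.regress = nn.Linear(32, 6)          # 回归层：输出6个仿射参数
        # 示例性地初始化为恒等变换，便于训练初期保持稳定
        self.regress.weight.data.zero_()
        self.regress.bias.data.copy_(torch.tensor([1., 0., 0., 0., 1., 0.]))

    def forward(self, fdx, fix):
        z = self.features(torch.cat([fdx, fix], dim=1)).flatten(1)
        return self.regress(z).view(-1, 2, 3)    # theta：2x3仿射变换矩阵

theta = WarpParamNet(32, 32)(torch.randn(1, 32, 64, 64), torch.randn(1, 32, 64, 64))
```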
可选地,通过第一特征图和第二特征图中各特征之间的相似性,可以确定网格中目标像素坐标(即扭曲后的特征图的像素坐标)与第二特征图中对应的像素坐标之间的对应关系。可选地,该对应关系用于确定扭曲参数。
在一个实施例中,扭曲参数包括以下至少一种:关于仿射变换的参数、关于投影变换的参数。需要说明的是,采用金字塔分层提取处理,对于不同层输出的第一特征图和第二特征图来说,扭曲参数可以不同。
在扭曲参数确定之后,依据确定的扭曲参数可以得到扭曲模型,扭曲模型能够反映第二特征图Fix和扭曲后的第二特征图Fiwx的对应像素坐标之间的映射关系。
在一个可行的实施方式中,扭曲模型包括第一处理模型和第二处理模型,所述第一处理模型是根据所述扭曲参数确定的扭曲模型,所述第二处理模型包括目标像素坐标,利用扭曲模型对第二特征图进行扭曲处理,得到扭曲后的第二特征图的实现方式可以包括:根据所述第一处理模型和所述第二处理模型确定所述第二特征图中的采样点坐标;根据所述第二特征图以及采样核函数,确定所述采样点坐标对应的目标像素值;根据所述采样点坐标对应的目标像素值生成扭曲后的第二特征图。
所述扭曲参数确定的第一处理模型包括仿射变换矩阵、投影变换矩阵、以及所述仿射变换矩阵和所述投影变换矩阵的组合中的任一种;第二处理模型为像素网格(grid)模型G,假设G={Gi},Gi为输出特征图中的网格的目标像素坐标(xit,yit),可选地,这些像素为经过扭曲处理的输出像素。在一实施例中,像素网格模型G为预先定义的网格模型。在其他实施例中,像素网格模型G可以根据第一特征图中的特征和/或辅助信息来确定。根据第一特征图中的特征和/或辅助信息来确定像素网格模型G可以使得像素网格模型的设置更灵活。
根据第一处理模型和第二处理模型可以得到第二特征图中的采样点坐标,假设第一处理模型为仿射变换矩阵,第二特征图Fix中定义的采样点坐标为(xis,yis),则像素式的仿射变换为:
$$\begin{pmatrix} x_{i}^{s} \\ y_{i}^{s} \end{pmatrix}=A_{\theta}\begin{pmatrix} x_{i}^{t} \\ y_{i}^{t} \\ 1 \end{pmatrix}=\begin{bmatrix} \theta_{11} & \theta_{12} & \theta_{13} \\ \theta_{21} & \theta_{22} & \theta_{23} \end{bmatrix}\begin{pmatrix} x_{i}^{t} \\ y_{i}^{t} \\ 1 \end{pmatrix}$$
其中，Aθ为根据扭曲参数确定的仿射变换矩阵，(xit,yit)为输出特征图（即扭曲后的第二特征图）中网格的目标像素坐标，(xis,yis)为第二特征图中的采样点坐标，在目标像素坐标基础上加入一个常量1，构成齐次坐标。通过齐次坐标，能够表示一些常见的扭曲变换。
在优选的实施例中,还可以使用标准化的坐标系。例如,通过标准化的坐标系,可以将输出特征图(即扭曲后的第二特征图)中的网格的目标像素坐标(xit,yit)的取值限制在-1到1的范围内,以及将第二特征图中定义的采样点坐标(xis,yis)的取值限制在-1到1的范围内。如此,可以使得后续的采样和变换均应用于标准化的坐标系下。
通过扭曲模型可以使得输入的第二特征图被裁剪、平移、旋转、缩放和倾斜,以形成输出特征图(即扭曲后的第二特征图)。
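结合上述仿射变换与标准化的坐标系，下面给出依据仿射矩阵计算第二特征图中采样点坐标的示意性代码草图，坐标取值限制在-1到1的范围内；函数名与示例仿射参数均为假设，并非本申请方案的限定实现。

```python
import torch

def affine_sampling_grid(theta, height, width):
    """根据仿射矩阵theta(2x3)与输出网格的标准化目标像素坐标(取值[-1,1])，
    计算第二特征图中对应的采样点坐标(x_s, y_s)。"""
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, height), torch.linspace(-1, 1, width), indexing="ij")
    ones = torch.ones_like(xs)
    tgt = torch.stack([xs, ys, ones], dim=-1)      # 齐次目标坐标，形状(H, W, 3)
    src = tgt @ theta.transpose(0, 1)              # 形状(H, W, 2)：采样点坐标
    return src

theta = torch.tensor([[1.0, 0.1, 0.0], [0.0, 1.0, 0.0]])   # 示例性仿射参数
grid = affine_sampling_grid(theta, 64, 64)
```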
在另一个可行的实施方式中,所述扭曲模型包括多个扭曲子模型,每个扭曲子模型分配有对应的权重。通过对每个扭曲子模型分配不同的权重,可以获取较优的扭曲模型组合,通过扭曲模型组合可以 侧重不同方面对第二特征图进行扭曲处理,使得扭曲效果更好。
上述基于扭曲模型对第二特征图进行扭曲得到采样点坐标的原理,可以视为对第二特征图进行采样处理的一部分,接着可以依据采样结果(即采样点坐标)产生扭曲后的第二特征图Fiwx。可选地,在确定出第二特征图的采样点坐标之后,可以在第二特征图中定义的采样点坐标(xis,yis)处应用采样核函数,来得到输出特征图中的采样点坐标对应的像素点的像素值。具体可参见如下表达式:
$$V_{i}^{c}=\sum_{n=1}^{H}\sum_{m=1}^{W}U_{nm}^{c}\,k\!\left(x_{i}^{s}-m;\Phi_{x}\right)k\!\left(y_{i}^{s}-n;\Phi_{y}\right)\quad\forall i\in[1,\dots,H'W'],\ \forall c\in[1,\dots,C]$$
其中，Vic为输出特征图（即扭曲后的第二特征图）中第i个采样点在通道c上对应的目标像素值，Unmc为第二特征图在通道c中坐标(n,m)处的像素值，k(·)为采样核函数，Φx、Φy为采样核函数的参数，W是第二特征图的宽度，H是高度，C是通道数，W'和H'为输出特征图的宽度和高度。最终扭曲后的第二特征图就包括采样点坐标对应的目标像素值。
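下面给出以双线性核作为采样核函数、在第二特征图的采样点坐标处取得目标像素值的一个示意性实现草图（仅为说明采样原理的简化示例，边界处理、坐标标准化方式等细节均为假设）。

```python
import torch

def bilinear_sample(feat, grid):
    """按双线性采样核在第二特征图feat上取值，生成扭曲后的特征图。
    feat: (C, H, W)；grid: (H', W', 2)，坐标已标准化到[-1, 1]。"""
    c, h, w = feat.shape
    # 将标准化坐标映射回像素坐标
    x = ((grid[..., 0] + 1) * (w - 1) / 2).clamp(0, w - 1)
    y = ((grid[..., 1] + 1) * (h - 1) / 2).clamp(0, h - 1)
    x0 = x.floor().long().clamp(0, w - 2); x1 = x0 + 1
    y0 = y.floor().long().clamp(0, h - 2); y1 = y0 + 1
    wx = x - x0.float(); wy = y - y0.float()
    # 四邻域像素按双线性权重加权求和，得到采样点坐标对应的目标像素值
    out = (feat[:, y0, x0] * (1 - wx) * (1 - wy) + feat[:, y0, x1] * wx * (1 - wy)
           + feat[:, y1, x0] * (1 - wx) * wy + feat[:, y1, x1] * wx * wy)
    return out            # (C, H', W')：扭曲后的第二特征图

feat = torch.randn(32, 64, 64)
grid = torch.rand(48, 48, 2) * 2 - 1     # 示例性采样点坐标
warped = bilinear_sample(feat, grid)
```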
针对上述介绍的步骤(1)中的扭曲处理的逻辑,可以通过如下扭曲处理模块实现,请参见图9a和图9b,是本申请实施例提供的第一预设处理模块的结构示意图,该第一预设处理模块具体为扭曲处理模块,包括扭曲参数确定模块、扭曲模型确定模块、采样模块。对于各个模块的功能如下:扭曲参数确定模块前述已做相关介绍,这里不再赘述,扭曲模型确定模块可以接收扭曲参数确定模块输出的扭曲参数,进而输出扭曲模型,例如扭曲参数为关于仿射变换的参数,扭曲模型对应包括仿射变换矩阵,采样模块用于结合扭曲模型对第二特征图进行扭曲和采样处理,包括获取采样点坐标,以及利用采样核函数计算在第二特征图的采样点坐标处的目标像素值,进而输出扭曲后的第二特征图Fiwx。
如图9a和图9b示出的扭曲处理模块的不同之处在于：接收的输入不同。图9a示出的扭曲处理模块接收第一特征图和第二特征图，以确定扭曲模型。而图9b示出的扭曲处理模块除了接收第一特征图和第二特征图之外，还可以接收辅助信息（包括第一辅助信息/第二辅助信息，例如深度信息或视差信息），以确定扭曲模型。在一个实施例中，通过扭曲参数确定模块接收来自第一视点的第一特征图Fdx、来自第二视点的第二特征图Fix以及辅助信息，输出扭曲参数。可选地，第一视点的第一特征图Fdx的宽度为Wdx，高度为Hdx，以及通道数为Cdx；第二视点的第二特征图Fix的宽度为Wix，高度为Hix，以及通道数为Cix；扭曲参数确定模块是基于神经网络构建的。例如扭曲参数确定模块可以利用全连接层或卷积层来实现，需要说明的是，扭曲参数确定模块还包括一个回归层，该回归层用于产生扭曲参数。
可选地,通过神经网络学习算法,可以构建基于神经网络的扭曲参数确定模块,扭曲处理模块能够建立输入变量(包括第一特征图、第二特征图、第一辅助信息)到扭曲参数的映射关系。例如在训练基于神经网络的扭曲参数确定模块的过程中,首先,建立训练样本,训练样本包括输入和输出,可选地,输入包括第一特征图、第二特征图、和第一辅助信息,输出包括扭曲参数,此处的扭曲参数包括对不同扭曲类型(例如裁剪、平移、旋转、缩放和倾斜)的第二特征图标记的扭曲参数。然后,利用训练样本进行正向传播计算,得到各层神经元的输入和输出。接着,计算神经网络输出的估计的扭曲参数和标记的扭曲参数之间的误差,通过调整各层网络的权重和偏移值,使得网络的误差平方和最小。最终,当误差达到预设精度时,将得到的各层网络的权重和偏移值作为最后取值,完成神经网络的训练。在预测阶段,当基于神经网络的扭曲参数确定模块接收第一特征图、第二特征图时,扭曲参数确定模块包括的训练后的神经网络能够准确地确定出扭曲参数。
可选地,通过辅助信息(例如,第一辅助信息和第二辅助信息),可以确定网格中目标像素坐标(即扭曲后的特征图的像素坐标)与第二特征图中对应的像素坐标之间的对应关系。可选地,该对应关系用于确定扭曲参数。由前述可知,在特征提取阶段可以建立第一图像块和第二图像块的特征与深度信息和/或视差信息之间的映射关系。进一步地,通过上述第一图像块和第二图像块的特征与深度信息和/或视差信息之间的映射关系,可以确定网格中目标像素坐标(即扭曲后的特征图的像素坐标)与第二特征图中对应的像素坐标之间的关系。
由于第一视点和第二视点的图像之间的差异(例如,形状大小尺寸的差异)主要是由视差或深度造成的,因此,参考深度信息或视差信息的扭曲处理可以使得扭曲后的第二特征图更匹配对应的第一视点的第一特征图,从而使得后续融合特征图的质量更高。
在一实施例中,上述扭曲处理模块对特征提取网络中每个特征提取单元输出的第一特征图和第二特征图的处理都是相同的,即利用第一特征图对第二特征图进行扭曲处理,得到扭曲后的第二特征图。请参见图10示出的一种特征提取网络和第一预设处理模块结合的结构示意图,是基于图8a示出的特征提取网络的结构示意图所提出的,第一预设处理模块具体为扭曲处理模块。如图10所示,包括N个扭曲处理模块和N个特征提取模块,特征提取网络处理第一图像块和第二图像块,得到对应的第一特征图和第二特征图,每个特征提取单元输出的特征图Fdx(x∈[1,n])和Fix都对应由扭曲处理模块进行处理,得到扭曲后的第二特征图Fiwx,也即最终包括N个扭曲后的第二特征图。可以理解的是,相同标号的特征提取模块和扭曲处理模块属于同一层级的处理,例如特征提取模块1和扭曲处理模块1属于第1层级,特征提取模块2和扭曲处理模块2属于第2层级,依次类推,包括N个层级,构成金字塔分层结构,每个层级都有特征提取模块和扭曲处理模块。
根据所述第一特征图和所述目标第二特征图进行第二预设处理,得到目标特征图。
在一实施例中,可以利用特征融合网络对所述第一特征图和所述目标第二特征图进行第二预设处理,得到目标特征图。
可选地,第一预设处理为扭曲处理,第二预设处理为特征融合处理,目标第二特征图为扭曲后的第二特征图,第一预设处理参数为扭曲参数,第一预设处理模型为扭曲模型,目标特征图为融合特征图。
此处的第一特征图可以为金字塔分层提取得到的特征图,对应为特征提取网络中任一特征提取模块中的特征提取单元的输出,扭曲后的第二特征图为对应扭曲处理模块对每层特征提取单元输出的第二特征图进行扭曲处理后得到的结果。
在一实施例中,得到融合特征图的实现方式可以为:利用特征融合网络对所述第一特征图和所述扭曲后的第二特征图进行特征融合处理,得到融合特征图;可选地,所述特征融合网络包括N个特征融合模块和M个上采样模块,所述M为大于或等于1的整数,M+1=N;所述特征融合网络中第i个特征融合模块的输入和N个扭曲处理模块中第i个扭曲处理模块的输出连接,第i个特征融合模块的输出和第j个上采样模块的输入连接,j为大于等于1的整数且j小于等于M,i=j+1;第j个上采样模块的 输出和第j个特征融合模块的输入连接;所述第i个扭曲处理模块用于对所述特征提取网络中第i个特征提取模块输出的第二特征图进行扭曲处理,i为大于等于1的整数,且i小于或等于N;第N个特征融合模块用于融合第N个扭曲处理模块输出的扭曲后的第二特征图和所述第N个特征提取单元输出的第一特征图;当i不等于N时,所述第i个特征融合模块用于融合第i个扭曲处理模块输出的扭曲后的第二特征图、所述第i个特征提取单元输出的第一特征图以及所述第i个上采样模块输出的特征图。
这里的特征融合网络用于融合扭曲处理模块输出的扭曲后的第二特征图和特征提取网络输出的第一特征图。在前述金字塔分层提取得到的第一特征图和第二特征图基础之上,特征融合网络和特征提取网络对应,也可以分为N个层级,每个层级包括特征融合模块和/或上采样模块,各个层级共有N个特征融合模块和M(即N-1)个上采样模块。以及对应有N个扭曲处理模块对第二特征图进行扭曲处理。基于上述内容,请参见图11示出的一种包括特征融合网络的结构示意图,每个特征融合模块输出的结果输入到上一个上采样模块中,也就是第N个特征融合模块的输出即第N-1个上采样模块的输入,这样由金字塔分层结构中的第1层级特征融合模块输出的特征图即为最终滤波使用的融合特征图。
下面结合图11示出的结构示意图对融合处理的具体处理流程进行如下说明:
第N层级:扭曲处理模块N接收并处理特征提取模块N输出的第一特征图Fdn和第二特征图Fin,输出扭曲后的第二特征图Fiwn,接着,特征融合模块N对第一特征图Fdn和扭曲后的Fiwn进行特征融合,得到融合特征图Fdfn,融合特征图Fdfn之后被输入第N-1层级的上采样模块M,即上采样模块(N-1)进行上采样处理。
第N-1层级:上采样模块(N-1)对融合特征图Fdfn进行上采样处理,得到经过上采样的特征图Fun,经过上采样的特征图Fun输出给第N-1层级的特征融合模块N-1,同时扭曲处理模块(N-1)接收并处理特征提取模块(N-1)输出的第一特征图Fd(n-1)和第二特征图Fi(n-1),输出扭曲后的第二特征图Fiw(n-1)。接下来,特征融合模块(N-1)对第一特征图Fd(n-1)、扭曲后的特征图Fiw(n-1)和经过上采样的特征图Fun进行特征融合,得到融合特征图Fdf(n-1)。该融合特征图Fdf(n-1)将输入第N-2层级的上采样模块(N-2)进行上采样处理。
第N-2层级:上采样模块(N-2),即上采样模块(M-1)对融合特征图Fdf(n-1)进行上采样,得到经过上采样的特征图Fu(n-1),并将经过上采样的特征图Fu(n-1)输出给第N-2层级的特征融合模块(N-2),特征融合模块(N-2)对扭曲处理模块(N-2)输出的扭曲后的第二特征图Fiw(n-2)、特征提取模块(N-2)输出的第一特征图Fd(n-2)以及经过上采样的特征图Fu(n-1)进行特征融合,得到融合特征图Fdf(n-2)。
以此类推,后续的层级都是采用相同的处理方式,直到第1层级。
第1层级：扭曲处理模块1接收并处理特征提取模块1输出的第一特征图Fd1和第二特征图Fi1，输出扭曲后的第二特征图Fiw1，接下来，特征融合模块1对第一特征图Fd1、扭曲后的第二特征图Fiw1以及来自第2层级的经过上采样的特征图Fu2进行特征融合，得到融合特征图Fdf。该融合特征图Fdf最终可以用于确定第一图像块的处理结果。在上述过程中，通过对融合后的特征图的上采样可以放大特征图的尺寸大小，进而最后得到和输入特征图的尺寸相同的融合特征图，对该融合特征图进行滤波，以得到更优质的重建图像。
可以理解的是,在图11示出的结构图中,从横向看,包括N个层级,可选地,第1层级至第N-1层级中的每个层级均包括特征提取模块、扭曲处理模块、特征融合模块以及上采样模块,第N层级中包括特征提取模块、扭曲处理模块、特征融合模块,每个层级由特征融合模块输出处理的数据,即融合特征图。从纵向看,N个层级中的特征提取模块以及扭曲处理模块对应的处理逻辑由上至下,即从第1层级至第N层级,特征融合模块以及上采样模块的处理由下至上,即从第N层级至第1层级,构成的金字塔处理模型可以实现特征图的准确提取。此外,需要说明的是,对于层级的数量N或者说各个模块的数量可以按需设定,也可以根据经验值设定,在此不做限制。
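为说明上述金字塔分层的融合与上采样流程，下面给出一段示意性的PyTorch代码草图：自第N层级向第1层级逐级融合第一特征图、扭曲后的第二特征图以及上一层级经过上采样的特征图；其中融合模块以"按通道拼接+卷积"实现，层级数、通道数等均为示例性假设，并非本申请方案的限定实现。

```python
import torch
import torch.nn as nn

class FuseModule(nn.Module):
    """特征融合模块示意：按通道拼接后用1x1卷积融合(concat方式)。"""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, *feats):
        return self.conv(torch.cat(feats, dim=1))

def pyramid_fuse(fd, fiw, fuse_modules, up_modules):
    """fd/fiw: 第1~N层级的第一特征图与扭曲后的第二特征图(尺度逐级减半)。
    自第N层级向第1层级融合，层间用上采样模块放大特征图。"""
    n = len(fd)
    x = fuse_modules[n - 1](fd[n - 1], fiw[n - 1])        # 第N层级
    for i in range(n - 2, -1, -1):                        # 第N-1 ~ 第1层级
        x = up_modules[i](x)                              # 上采样
        x = fuse_modules[i](fd[i], fiw[i], x)
    return x                                              # 最终的融合特征图 Fdf

# 示例：N=3，各层级通道数均取32(假设)
fd  = [torch.randn(1, 32, s, s) for s in (128, 64, 32)]
fiw = [torch.randn(1, 32, s, s) for s in (128, 64, 32)]
fuse = nn.ModuleList([FuseModule(96, 32), FuseModule(96, 32), FuseModule(64, 32)])
ups  = nn.ModuleList([nn.Upsample(scale_factor=2) for _ in range(2)])
fdf = pyramid_fuse(fd, fiw, fuse, ups)
```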
在一实施例中,各个特征融合模块对第一特征图和扭曲后的第二特征图进行特征融合,得到融合后的特征图Fdf(即融合特征图)的可选方式可以是:将对应通道上的第一特征图和扭曲后的第二特征图相加,通道数不变(即,add操作);也可以是:将扭曲后的第二特征图Fiw和第一特征图Fd输入连接层(concatenate),通过连接操作,输出融合后的特征(即,concat操作),例如,连接层的每个输出通道为:
$$Z_{\text{concat}}=\sum_{i=1}^{C}X_{i}*K_{i}+\sum_{i=1}^{C}Y_{i}*K_{i+C}$$
可选地,*表示卷积,C表示通道数,Xi表示第i通道的第一特征图,Yi表示第i通道的第二特征图,Ki表示第一特征图对应的卷积核,Ki+c表示第二特征图对应的卷积核。
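下面用一小段示意性代码对比上述add操作与concat操作两种特征融合方式（通道数、卷积核尺寸均为示例性假设）。

```python
import torch
import torch.nn as nn

fd  = torch.randn(1, 32, 64, 64)   # 第一特征图 Fd
fiw = torch.randn(1, 32, 64, 64)   # 扭曲后的第二特征图 Fiw

# 方式一：add操作——对应通道直接相加，通道数不变
fused_add = fd + fiw

# 方式二：concat操作——按通道连接后经卷积层映射，
# 对应 sum_i Xi*Ki + sum_i Yi*K(i+C) 的形式
conv = nn.Conv2d(64, 32, kernel_size=3, padding=1)
fused_concat = conv(torch.cat([fd, fiw], dim=1))
```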
基于上述内容,可以将上述步骤S702和S703中涉及的粗略匹配、精细匹配、特征提取、扭曲处理以及特征融合的具体示例总结为如图12a所示的内容,对应为图6所示的融合模块的详细的结构示意图,例如根据第二图像块和辅助信息对第一图像块进行处理,即可以包括精细匹配、特征提取、扭曲处理。在另一种实施例中,还可以包括不做精细匹配,即对应图12b所示出的结构示意图。在此将融合模块的处理逻辑称为第三预设处理,融合模块对应称为第三预设处理模块。
根据所述目标特征图确定或生成对应于所述第一图像块的处理结果。
在一个实施例中,可以对所述目标特征图进行滤波处理,得到滤波后的目标特征图;以及根据所述滤波后的目标特征图确定对应于所述第一图像块的所述处理结果。进一步地,可以利用目标滤波处理模型对所述目标特征图进行滤波处理,得到滤波后的目标特征图。可选地,所述目标滤波处理模型包括至少一个处理单元,所述处理单元包括第一处理单元和第二处理单元中的一种或两种。利用目标滤波处理模型对所述目标特征图进行滤波处理,得到滤波后的目标特征图的步骤,包括:对至少一个所述第一处理单元处理后的目标特征图进行下采样处理,得到下采样后的目标特征图;对所述下采样后的目标特征图进行上采样处理,得到目标融合特征图;利用所述第二处理单元处理所述目标融合特征图,得到滤波后的目标特征图。
可选地,目标特征图为融合特征图。处理结果用于生成所述第一图像块对应的重建图像或解码图像。本实施例提供的方案应用于多视点编码器端时,处理结果用于生成第一图像块对应的重建图像,应用于多视点解码器端时,处理结果用于生成第一图像块对应的解码图像。在一个实施例中,步骤(3)的可选实现方式包括:对所述融合特征图进行滤波处理,得到滤波后的融合特征图;根据所述滤波后的 融合特征图确定对应于所述第一图像块的一处理结果。该处理结果包括第一视点滤波后的第一图像块,例如当第一图像块为依赖视点的当前帧的当前重建纹理图像时,此处得到的处理结果可以为依赖视点当前帧滤波后的重建纹理块。后续该处理结果还可能经过其他滤波处理(例如ALF),根据滤波处理得到的结果进一步合成重建图像或解码图像。
在一可行的实施方式中,对融合特征图的滤波处理可以是:利用目标滤波处理模型对所述融合特征图进行滤波处理,得到滤波后的融合特征图;可选地,所述目标滤波处理模型包括根据率失真代价从多个候选模型中选择的目标候选模型,所述多个候选模型中每个候选模型和量化参数存在映射关系。
目标滤波处理模型包括的目标候选模型可以为神经网络模型,该神经网络模型设置于基于神经网络的滤波处理模块中。可选地,基于神经网络的滤波处理模块的结构可以如图13所示,包括至少一个卷积层和至少一个残差单元。融合后的特征图Fdf(即融合特征图)被馈入卷积层1,经过D个残差单元,以及一个卷积层2之后,输出第一视点滤波后的第一图像块。可选地,D为大于等于1的整数,该基于神经网络的滤波处理模块对应为基于神经网络的环路滤波器(例如DRNLF)中的处理模块。
在一个实施例中,每个基于神经网络的滤波处理模块具有多个候选模型,每个候选模型对应于不同的量化参数,该量化参数自量化参数图(QP map)得到,量化参数图是填充有多个量化参数的矩阵。在训练阶段,可以针对不同的量化参数训练多个候选模型,并将最佳的候选模型与量化参数对应,在编码阶段,可以从多个候选模型中选择出率失真代价最低的目标候选模型,利用该目标候选模型对融合特征图进行滤波处理。
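下面给出按率失真代价从多个候选模型中选择目标候选模型的一个示意性草图，其中率失真代价采用J=D+λ·R的常见形式，candidates、encode_cost、lam等名称与码率估计方式均为示例性假设，并非本申请方案的限定实现。

```python
def select_filter_model(candidates, fused_feat, original_block, lam, encode_cost):
    """从{量化参数: 候选模型}映射candidates中，选出率失真代价最低的目标候选模型。
    encode_cost为估计码率开销的假设函数，lam为率失真权衡系数(假设)。"""
    best_qp, best_model, best_cost = None, None, float("inf")
    for qp, model in candidates.items():
        filtered = model(fused_feat)
        distortion = ((filtered - original_block) ** 2).mean().item()  # D：失真(均方误差)
        rate = encode_cost(qp)                                          # R：码率(示例)
        cost = distortion + lam * rate                                  # J = D + λ·R
        if cost < best_cost:
            best_qp, best_model, best_cost = qp, model, cost
    return best_qp, best_model
```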
在一个实施例中,所述目标滤波处理模型包括至少一个处理单元,所述处理单元包括第一处理单元和第二处理单元中的一种或两种;所述利用目标滤波处理模型对所述融合特征图进行滤波处理,得到滤波后的融合特征图,包括:对至少一个所述第一处理单元处理后的融合特征图进行下采样处理,得到下采样后的融合特征图;对所述下采样后的融合特征图进行上采样处理,得到目标融合特征图;利用所述第二处理单元处理所述目标融合特征图,得到滤波后的融合特征图。
可选地，目标滤波处理模型包括的第一处理单元为卷积单元（或卷积层），第二处理单元为残差单元，目标滤波处理模型如图13所示，融合特征图输入残差单元1~D-1中，对至少一个残差单元输出的残差数据进行缩放处理a（例如下采样处理，或者除以缩放因子），优选地，可以通过缩放处理将残差单元1~D-1中的至少一个残差单元输出的残差数据缩放至0~1的范围内；之后，在卷积层2中，卷积层2接收残差单元D输出的残差数据之后，将残差单元D的残差数据进行缩放处理b（例如上采样处理，或者乘以缩放处理a对应的缩放因子），再将经过缩放处理b的残差数据和融合特征图Fdf在卷积层2进行对应关系的映射和合成，得到第一视点当前帧滤波后的第一图像块。这样通过对至少一个残差单元输出的残差数据进行缩放处理，将残差数据的处理量缩小到一定范围，可以极大地降低用于多视点编码的神经网络的环路滤波器的计算复杂度，提高滤波处理的效率。
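下面给出该基于神经网络的滤波处理模块（卷积层1+D个残差单元+卷积层2，并对部分残差单元的残差输出进行缩放）的一个简化示意草图；其中的通道数、残差单元数量D以及缩放因子取值均为示例性假设，并非本申请方案的限定实现。

```python
import torch
import torch.nn as nn

class ResidualUnit(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x, scale=1.0):
        return x + self.body(x) * scale        # 对残差数据按缩放因子进行缩放处理

class NNLoopFilter(nn.Module):
    """基于神经网络的滤波处理模块示意：卷积层1 + D个残差单元 + 卷积层2。"""
    def __init__(self, in_ch=32, ch=32, depth=4, scale=0.1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, ch, 3, padding=1)
        self.res = nn.ModuleList([ResidualUnit(ch) for _ in range(depth)])
        self.conv2 = nn.Conv2d(ch, 1, 3, padding=1)   # 输出滤波后的图像块(此处假设单通道)
        self.scale = scale

    def forward(self, fdf):
        x = self.conv1(fdf)
        for i, r in enumerate(self.res):
            # 前D-1个残差单元的残差输出按缩放因子缩放，最后一个不缩放(简化示意)
            x = r(x, self.scale if i < len(self.res) - 1 else 1.0)
        return self.conv2(x)

filtered = NNLoopFilter()(torch.randn(1, 32, 128, 128))
```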
综上所述，通过本申请实施例提供的图像处理方案，在基于神经网络的环路滤波阶段，考虑到同一时刻不同视点的图像之间存在的视差的影响，在滤波之前融入图像的深度信息或视差信息，利用深度信息或视差信息从参考帧中查找到与第一图像块匹配的第二图像块，从而能够准确精细地确定最合适的第二图像块；通过在第一图像块中融合第二图像块的特征信息，可以进一步获得纹理和边缘更加清晰的增强图像块，进而减少视频的压缩失真，提高视频压缩质量。此外，在特征提取处理的过程中采用金字塔分层处理，构建不同尺度的特征金字塔，结合下采样处理，减小计算量的同时更全面地描述不同视点的图像块的信息，通过扭曲第二特征图、融合第一特征图和扭曲后的第二特征图，将不同视点的特征信息加以结合，更好地还原第一视点的特征信息，提高多视点视频压缩编码的质量。
第三实施例
请参见图14,图14是根据第三实施例示出的一种图像处理方法的流程示意图,该实施例中的执行主体可以是一个计算机设备或者是多个计算机设备构成的集群,该计算机设备可以是智能终端(如前述移动终端100),也可以是服务器,此处,以本实施例中的执行主体为智能终端为例进行说明。
S1401,获取第二重建图像块。可选地,还可以获取第一重建图像块。可选地,所述第一重建图像块和所述第二重建图像块对应于相同或不同的重建图像。
在一实施例中,所述第一重建图像块和所述第二重建图像块对应于不同的重建图像。此处不同的重建图像是指同一时刻不同视点的重建帧。第一重建图像块和第二重建图像块匹配,例如第一重建图像块为依赖视点的当前帧的当前重建纹理块,第二重建图像块为独立视点的参考图像中的匹配纹理块,其深度信息或视差信息和当前重建纹理块相似。重建纹理块可以是CTU、切片(slice)、方块(tile)、子图像中的任一种,匹配纹理块对应为CTU、切片(slice)、方块(tile)、子图像中的任一种。可选地,第一重建图像块可以是前述对应于第一视点的第一图像块,第二重建图像块可以是前述对应于第二视点的参考图像中的图像块,例如第二图像块。第二重建图像块对应的重建图像和第二视点的参考图像对应。
对于第二重建图像块的获取方式,可以是:根据第一重建图像块的属性信息,从第一重建图像块对应的参考图像块中确定出第二重建图像块。属性信息可以是一辅助信息,包括但不限于深度信息或视差信息,可选地,属性信息可以对应于前述的第一辅助信息或者第二辅助信息,具体方式可参见前述第二实施例中介绍的获取第二图像块的方式,在此不做赘述。
S1402,根据所述第二重建图像块、第一重建图像块的属性信息和所述第二重建图像块的属性信息中的至少一种,对所述第一重建图像块进行滤波,以得到滤波后的第一重建图像块。
在一实施例中,第一重建图像块的属性信息为一辅助信息,所述第一重建图像块的属性信息,包括以下至少一种:所述第一重建图像块的帧间预测信息、所述第一重建图像块的深度信息、所述第一重建图像块的视差信息。可选地,第一重建图像块的属性信息可以对应为前述对应于第一视点的第一图像块的第一辅助信息。
第一重建图像块的深度信息或视差信息自对应的深度图像确定,深度信息可以包括:深度特征信息、基于深度值的统计信息、深度切片本身、预处理后的深度切片中的任一种或多种的组合。此外,属性信息还可以包括图像分割信息、量化参数信息等,在此不做限制。第二重建图像块的属性信息同理,可以包括第二重建图像块的帧间预测信息、所述第二重建图像块的深度信息、所述第二重建图像块的视差信息。第二重建图像块的深度信息或视差信息自对应的深度图像确定。即属性信息可以来自第一重建图像块或第二重建图像块。
在一个实施例中,若根据第二重建图像块和第一重建图像块的属性信息,对所述第一重建图像块进行滤波,得到滤波后的第一重建图像块,可以对应参考前述第二实施例中介绍的有关根据第二图像块 和第一图像块的第一辅助信息,对对应于第一视点的第一图像块进行处理并得到处理结果的过程,即将第一图像块和第二图像块依次对应为第一重建图像块和第二重建图像块,得到的处理结果即此处的滤波后的第一重建图像块。在此不做赘述。可选地,还可以根据第二重建图像块和第二重建图像块的属性信息,或者根据第二重建图像块的属性信息和第一重建图像块的属性信息,或者,根据第二图像块的属性信息(此处未列举完)对第一重建图像块进行滤波。
在另一个实施例中,还可以获取其他重建图像块,结合更多的信息对第一重建图像块进行滤波处理,以提高滤波后的第一重建图像块的质量。即S1402更详细的实现步骤还可以包括:
a、获取第三重建图像块,所述第三重建图像块对应的图像为所述第一重建图像块对应的图像的参考重建图像;
b、根据所述第三重建图像块、第二重建图像块和所述第一重建图像块的属性信息中的至少一种,对所述第一重建图像块进行滤波,以得到滤波后的第一重建图像块。
第一重建图像块和第三重建图像块属于同一视点不同时刻编码的图像块,例如第一重建图像块为依赖视点当前帧的当前重建纹理块,第三重建图像块为依赖视点参考帧对应的纹理块。对应地,第三重建图像块对应的(或所在的)图像和第一重建图像块对应的图像为同一视点不同时刻的图像,在此,将第三重建图像块对应的图像称为参考重建图像。该参考重建图像为已编码的重建图像,属于第一视点,可以从图像缓冲器中读取,第三重建图像块可以根据第一重建图像块所在的图像和参考重建图像之间的帧间预测信息,从该第一视点的参考重建图像中获取到。
由第一重建图像块和第二重建图像块为同一时刻不同视点的图像块，以及第一重建图像块和第三重建图像块为同一视点不同时刻的图像块，可知，第三重建图像块和第二重建图像块为不同时刻不同视点的图像块。通过参考同一时刻不同视点的图像块、同一视点不同时刻的图像块以及有关的属性信息，对第一重建图像块进行滤波处理，可以从空间上不同的拍摄角度以及时间上不同的编码时刻，全面地使用有利于第一重建图像块滤波的信息，从而有效提高第一重建图像块的滤波质量，进而降低第一重建图像块对应的重建图像的失真。这里的属性信息可以包括帧间预测信息、深度信息或视差信息、量化参数信息等等。在不同的阶段，可以适应性地使用相应的属性信息，以使得第一重建图像块参考不同的属性信息更好地进行滤波。
在一可行的实施方式中,步骤b更详细的实现步骤可以包括:
根据所述第一重建图像块的深度信息或视差信息,对所述第一重建图像块和所述第二重建图像块进行第三预设处理,得到第一目标特征图;根据所述第一重建图像块的帧间预测信息,对所述第三重建图像块和所述第一目标特征图进行第三预设处理,得到第二目标特征图;对所述第二目标特征图进行滤波处理,以得到滤波后的第一重建图像块。
可选地,所述第三预设处理为融合处理,所述第一目标特征图为第一融合特征图,所述第二目标特征图为第二融合特征图。
对第一重建图像块和第二重建图像块的融合处理具体是指根据第一重建图像块对应的特征图和第二重建图像块的特征图进行一系列处理,得到的特征图称为第一融合特征图,通过参考深度信息或视差信息可以让融合得到第一融合特征图对第一重建图像块的信息描述更加全面。在得到第一融合特征图之后,可以对第三重建图像块和第一融合特征图进行融合处理,具体是根据第三重建图像块对应的特征图和第一融合特征图进行融合处理,进而得到第二融合特征图,第二融合特征图经过滤波之后,得到滤波后的第二融合特征图,该滤波后的第二融合特征图用于确定滤波后的第一重建图像块。需要说明的是,可以使用基于神经网络的环路滤波器实现上述内容,而基于神经网络的环路滤波器中集成相应的功能模块,各个功能模块按照上述内容执行,以增加第一重建图像块的滤波质量。对于具体的融合处理方式,可以参见下述结合图15的相关介绍。
请参见图15,是本实施例提供的一种基于神经网络的环路滤波器的结构示意图,该结构示意图包括融合模块1和融合模块2,以及基于神经网络的滤波处理模块,这里基于神经网络的滤波处理模块可以包括环路滤波处理中的一种或多种滤波器(例如DBF、SAO、ALF、DRNLF)的滤波处理。
在一个实施例中,融合模块1和融合模块2可以包括相同的功能单元,例如包括前述图12a示出的融合模块中的精细匹配、特征提取、扭曲处理以及特征融合这几个功能单元,又例如图12b示出的包括特征提取、扭曲处理以及特征融合这几个功能单元。在另一个实施例中,融合模块1和融合模块2也可以不同,例如融合模块1包括精细匹配、特征提取、扭曲处理以及特征融合这几个功能单元,融合模块2包括特征提取、扭曲处理以及特征融合这几个功能单元;又例如,融合模块1包括特征提取、扭曲处理以及特征融合这几个功能单元,融合模块2包括精细匹配、特征提取、扭曲处理以及特征融合这几个功能单元。需要说明的是,上述功能单元对应的具体处理逻辑和第二实施例中相关描述是相同的,区别仅在于对应的输入输出结果的不同。
融合模块1用于根据深度信息或视差信息对第一重建图像块和第二重建图像块进行融合处理,融合模块2用于根据帧间预测信息对第三重建图像块和第一融合特征图进行融合处理。结合上述内容,例如:在融合模块1中,可以根据深度信息或视差信息进行精细匹配,接着提取对应重建图像块的特征图,经过融合处理之后,得到第一融合特征图。然后融合模块2接收帧间预测信息,提取第三重建纹理块对应的特征图,对该特征图处理之后和第一融合特征图融合,得到第二融合特征图,此处在融合不同的特征图时参考帧间预测信息,可以使得第二融合特征图对第一重建图像块的特征图进行更准确地表达。之后将第二融合重建特征图经过基于神经网络的滤波处理,得到滤波后的第一重建图像块。
示例性地，第一重建图像块为依赖视点当前帧的当前重建纹理块，第二重建图像块为独立视点参考帧对应的重建纹理块，第三重建图像块为依赖视点参考帧对应的重建纹理块，经过图15所示的基于神经网络的环路滤波器的处理，最终滤波得到的是依赖视点当前帧滤波后的当前重建纹理块。
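下面以一段示意性的Python草图概括图15所示结构的处理流程；其中fusion1、fusion2与nn_filter均为假设的可调用对象，分别对应融合模块1、融合模块2与基于神经网络的滤波处理模块，并非本申请方案的限定实现。

```python
def nn_loop_filter_fig15(rec_dep, rec_ind, rec_dep_ref, depth_info, inter_pred_info,
                         fusion1, fusion2, nn_filter):
    """图15结构的示意性流程：
    融合模块1参考深度/视差信息融合第一、第二重建图像块，得到第一融合特征图；
    融合模块2参考帧间预测信息融合第三重建图像块与第一融合特征图，得到第二融合特征图；
    最后经基于神经网络的滤波处理得到滤波后的第一重建图像块。"""
    fused1 = fusion1(rec_dep, rec_ind, depth_info)           # 第一融合特征图
    fused2 = fusion2(rec_dep_ref, fused1, inter_pred_info)   # 第二融合特征图
    return nn_filter(fused2)                                  # 滤波后的第一重建图像块
```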
在一个实施例中,对第一重建图像块和第二重建图像块进行第三预设处理的步骤为:根据所述第一重建图像块的深度信息或视差信息,确定所述第一重建图像块对应的第一重建特征图和所述第二重建图像块对应的第二重建特征图;根据所述第一重建特征图对所述第二重建特征图进行第一预设处理,得到第一预设处理后的第二重建特征图;根据第一预设处理后的第二重建特征图和所述第一重建特征图进行第二预设处理,得到第一目标特征图。
可选地,所述第三预设处理为融合处理,包括第一预设处理和第二预设处理,所述第一预设处理为扭曲处理,所述第二预设处理为特征融合处理,所述第一预设处理后的第二重建特征图为扭曲后的第二重建特征图,所述第一目标特征图为第一融合特征图。
需要说明的是，第三预设处理还包括特征提取处理。本实施例下述涉及的第三预设处理的具体处理方式，同样可以采用以上步骤实现，例如对第一重建图像块和第三重建图像块的第三预设处理，此处不再展开阐述。本实施例中第三预设处理所指代的融合处理和第二预设处理所指代的特征融合处理为不同的处理逻辑。
可选地,上述确定重建图像块各自对应的特征图、对第二重建特征图进行扭曲处理以及对重建特征图进行融合的内容,和前述第二实施例中确定第一特征图和第二特征图、对第二特征图进行扭曲处理以及将其与第一特征图融合所涉及的处理逻辑是相同的。例如可以采用如图8a或图8b示出的特征提取网络提取重建图像块的特征图,利用图9a或图9b示出的扭曲处理模块得到扭曲后的第二重建特征图,利用图11示出的特征融合网络进行融合处理。相应的处理原理可以参见第二实施例中的描述内容,将其中的处理对象和处理结果代入本实施例中的相关内容即可,具体请参见如下内容,是相应内容的简要介绍:
在一实施例中,可以:基于特征提取网络和所述第一重建图像块的深度信息或视差信息,对所述第一重建图像块和所述第二重建图像块进行特征提取处理,得到所述第一重建图像块对应的第一重建特征图和所述第二重建图像块对应的第二重建特征图。
在另一实施例中,所述第一重建图像块为切片,所述第二重建图像块对应为切片;此时可以精细匹配之后再确定第一重建特征图和第二重建特征图,即:获取所述第一重建图像块的第一重建图像子块和所述第二重建图像块的第二重建图像子块;所述第二重建图像子块的属性信息和所述第一重建图像子块的属性信息匹配;基于特征提取网络和所述第一重建图像子块的属性信息,对所述第一重建图像子块和所述第二重建图像子块进行特征提取处理,得到所述第一重建图像块对应的第一重建特征图和所述第二重建图像块对应的第二重建特征图;可选地,所述属性信息包括深度信息或视差信息,所述第一重建图像块的属性信息和所述第一重建图像子块的属性信息不同。
上述第一重建图像子块和第二重建图像子块各自对应的属性信息匹配是指两个图像子块的属性信息相似度最大，例如深度信息或视差信息的相似程度最大，属性信息不同包括以下至少一种：所述第一重建图像块的属性信息和所述第一重建图像子块的属性信息的内容不同，例如第一重建图像块的深度信息为深度特征信息，第一重建图像子块的深度信息为基于深度值的统计信息；所述第一重建图像子块的属性信息的精度大于所述第一重建图像块的属性信息的精度，例如第一重建图像块的深度信息为n个深度特征信息，第一重建图像子块的属性信息为m个深度特征信息，且m大于n。
可选地,所述第一重建图像子块和所述第二重建图像子块为编码树块或者扩展的编码树块;所述扩展的编码树块为所述编码树块的边缘扩展之后得到的,所述扩展的编码树块尺寸大于所述编码树块。这样输入的图像块具有编码单元的相邻像素,在滤波阶段可以有效降低图像块划分的块效应影响。
可选地,所述特征提取网络包括N个级联的特征提取模块,所述N为大于或等于1的整数,前N-1个特征提取模块中每个特征提取模块包括串联的特征提取单元和下采样单元,第N个特征提取模块包括特征提取单元。对于特征提取网络的具体结构设计,可以采用图8a或图8b所示的结构,不同的是:所述N个级联的特征提取模块中第一个特征提取模块,用于处理所述第一重建图像块和所述第二重建图像块,或者所述第一重建图像子块和所述第二重建图像子块。特征提取单元处理对应重建图像块或重建图像子块得到的特征图称为重建特征图,包括第一重建特征图和第二重建特征图,属性信息可以作为监督信息或参考信息,包括深度信息或视差信息,作用和第一辅助信息或第二辅助信息类似,此处不赘述,在处理第三重建图像块的场景下,特征提取网络接收的还可以是帧间预测信息。
对第二重建特征图的第一预设处理,也即:基于所述第一重建特征图和所述第二重建特征图确定第一预设处理参数;或者,基于所述第一重建特征图、所述第二重建特征图、所述属性信息确定第一预设处理参数;基于第一预设处理模型对所述第二重建特征图进行第一预设处理,得到第一预设处理后的第二重建特征图,所述第一预设处理模型包括根据所述第一预设处理参数确定的第一处理模型。
在一实施例中,第一预设处理为扭曲处理,第一预设处理模型为扭曲模型,第一预设处理参数为扭曲参数。
可选地,基于第一预设处理模型的第一预设处理如下:根据所述第一处理模型和所述第二处理模型确定所述第二重建特征图的采样点坐标,可选地,所述第二处理模型包括的目标像素坐标;根据所述第二重建特征图以及采样核函数,确定所述采样点坐标对应的目标像素值;根据所述采样点坐标对应的目标像素值生成第一预设处理后的第二重建特征图。可参见前述对第二特征图的第一预设处理,将对应的内容代入此处的第二重建特征图即可得到相应的结果,这里不再赘述。
得到第一预设处理后的第二重建特征图之后,再将其和第一重建特征图进行第二预设处理,包括:利用特征融合网络对所述第一重建特征图和所述第一预设处理后的第二重建特征图进行第二预设处理,得到第一融合重建特征图。第二预设处理为特征融合处理,此处的特征融合网络可以采用图11中包括的特征融合网络,该特征融合网络结合第一预设处理模块以及特征提取网络进行第二预设处理,对金字塔分层处理得到的第一重建特征图以及第一预设处理后的第二重建特征图进行融合,最终由特征融合模块输出第一融合重建特征图。对于具体的处理流程这里也不再赘述。
对第二目标特征图进行滤波处理的实现方式,可以是:利用目标滤波处理模型对所述第二目标特征图进行滤波处理,得到滤波后的第二目标特征图;根据所述滤波后的第二目标特征图生成滤波后的第一重建图像块;可选地,所述目标滤波处理模型包括根据率失真代价从多个候选模型中选择的目标候选模型,所述多个候选模型中每个候选模型和量化参数存在映射关系。在一个实施例中,目标滤波处理模型包括至少一个处理单元,所述处理单元包括第一处理单元和第二处理单元中的一种或两种;所述利用目标滤波处理模型对所述第二目标特征图进行滤波处理,得到滤波后的第二目标特征图,包括:对至少一个所述第一处理单元处理后的第二目标特征图进行下采样处理,得到下采样后的第二目标特征图;对所述下采样后的第二目标特征图进行上采样处理,得到目标融合重建特征图;利用第二处理单元处理所述目标融合重建特征图,得到滤波后的第二目标特征图。
对第二目标特征图的滤波处理可以参见第二实施例中对目标特征图的滤波处理,其处理逻辑是相同的,此处不再赘述。
在另一可行的实施方式中,上述步骤b更详细的实现步骤可以包括:
根据所述第一重建图像块的深度信息或视差信息，对所述第一重建图像块和所述第二重建图像块进行第三预设处理，得到第一目标特征图；对所述第一目标特征图进行滤波处理，得到滤波后的第一目标特征图；根据所述第一重建图像块的帧间预测信息，对所述滤波后的第一目标特征图和所述第三重建图像块进行第三预设处理，得到第二目标特征图；对所述第二目标特征图进行滤波处理，以得到滤波后的第一重建图像块。
可选地,所述第三预设处理为融合处理,包括第一预设处理和第二预设处理,所述第一预设处理为扭曲处理,所述第二预设处理为特征融合处理,所述第一目标特征图为第一融合特征图,所述第二目标特征图为第二融合特征图。此外,第三预设处理模块还包括特征提取处理。
对第一重建图像块和第二重建图像块的融合处理,前述已进行相关介绍,在此不做赘述;滤波后的第一融合特征图和第三重建图像块的融合处理,具体是根据第一融合特征图和第三重建图像块对应的特征图一系列处理得到第二融合特征图,方式同对第一重建图像块和第二重建图像块的融合,相应的内容也可以参考对第一重建图像块和第二重建图像块的融合,此处不展开说明;对第一融合特征图以及第二融合特征图的滤波处理可以采用如图13示出的基于神经网络的滤波处理模块来实现,相应的处理方式前述也已介绍,这里不再赘述。
请参见图16，是本申请实施例提供的另一种基于神经网络的环路滤波器的结构示意图，包括两个融合模块和两个基于神经网络的滤波处理模块，其中融合模块1和融合模块2可以包括相同的功能单元，也可以包括不同的功能单元，具体可参考图15有关融合模块1和融合模块2的介绍内容，在此不做赘述。基于神经网络的滤波处理模块1和基于神经网络的滤波处理模块2均可以采用图13示出的内容。和图15不同的是，按照如图16示出的基于神经网络的环路滤波器的结构进行滤波处理，融合模块1得到的第一融合特征图是在经过基于神经网络的滤波处理之后再输入融合模块2中进行处理，得到第二融合特征图，再将第二融合特征图经过基于神经网络的滤波处理模块进行滤波处理，此实施方式下，通过两个融合模块和基于神经网络的滤波处理模块组合的串联，可以在第一次融合处理时参考同一时刻不同视点的帧的信息，并通过基于神经网络的滤波处理有效维护原始图像的特征信息，然后经过再一次融合处理和基于神经网络的滤波处理可以进一步参考同一视点的参考帧的信息，充分融合和第一重建图像块有关的各种信息，进一步降低第一重建图像块对应的重建图像的失真度。
需要说明的是,对于本实施例中图15至图17示出的基于神经网络的环路滤波器中包括的融合模块和基于神经网络的滤波处理模块的数量,以及组合方式仅作为示例性地说明和展示,还可以包括其他组合方式,例如在图16的基础上再串联一个融合模块和基于神经网络的滤波处理模块,在此不做限定。
在又一可行的实施方式中,上述步骤b更详细的实现步骤可以包括:根据所述第一重建图像块的深度信息或视差信息,对所述第一重建图像块和所述第二重建图像块进行第三预设处理,得到第一目标特征图;根据所述第三重建图像块和所述第一重建图像块的帧间预测信息,对所述第一重建图像块和所述第三重建图像块进行第三预设处理,得到第二目标特征图;根据所述第一目标特征图和所述第二目标特征图确定滤波后的第一重建图像块。
可选地,所述第三预设处理为融合处理,包括第一预设处理和第二预设处理,所述第一预设处理为扭曲处理,所述第二预设处理为特征融合处理,所述第一目标特征图为第一融合特征图,所述第二目标特征图为第二融合特征图。需要说明的是,第三预设处理还可以包括特征提取处理。
此处对第一重建图像块和第二重建图像块进行融合处理,具体是根据第一重建图像块对应的特征图和第二重建图像块对应的特征图进行一系列处理,可选地,在具体的处理过程中,首先,可以通过深度信息或视差信息进行精细匹配,从第二重建图像块中查找到和第一重建图像块的重建图像子块匹配的重建图像子块,然后再参考深度信息或视差信息对各自的重建图像子块进行特征提取,得到对应的特征图,在一实施例中,也可以不做精细匹配,而是直接提取重建图像块的特征,得到对应的特征图;接着参考深度信息或视差信息对第二重建图像块对应的第二重建特征图进行扭曲处理后,再和第一重建图像块对应的第一重建特征图融合,得到第一融合特征图。同理,按照同样的方式可以根据帧间预测信息,对第一重建图像块和第三重建图像块融合,得到第二融合特征图。
需要说明的是,采用金字塔分层处理方式可以提取图像块或图像子块的特征图,详细过程可以参见前述的介绍,在此不做赘述。
在一个实施例中,根据第一目标特征图和第二目标特征图确定滤波后的第一重建图像块的步骤可以包括:对所述第一目标特征图和所述第二目标特征图进行滤波处理,得到滤波后的第一目标特征图和滤波后的第二目标特征图;根据所述滤波后的第一目标特征图和所述滤波后的第二目标特征图进行第三预设处理,得到目标融合重建图像块;将所述目标融合重建图像块作为滤波后的第一重建图像块。
可选地,所述第三预设处理为融合处理,包括第一预设处理和第二预设处理,所述第一预设处理为扭曲处理,所述第二预设处理为特征融合处理,所述第一目标特征图为第一融合特征图,所述第二目标特征图为第二融合特征图,所述滤波后的第一目标特征图为滤波后的第一融合特征图,所述滤波后的第二目标特征图为滤波后的第二融合特征图。此外,第三预设处理还包括特征提取处理。
这里的滤波处理可以包括对第一融合特征图和第二融合特征图分别采用不同的神经网络模型进行滤波处理,第一融合特征图和第二融合特征图都滤波之后再进行融合处理的方式可以是:对滤波后的第二融合特征图进行扭曲处理之后再和滤波后的第一融合特征图进行融合,也可以是直接对滤波后的融合特征图进行融合处理,根据融合的特征图得到目标融合重建图像块,也即滤波后的第一重建图像块。
请参见图17，是本申请实施例提供的又一种基于神经网络的环路滤波器的结构示意图，包括融合模块1、融合模块2和融合模块3，以及基于神经网络的滤波处理模块1和基于神经网络的滤波处理模块2。各个融合模块的内部结构可以相同，也可以不同，例如融合模块3包括特征提取、扭曲处理、特征融合这三个功能单元，而融合模块1和融合模块2包括精细匹配、特征提取、扭曲处理、特征融合这四个功能单元。融合模块1用于根据深度信息或视差信息处理第一重建图像块和第二重建图像块，得到第一融合特征图，第一重建图像块可以为依赖视点当前帧的当前重建纹理块，第二重建图像块可以为独立视点参考帧对应的匹配纹理块。融合模块2用于根据帧间预测信息对第一重建图像块和第三重建图像块进行处理，得到第二融合特征图，第三重建图像块可以为依赖视点参考帧对应的纹理块。融合模块3可以根据第一融合特征图和第二融合特征图进行融合处理，例如是将第一融合特征图和扭曲后的第二融合特征图进行融合处理，根据融合的特征图得到滤波后的第一重建图像块，例如依赖视点当前帧滤波后的当前重建纹理块。
这一实施方式通过并列的融合模块和基于神经网络的滤波处理模块,参考不同的属性信息、第二重建图像块的特征信息和第三重建图像块的特征信息对第一重建图像块进行处理,得到滤波后的第一重建图像块,这样可以从不同视点的图像以及同一视点的已编码图像中获取有利于滤波的辅助信息,进而提高第一重建图像块所在的重建图像的质量,降低视频失真。
综上所述，通过本申请实施例提供的方案，可以根据已编码的重建图像块（包括同一视点不同时刻的重建图像块以及同一时刻不同视点的重建图像块），并结合关于第一重建图像块的属性信息（包括深度信息或视差信息、帧间预测信息），对当前正在编码的重建图像块进行滤波处理，可以包括特征提取处理、扭曲处理、特征融合处理等，在具体的实现中，可以采用不同数量的融合模块和基于神经网络的滤波处理模块的组合，这样能够通过多次融合以及滤波处理的结合，充分地参考其他重建图像块中有用的信息，将其他有关的特征信息融入最终的滤波结果中，进而有效提高滤波后的第一重建图像块的质量，降低重建图像的失真。
第四实施例
请参见图18,图18是根据第四实施例示出的一种图像处理装置的结构示意图,该图像处理装置可以是运行于服务器中的一个计算机程序(包括程序代码),例如图像处理装置为一个应用软件;该装置可以用于执行本申请实施例提供的方法中的相应步骤。该图像处理装置1800包括:获取模块1801、处理模块1802。
获取模块1801，用于获取第一辅助信息。
在一实施例中,获取模块1801,还用于进一步获取对应于第一视点的第一图像块,和/或,获取对应于第二视点的参考图像。
处理模块1802，用于根据对应于第二视点的参考图像和所述第一辅助信息，对对应于第一视点的第一图像块进行处理。确定或生成的处理结果可以用于获得所述第一图像块对应的重建图像或解码图像；所述参考图像为对应于第二视点的图像；以及所述第二视点不同于所述第一视点。
在一个实施例中,处理模块1802,具体用于:根据所述参考图像的第二图像块和第一辅助信息,确定或生成对应于所述第一图像块的一处理结果。
在一个实施例中,处理模块1802,具体用于:根据所述第一辅助信息从所述参考图像中确定第二图像块;确定所述第一图像块对应的第一特征图以及所述第二图像块对应的第二特征图;根据所述第一特征图以及所述第二特征图确定或生成对应于所述第一图像块的一处理结果。
在一个实施例中,处理模块1802,具体用于:根据所述第一特征图对所述第二特征图进行第一预设处理,得到目标第二特征图;根据所述第一特征图和所述目标第二特征图进行第二预设处理,得到目标特征图;根据所述目标特征图确定或生成对应于所述第一图像块的处理结果。
在一个实施例中,处理模块1802,具体用于:获取所述第一图像块的第一辅助信息,所述第一辅助信息包括深度信息,所述深度信息根据所述第一图像块对应的深度图像确定;获取所述参考图像中各个图像块的第一辅助信息和所述第一图像块的第一辅助信息的相似度;将所述参考图像中所述相似度最大的图像块确定为与所述第一图像块匹配的第二图像块。
在一个实施例中,所述第一辅助信息包括深度信息或视差信息;所述深度信息为以下至少一种:深度特征信息、基于深度值的统计信息、深度切片、预处理之后的深度切片。
在一个实施例中,包括以下至少一种:所述第二图像块与所述第一图像块的尺寸相同;当所述第一图像块为切片或编码树块时,所述第二图像块对应为切片或编码树块;当所述第二图像块为切片时,所述第二图像块由多个编码树单元构成。
在一个实施例中,处理模块1802,具体用于:基于特征提取网络和所述第一辅助信息对所述第一图像块和所述第二图像块进行特征提取处理,得到所述第一图像块对应的第一特征图以及所述第二图像块对应的第二特征图。
在一个实施例中,所述第一图像块为切片,所述第二图像块对应为切片;处理模块1802,具体用于:获取所述第一图像块的第一图像子块和所述第二图像块的第二图像子块;所述第二图像子块的第二辅助信息和所述第一图像子块的第二辅助信息匹配;基于特征提取网络和所述第二辅助信息对所述第一图像子块和所述第二图像子块进行特征提取处理,得到所述第一图像子块的第一子特征图和所述第二图像子块的第二子特征图;通过所述第一子特征图,确定或生成所述第一图像块对应的第一特征图,以及通过所述第二子特征图,确定或生成所述第二图像块对应的第二特征图;可选地,所述第二辅助信息与所述第一辅助信息不同。
在一个实施例中,所述特征提取网络包括N个级联的特征提取模块,所述N为大于或等于1的整数,前N-1个特征提取模块中每个特征提取模块包括串联的特征提取单元和下采样单元,第N个特征提取模块包括特征提取单元;所述N个级联的特征提取模块中第一个特征提取模块,用于处理所述第一图像块和所述第二图像块,或者所述第一图像子块和所述第二图像子块;所述N个级联的特征提取模块中除所述第一个特征提取模块之外的各个特征提取模块,用于处理前一个特征提取模块的输出;针对所述每个特征提取模块,所述下采样单元的输入和所述特征提取单元的输出连接,所述下采样单元的输出和后一个特征提取模块中特征提取单元的输入连接;可选地,所述第一辅助信息或所述第二辅助信息作为所述N个级联的特征提取模块中至少一个特征提取模块的监督信息。
在一个实施例中,所述第一图像子块和所述第二图像子块为编码树块或者扩展的编码树块;所述扩展的编码树块为所述编码树块的边缘扩展之后得到的,所述扩展的编码树块的尺寸大于所述编码树块的尺寸。
在一个实施例中,处理模块1802,具体用于:基于所述第一特征图和所述第二特征图确定第一预设处理参数;或者,基于所述第一特征图、所述第二特征图以及所述第一辅助信息确定第一预设处理参数;基于第一预设处理模型对所述第二特征图进行第一预设处理,得到目标第二特征图;所述第一预设处理模型包括根据所述第一预设处理参数确定的第一处理模型。
在一个实施例中,所述第一预设处理模型包括所述第一处理模型和第二处理模型,处理模块1802,具体用于:根据所述第一处理模型和所述第二处理模型确定所述第二特征图中的采样点坐标,可选地,所述第二处理模型包括目标像素坐标;根据所述第二特征图以及采样核函数,确定所述采样点坐标对应的目标像素值;根据所述采样点坐标对应的目标像素值生成目标第二特征图。
可选地,第一预设处理为扭曲处理,第二预设处理为特征融合处理,目标第二特征图为扭曲后的第二特征图,第一预设处理参数为扭曲参数,第一预设处理模型为扭曲模型,目标特征图为融合特征图。
在一个实施例中,处理模块1802,具体用于:利用特征融合网络对所述第一特征图和所述目标第二特征图进行第二预设处理,得到目标特征图;可选地,所述特征融合网络包括N个特征融合模块和M个上采样模块,所述M为大于或等于1的整数,M+1=N;所述特征融合网络中第i个特征融合模块的输入和N个第一预设处理模块中第i个第一预设处理模块的输出连接,第i个特征融合模块的输出和第j个上采样模块的输入连接,j为大于等于1的整数且j小于等于M,i=j+1;第j个上采样模块的输出和第j个特征融合模块的输入连接;所述第i个第一预设处理模块用于对所述特征提取网络中第i个特征提取模块输出的第二特征图进行第一预设处理,i为大于等于1的整数,且i小于或等于N;第N个特征融合模块用于融合第N个第一预设处理模块输出的目标第二特征图和所述第N个特征提取单元 输出的第一特征图;当i不等于N时,所述第i个特征融合模块用于融合第i个第一预设处理模块输出的目标第二特征图、所述第i个特征提取单元输出的第一特征图以及所述第i个上采样模块输出的特征图。
在一个实施例中,处理模块1802,具体用于:对所述目标特征图进行滤波处理,得到滤波后的目标特征图;根据所述滤波后的目标特征图确定对应于所述第一图像块的处理结果。
在一个实施例中,处理模块1802,具体用于:利用目标滤波处理模型对所述目标特征图进行滤波处理,得到滤波后的目标特征图;可选地,所述目标滤波处理模型包括根据率失真代价从多个候选模型中选择的目标候选模型,所述多个候选模型中每个候选模型和量化参数存在映射关系。
在一个实施例中,所述目标滤波处理模型包括至少一个处理单元,所述处理单元包括第一处理单元和第二处理单元中的一种或两种;处理模块1802,具体用于:对至少一个所述第一处理单元处理后的目标特征图进行下采样处理,得到下采样后的目标特征图;对所述下采样后的目标特征图进行上采样处理,得到目标融合特征图;利用所述第二处理单元处理所述目标融合特征图,得到滤波后的目标特征图。
在一个可行的实施例中,上述图像处理装置还可以用于实现以下方法的步骤:
获取模块1801,还用于获取第一重建图像块和第二重建图像块;
处理模块1802,还用于根据所述第二重建图像块和所述第一重建图像块和/或所述第二重建图像块的属性信息,对所述第一重建图像块进行滤波,以得到滤波后的第一重建图像块;可选地,所述第一重建图像块和所述第二重建图像块对应于不同或相同的重建图像。
在一个实施例中,所述第一重建图像块的属性信息,包括以下至少一种:所述第一重建图像块的帧间预测信息、所述第一重建图像块的深度信息、所述第一重建图像块的视差信息。
在一个实施例中,处理模块1802,具体用于:获取第三重建图像块,所述第三重建图像块对应的图像为所述第一重建图像块对应的图像的参考重建图像;根据所述第三重建图像块、第二重建图像块和所述第一重建图像块的属性信息,对所述第一重建图像块进行滤波,以得到滤波后的第一重建图像块。
在一个实施例中,处理模块1802,具体用于:根据所述第一重建图像块的深度信息或视差信息,对所述第一重建图像块和所述第二重建图像块进行第三预设处理,得到第一目标特征图;根据所述第一重建图像块的帧间预测信息,对所述第三重建图像块和所述第一目标特征图进行第三预设处理,得到第二目标特征图;对所述第二目标特征图进行滤波处理,以得到滤波后的第一重建图像块。
在一个实施例中,处理模块1802,具体用于:根据所述第一重建图像块的深度信息或视差信息,对所述第一重建图像块和所述第二重建图像块进行第三预设处理,得到第一目标特征图;对所述第一目标特征图进行滤波处理,得到滤波后的第一目标特征图;根据所述第一重建图像块的帧间预测信息,对所述滤波后的第一目标特征图和所述第三重建图像块进行第三预设处理,得到第二目标特征图;对所述第二目标特征图进行滤波处理,以得到滤波后的第一重建图像块。
在一个实施例中,处理模块1802,具体用于:根据所述第一重建图像块的深度信息或视差信息,对所述第一重建图像块和所述第二重建图像块进行第三预设处理,得到第一目标特征图;根据所述第三重建图像块和所述第一重建图像块的帧间预测信息,对所述第一重建图像块和所述第三重建图像块进行第三预设处理,得到第二目标特征图;根据所述第一目标特征图和所述第二目标特征图确定滤波后的第一重建图像块。
在一个实施例中,处理模块1802,具体用于:对所述第一目标特征图和所述第二目标特征图进行滤波处理,得到滤波后的第一目标特征图和滤波后的第二目标特征图;根据所述滤波后的第一目标特征图和所述滤波后的第二目标特征图进行第三预设处理,得到目标融合重建图像块;将所述目标融合重建图像块作为滤波后的第一重建图像块。
在一个实施例中,处理模块1802,具体用于:根据所述第一重建图像块的深度信息或视差信息,确定所述第一重建图像块对应的第一重建特征图和所述第二重建图像块对应的第二重建特征图;根据所述第一重建特征图对所述第二重建特征图进行第一预设处理,得到第一预设处理后的第二重建特征图;根据第一预设处理后的第二重建特征图和所述第一重建特征图进行第二预设处理,得到第一目标特征图。
在一个实施例中,处理模块1802,具体用于:基于特征提取网络和所述第一重建图像块的深度信息或视差信息,对所述第一重建图像块和所述第二重建图像块进行特征提取处理,得到所述第一重建图像块对应的第一重建特征图和所述第二重建图像块对应的第二重建特征图。
在一个实施例中,所述第一重建图像块为切片,所述第二重建图像块对应为切片;处理模块1802,具体用于:获取所述第一重建图像块的第一重建图像子块和所述第二重建图像块的第二重建图像子块;所述第二重建图像子块的属性信息和所述第一重建图像子块的属性信息匹配;基于特征提取网络和所述第一重建图像子块的属性信息,对所述第一重建图像子块和所述第二重建图像子块进行特征提取处理,得到所述第一重建图像块对应的第一重建特征图和所述第二重建图像块对应的第二重建特征图;可选地,所述属性信息包括深度信息或视差信息,所述第一重建图像块的属性信息和所述第一重建图像子块的属性信息不同。
在一个实施例中,所述特征提取网络包括N个级联的特征提取模块,所述N为大于或等于1的整数,前N-1个特征提取模块中每个特征提取模块包括串联的特征提取单元和下采样单元,第N个特征提取模块包括特征提取单元;所述N个级联的特征提取模块中第一个特征提取模块,用于处理所述第一重建图像块和所述第二重建图像块,或者所述第一重建图像子块和所述第二重建图像子块;所述N个级联的特征提取模块中除所述第一个特征提取模块之外的各个特征提取模块,用于处理前一个特征提取模块的输出;针对所述每个特征提取模块,所述下采样单元的输入和所述特征提取单元的输出连接,所述下采样单元的输出和后一个特征提取模块中特征提取单元的输入连接;可选地,所述属性信息作为所述N个级联的特征提取模块中至少一个特征提取模块的监督信息。
在一个实施例中,所述第一重建图像子块和所述第二重建图像子块为编码树块或者扩展的编码树块;所述扩展的编码树块为所述编码树块的边缘扩展之后得到的,所述扩展的编码树块尺寸大于所述编码树块。
在一个实施例中,处理模块1802,具体用于:基于所述第一重建特征图和所述第二重建特征图确定第一预设处理参数;或者,基于所述第一重建特征图、所述第二重建特征图、所述属性信息确定第一预设处理参数;基于第一预设处理模型对所述第二重建特征图进行第一预设处理,得到第一预设处理后的第二重建特征图,所述第一预设处理模型包括根据所述第一预设处理参数确定的第一处理模型。
在一个实施例中，处理模块1802，具体用于：根据所述第一处理模型和所述第二处理模型确定所述第二重建特征图的采样点坐标，可选地，所述第二处理模型包括目标像素坐标；根据所述第二重建特征图以及采样核函数，确定所述采样点坐标对应的目标像素值；根据所述采样点坐标对应的目标像素值生成第一预设处理后的第二重建特征图。
可选地,第一预设处理为扭曲处理,第二预设处理为特征融合处理,目标第二特征图为扭曲后的第二特征图,第一预设处理参数为扭曲参数,第一预设处理模型为扭曲模型,目标特征图为融合特征图。
在一个实施例中,处理模块1802,具体用于:利用特征融合网络对所述第一重建特征图和所述第一预设处理后的第二重建特征图进行第二预设处理,得到第一目标特征图;可选地,所述特征融合网络包括N个特征融合模块和M个上采样模块,所述M为大于或等于1的整数,M+1=N;所述特征融合网络中第i个特征融合模块的输入和N个第一预设处理模块中第i个第一预设处理模块的输出连接,第i个特征融合模块的输出和第j个上采样模块的输入连接,j为大于等于1的整数且j小于等于M,i=j+1;第j个上采样模块的输出和第j个特征融合模块的输入连接;所述第i个第一预设处理模块用于对所述特征提取网络中第i个特征提取模块输出的第二特征图进行第一预设处理,i为大于等于1的整数,且i小于或等于N;第N个特征融合模块用于融合第N个第一预设处理模块输出的目标第二特征图和所述第N个特征提取单元输出的第一重建特征图;当i不等于N时,所述第i个特征融合模块用于融合第i个第一预设处理模块输出的第一预设处理后的第二重建特征图、所述第i个特征提取单元输出的第一重建特征图以及所述第i个上采样模块输出的特征图。
在一个实施例中,处理模块1802,具体用于:利用目标滤波处理模型对所述第二目标特征图进行滤波处理,得到滤波后的第二目标特征图;根据所述滤波后的第二目标特征图生成滤波后的第一重建图像块;可选地,所述目标滤波处理模型包括根据率失真代价从多个候选模型中选择的目标候选模型,所述多个候选模型中每个候选模型和量化参数存在映射关系。
在一个实施例中,所述目标滤波处理模型包括至少一个处理单元,所述处理单元包括第一处理单元和第二处理单元中的一种或两种;处理模块1802,具体用于:对至少一个所述第一处理单元处理后的第二目标特征图进行下采样处理,得到下采样后的第二目标特征图;对所述下采样后的第二目标特征图进行上采样处理,得到目标融合重建特征图;利用第二处理单元处理所述目标融合重建特征图,得到滤波后的第二目标特征图。
可以理解的是,本申请实施例所描述的图像处理装置的各功能模块的功能可根据上述方法实施例中的方法具体实现,其具体实现过程可以参照上述方法实施例的相关描述,此处不再赘述。另外,对采用相同方法的有益效果描述,也不再进行赘述。
本申请实施例还提供一种图像处理方法,所述方法包括以下步骤:
S10:获取第二重建图像块;
S20:根据所述第二重建图像块、第一重建图像块的属性信息和所述第二重建图像块的属性信息中的至少一种,对第一重建图像块进行滤波,以得到滤波后的第一重建图像块。
可选地,所述S20步骤包括以下至少一种:
根据所述第二重建图像块,对第一重建图像块进行滤波,以得到滤波后的第一重建图像块。
根据第一重建图像块的属性信息,对第一重建图像块进行滤波,以得到滤波后的第一重建图像块。
根据所述第二重建图像块的属性信息,对第一重建图像块进行滤波,以得到滤波后的第一重建图像块。
根据所述第二重建图像块和第一重建图像块的属性信息,对第一重建图像块进行滤波,以得到滤波后的第一重建图像块。
根据所述第二重建图像块和所述第二重建图像块的属性信息,对第一重建图像块进行滤波,以得到滤波后的第一重建图像块。
根据第一重建图像块的属性信息和所述第二重建图像块的属性信息,对第一重建图像块进行滤波,以得到滤波后的第一重建图像块。
根据所述第二重建图像块、第一重建图像块的属性信息和所述第二重建图像块的属性信息,对第一重建图像块进行滤波,以得到滤波后的第一重建图像块。
可选地,所述S10步骤,还包括:获取所述第一重建图像块。
可选地,所述第一重建图像块和所述第二重建图像块对应于相同或不同的重建图像。
可选地,所述S20步骤,包括以下步骤:
S201:获取第三重建图像块,所述第三重建图像块对应的图像为所述第一重建图像块对应的图像的参考重建图像;
S202:根据所述第三重建图像块、第二重建图像块和所述第一重建图像块的属性信息中的至少一种,对所述第一重建图像块进行滤波,以得到滤波后的第一重建图像块。
可选地,所述S202步骤,包括以下至少一种:
根据所述第一重建图像块的深度信息或视差信息,对所述第一重建图像块和所述第二重建图像块进行第三预设处理,得到第一目标特征图,根据所述第一重建图像块的帧间预测信息,对所述第三重建图像块和所述第一目标特征图进行第三预设处理,得到第二目标特征图,对所述第二目标特征图进行滤波处理,以得到滤波后的第一重建图像块;
根据所述第一重建图像块的深度信息或视差信息,对所述第一重建图像块和所述第二重建图像块进行第三预设处理,得到第一目标特征图,对所述第一目标特征图进行滤波处理,得到滤波后的第一目标特征图,根据所述第一重建图像块的帧间预测信息,对所述滤波后的第一目标特征图和所述第三重建图像块进行第三预设处理,得到第二目标特征图,对所述第二目标特征图进行滤波处理,以得到滤波后的第一重建图像块;
根据所述第一重建图像块的深度信息或视差信息,对所述第一重建图像块和所述第二重建图像块进行第三预设处理,得到第一目标特征图,根据所述第三重建图像块和所述第一重建图像块的帧间预测信息,对所述第一重建图像块和所述第三重建图像块进行第三预设处理,得到第二目标特征图,根据所述第一目标特征图和所述第二目标特征图确定滤波后的第一重建图像块。
可选地,所述根据所述第一目标特征图和所述第二目标特征图确定滤波后的第一重建图像块,包括:
对所述第一目标特征图和所述第二目标特征图进行滤波处理,得到滤波后的第一目标特征图和滤波后的第二目标特征图;
根据所述滤波后的第一目标特征图和所述滤波后的第二目标特征图进行第三预设处理,得到目标融合重建图像块;
将所述目标融合重建图像块作为滤波后的第一重建图像块。
可选地,所述根据所述第一重建图像块的深度信息或视差信息,对所述第一重建图像块和所述第二重建图像块进行第三预设处理,得到第一目标特征图,包括:
根据所述第一重建图像块的深度信息或视差信息,确定所述第一重建图像块对应的第一重建特征图和所述第二重建图像块对应的第二重建特征图;
根据所述第一重建特征图对所述第二重建特征图进行第一预设处理,得到第一预设处理后的第二重建特征图;
根据第一预设处理后的第二重建特征图和所述第一重建特征图进行第二预设处理,得到第一目标特征图。
可选地,所述根据所述第一重建图像块的深度信息或视差信息,确定所述第一重建图像块对应的第一重建特征图和所述第二重建图像块对应的第二重建特征图,包括:
基于特征提取网络和所述第一重建图像块的深度信息或视差信息,对所述第一重建图像块和所述第二重建图像块进行特征提取处理,得到所述第一重建图像块的第一重建特征图和所述第二重建图像块的第二重建特征图。
可选地,所述第一重建图像块为切片,所述第二重建图像块对应为切片;所述根据所述第一重建图像块的深度信息或视差信息,确定所述第一重建图像块对应的第一重建特征图和所述第二重建图像块对应的第二重建特征图,包括:
获取所述第一重建图像块的第一重建图像子块和所述第二重建图像块的第二重建图像子块;
基于特征提取网络和所述第一重建图像子块的属性信息,对所述第一重建图像子块和所述第二重建图像子块进行特征提取处理,得到所述第一重建图像块对应的第一重建特征图和所述第二重建图像块对应的第二重建特征图。
可选地,所述特征提取网络包括N个级联的特征提取模块,所述N为大于或等于1的整数,前N-1个特征提取模块中每个特征提取模块包括串联的特征提取单元和下采样单元,第N个特征提取模块包括特征提取单元。
可选地,所述根据所述第一重建特征图对所述第二重建特征图进行第一预设处理,得到第一预设处理后的第二重建特征图,包括:
基于所述第一重建特征图和所述第二重建特征图确定第一预设处理参数;或者,基于所述第一重建特征图、所述第二重建特征图、所述属性信息确定第一预设处理参数;
基于第一预设处理模型对所述第二重建特征图进行第一预设处理,得到第一预设处理后的第二重建特征图,所述第一预设处理模型包括根据所述第一预设处理参数确定的第一处理模型。
可选地,所述第一预设处理模型包括所述第一处理模型和第二处理模型,所述基于第一预设处理模型对所述第二重建特征图进行第一预设处理,得到第一预设处理后的第二重建特征图,包括:
根据所述第一处理模型和所述第二处理模型确定所述第二重建特征图的采样点坐标;
根据所述第二重建特征图以及采样核函数,确定所述采样点坐标对应的目标像素值;
根据所述采样点坐标对应的目标像素值生成第一预设处理后的第二重建特征图。
可选地,所述对所述第二目标特征图进行滤波处理,以得到滤波后的第一重建图像块,包括:
利用目标滤波处理模型对所述第二目标特征图进行滤波处理,得到滤波后的第二目标特征图;
根据所述滤波后的第二目标特征图生成滤波后的第一重建图像块。
可选地,所述目标滤波处理模型包括至少一个处理单元,所述处理单元包括第一处理单元和第二处理单元中的一种或两种;所述利用目标滤波处理模型对所述第二目标特征图进行滤波处理,得到滤波后的第二目标特征图,包括:
对至少一个所述第一处理单元处理后的第二目标特征图进行下采样处理,得到下采样后的第二目标特征图;
对所述下采样后的第二目标特征图进行上采样处理,得到目标融合重建特征图;
利用第二处理单元处理所述目标融合重建特征图,得到滤波后的第二目标特征图。
本申请实施例还提供一种智能终端,智能终端包括存储器、处理器,存储器上存储有图像处理程序,该图像处理程序被处理器执行时实现上述任一实施例中的图像处理方法的步骤。该智能终端可以是如图1所示的移动终端100。
应当理解,本申请实施例中所描述的移动终端可执行上述任一实施例的方法描述,也可执行上述所对应实施例中对该图像处理装置的描述,在此不再赘述。另外,对采用相同方法的有益效果描述,也不再进行赘述。
在一可行的实施例中,如图1所示的移动终端100的处理器110可以用于调用存储器109中存储的图像处理程序,以执行如下操作:
获取第一辅助信息;
根据对应于第二视点的参考图像和所述第一辅助信息,对对应于第一视点的第一图像块进行处理。确定或生成的处理结果可以用于获得所述第一图像块对应的重建图像或解码图像;以及所述第二视点不同于所述第一视点。
在一实施例中,处理器110,具体用于:进一步获取对应于第一视点的第一图像块,和/或对应于第二视点的参考图像。
在一个实施例中,处理器110,具体用于:根据所述参考图像的第二图像块和第一辅助信息,确定或生成对应于所述第一图像块的一处理结果。
在一个实施例中,处理器110,具体用于:根据所述第一辅助信息从所述参考图像中确定第二图像块;确定所述第一图像块对应的第一特征图以及所述第二图像块对应的第二特征图;根据所述第一特征图以及所述第二特征图确定或生成对应于所述第一图像块的一处理结果。
在一个实施例中,处理器110,具体用于:根据所述第一特征图对所述第二特征图进行第一预设处理,得到目标第二特征图;根据所述第一特征图和所述目标第二特征图进行第二预设处理,得到目标特征图;根据所述目标特征图确定或生成对应于所述第一图像块的所述处理结果。
在一个实施例中,处理器110,具体用于:获取所述第一图像块的第一辅助信息,所述第一辅助信息包括深度信息,所述深度信息根据所述第一图像块对应的深度图像确定;获取所述参考图像中各个 图像块的第一辅助信息和所述第一图像块的第一辅助信息的相似度;将所述参考图像中所述相似度最大的图像块确定为与所述第一图像块匹配的第二图像块。
在一个实施例中,所述第一辅助信息包括深度信息或视差信息;所述深度信息为以下至少一种:深度特征信息、基于深度值的统计信息、深度切片、预处理之后的深度切片。
在一个实施例中,包括以下至少一种:所述第二图像块与所述第一图像块的尺寸相同;当所述第一图像块为切片或编码树块时,所述第二图像块对应为切片或编码树块;当所述第二图像块为切片时,所述第二图像块由多个编码树单元构成。
在一个实施例中,处理器110,具体用于:基于特征提取网络和所述第一辅助信息对所述第一图像块和所述第二图像块进行特征提取处理,得到所述第一图像块对应的第一特征图以及所述第二图像块对应的第二特征图。
在一个实施例中,所述第一图像块为切片,所述第二图像块对应为切片;处理器110,具体用于:获取所述第一图像块的第一图像子块和所述第二图像块的第二图像子块;所述第二图像子块的第二辅助信息和所述第一图像子块的第二辅助信息匹配;基于特征提取网络和所述第二辅助信息对所述第一图像子块和所述第二图像子块进行特征提取处理,得到所述第一图像子块的第一子特征图和所述第二图像子块的第二子特征图;通过所述第一子特征图,确定或生成所述第一图像块对应的第一特征图,以及通过所述第二子特征图,确定或生成所述第二图像块对应的第二特征图;可选地,所述第二辅助信息与所述第一辅助信息不同。
在一个实施例中,所述特征提取网络包括N个级联的特征提取模块,所述N为大于或等于1的整数,前N-1个特征提取模块中每个特征提取模块包括串联的特征提取单元和下采样单元,第N个特征提取模块包括特征提取单元;所述N个级联的特征提取模块中第一个特征提取模块,用于处理所述第一图像块和所述第二图像块,或者所述第一图像子块和所述第二图像子块;所述N个级联的特征提取模块中除所述第一个特征提取模块之外的各个特征提取模块,用于处理前一个特征提取模块的输出;针对所述每个特征提取模块,所述下采样单元的输入和所述特征提取单元的输出连接,所述下采样单元的输出和后一个特征提取模块中特征提取单元的输入连接;可选地,所述第一辅助信息或所述第二辅助信息作为所述N个级联的特征提取模块中至少一个特征提取模块的监督信息。
在一个实施例中,所述第一图像子块和所述第二图像子块为编码树块或者扩展的编码树块;所述扩展的编码树块为所述编码树块的边缘扩展之后得到的,所述扩展的编码树块的尺寸大于所述编码树块的尺寸。
在一个实施例中,处理器110,具体用于:基于所述第一特征图和所述第二特征图确定第一预设处理参数;或者,基于所述第一特征图、所述第二特征图以及所述第一辅助信息确定第一预设处理参数;基于第一预设处理模型对所述第二特征图进行第一预设处理,得到目标第二特征图;所述第一预设处理模型包括根据所述第一预设处理参数确定的第一处理模型。
在一个实施例中,所述第一预设处理模型包括所述第一处理模型和第二处理模型,处理器110,具体用于:根据所述第一处理模型和所述第二处理模型确定所述第二特征图中的采样点坐标,可选地,所述第二处理模型包括目标像素坐标;根据所述第二特征图以及采样核函数,确定所述采样点坐标对应的目标像素值;根据所述采样点坐标对应的目标像素值生成目标第二特征图。
可选地,第一预设处理为扭曲处理,第二预设处理为特征融合处理,目标第二特征图为扭曲后的第二特征图,第一预设处理参数为扭曲参数,第一预设处理模型为扭曲模型,目标特征图为融合特征图。
在一个实施例中,处理器110,具体用于:利用特征融合网络对所述第一特征图和所述目标第二特征图进行第二预设处理,得到目标特征图;可选地,所述特征融合网络包括N个特征融合模块和M个上采样模块,所述M为大于或等于1的整数,M+1=N;所述特征融合网络中第i个特征融合模块的输入和N个第一预设处理模块中第i个第一预设处理模块的输出连接,第i个特征融合模块的输出和第j个上采样模块的输入连接,j为大于等于1的整数且j小于等于M,i=j+1;第j个上采样模块的输出和第j个特征融合模块的输入连接;所述第i个第一预设处理模块用于对所述特征提取网络中第i个特征提取模块输出的第二特征图进行第一预设处理,i为大于等于1的整数,且i小于或等于N;第N个特征融合模块用于融合第N个第一预设处理模块输出的目标第二特征图和所述第N个特征提取单元输出的第一特征图;当i不等于N时,所述第i个特征融合模块用于融合第i个第一预设处理模块输出的目标第二特征图、所述第i个特征提取单元输出的第一特征图以及所述第i个上采样模块输出的特征图。
在一个实施例中,处理器110,具体用于:对所述目标特征图进行滤波处理,得到滤波后的目标特征图;根据所述滤波后的目标特征图确定对应于所述第一图像块的所述处理结果。
在一个实施例中,处理器110,具体用于:利用目标滤波处理模型对所述目标特征图进行滤波处理,得到滤波后的目标特征图;可选地,所述目标滤波处理模型包括根据率失真代价从多个候选模型中选择的目标候选模型,所述多个候选模型中每个候选模型和量化参数存在映射关系。
在一个实施例中,所述目标滤波处理模型包括至少一个处理单元,所述处理单元包括第一处理单元和第二处理单元中的一种或两种;处理器110,具体用于:对至少一个所述第一处理单元处理后的目标特征图进行下采样处理,得到下采样后的目标特征图;对所述下采样后的目标特征图进行上采样处理,得到目标融合特征图;利用所述第二处理单元处理所述目标融合特征图,得到滤波后的目标特征图。
在另一个可行的实施例中，如图1所示的移动终端100的处理器110可以用于调用存储器109中存储的图像处理程序，以执行如下操作：获取第一重建图像块和第二重建图像块；根据所述第二重建图像块和所述第一重建图像块和/或所述第二重建图像块的属性信息，对所述第一重建图像块进行滤波，以得到滤波后的第一重建图像块；可选地，所述第一重建图像块和所述第二重建图像块对应于相同或不同的重建图像。
在一个实施例中,所述第一重建图像块的属性信息,包括以下至少一种:所述第一重建图像块的帧间预测信息、所述第一重建图像块的深度信息、所述第一重建图像块的视差信息。
在一个实施例中,处理器110,具体用于:获取第三重建图像块,所述第三重建图像块对应的图像为所述第一重建图像块对应的图像的参考重建图像;根据所述第三重建图像块、第二重建图像块和所述第一重建图像块的属性信息,对所述第一重建图像块进行滤波,以得到滤波后的第一重建图像块。
在一个实施例中,处理器110,具体用于:根据所述第一重建图像块的深度信息或视差信息,对所述第一重建图像块和所述第二重建图像块进行第三预设处理,得到第一目标特征图;根据所述第一重建图像块的帧间预测信息,对所述第三重建图像块和所述第一目标特征图进行第三预设处理,得到第二目标特征图;对所述第二目标特征图进行滤波处理,以得到滤波后的第一重建图像块。
在一个实施例中,处理器110,具体用于:根据所述第一重建图像块的深度信息或视差信息,对所述第一重建图像块和所述第二重建图像块进行第三预设处理,得到第一目标特征图;对所述第一目标特征图进行滤波处理,得到滤波后的第一目标特征图;根据所述第一重建图像块的帧间预测信息,对所述滤波后的第一目标特征图和所述第三重建图像块进行第三预设处理,得到第二目标特征图;对所述第二目标特征图进行滤波处理,以得到滤波后的第一重建图像块。
在一个实施例中,处理器110,具体用于:根据所述第一重建图像块的深度信息或视差信息,对所述第一重建图像块和所述第二重建图像块进行第三预设处理,得到第一目标特征图;根据所述第三重建图像块和所述第一重建图像块的帧间预测信息,对所述第一重建图像块和所述第三重建图像块进行第三预设处理,得到第二目标特征图;根据所述第一目标特征图和所述第二目标特征图确定滤波后的第一重建图像块。
在一个实施例中,处理器110,具体用于:对所述第一目标特征图和所述第二目标特征图进行滤波处理,得到滤波后的第一目标特征图和滤波后的第二目标特征图;根据所述滤波后的第一目标特征图和所述滤波后的第二目标特征图进行第三预设处理,得到目标融合重建图像块;将所述目标融合重建图像块作为滤波后的第一重建图像块。
在一个实施例中,处理器110,具体用于:根据所述第一重建图像块的深度信息或视差信息,确定所述第一重建图像块对应的第一重建特征图和所述第二重建图像块对应的第二重建特征图;根据所述第一重建特征图对所述第二重建特征图进行第一预设处理,得到第一预设处理后的第二重建特征图;根据第一预设处理后的第二重建特征图和所述第一重建特征图进行第二预设处理,得到第一目标特征图。
在一个实施例中,处理器110,具体用于:基于特征提取网络和所述第一重建图像块的深度信息或视差信息,对所述第一重建图像块和所述第二重建图像块进行特征提取处理,得到所述第一重建图像块对应的第一重建特征图和所述第二重建图像块对应的第二重建特征图。
在一个实施例中,所述第一重建图像块为切片,所述第二重建图像块对应为切片;处理器110,具体用于:获取所述第一重建图像块的第一重建图像子块和所述第二重建图像块的第二重建图像子块;所述第二重建图像子块的属性信息和所述第一重建图像子块的属性信息匹配;基于特征提取网络和所述第一重建图像子块的属性信息,对所述第一重建图像子块和所述第二重建图像子块进行特征提取处理,得到所述第一重建图像块对应的第一重建特征图和所述第二重建图像块对应的第二重建特征图;可选地,所述属性信息包括深度信息或视差信息,所述第一重建图像块的属性信息和所述第一重建图像子块的属性信息不同。
在一个实施例中,所述特征提取网络包括N个级联的特征提取模块,所述N为大于或等于1的整数,前N-1个特征提取模块中每个特征提取模块包括串联的特征提取单元和下采样单元,第N个特征提取模块包括特征提取单元;所述N个级联的特征提取模块中第一个特征提取模块,用于处理所述第一重建图像块和所述第二重建图像块,或者所述第一重建图像子块和所述第二重建图像子块;所述N个级联的特征提取模块中除所述第一个特征提取模块之外的各个特征提取模块,用于处理前一个特征提取模块的输出;针对所述每个特征提取模块,所述下采样单元的输入和所述特征提取单元的输出连接,所述下采样单元的输出和后一个特征提取模块中特征提取单元的输入连接;可选地,所述属性信息作为所述N个级联的特征提取模块中至少一个特征提取模块的监督信息。
在一个实施例中,所述第一重建图像子块和所述第二重建图像子块为编码树块或者扩展的编码树块;所述扩展的编码树块为所述编码树块的边缘扩展之后得到的,所述扩展的编码树块尺寸大于所述编码树块。
在一个实施例中,处理器110,具体用于:基于所述第一重建特征图和所述第二重建特征图确定第一预设处理参数;或者,基于所述第一重建特征图、所述第二重建特征图、所述属性信息确定第一预设处理参数;基于第一预设处理模型对所述第二重建特征图进行第一预设处理,得到第一预设处理后的第二重建特征图,所述第一预设处理模型包括根据所述第一预设处理参数确定的第一处理模型。
在一个实施例中，处理器110，具体用于：根据所述第一处理模型和所述第二处理模型确定所述第二重建特征图的采样点坐标，可选地，所述第二处理模型包括目标像素坐标；根据所述第二重建特征图以及采样核函数，确定所述采样点坐标对应的目标像素值；根据所述采样点坐标对应的目标像素值生成第一预设处理后的第二重建特征图。
可选地,第一预设处理为扭曲处理,第二预设处理为特征融合处理,目标第二特征图为扭曲后的第二特征图,第一预设处理参数为扭曲参数,第一预设处理模型为扭曲模型,目标特征图为融合特征图。
在一个实施例中,处理器110,具体用于:利用特征融合网络对所述第一重建特征图和所述第一预设处理后的第二重建特征图进行第二预设处理,得到第一目标特征图;可选地,所述特征融合网络包括N个特征融合模块和M个上采样模块,所述M为大于或等于1的整数,M+1=N;所述特征融合网络中第i个特征融合模块的输入和N个第一预设处理模块中第i个第一预设处理模块的输出连接,第i个特征融合模块的输出和第j个上采样模块的输入连接,j为大于等于1的整数且j小于等于M,i=j+1;第j个上采样模块的输出和第j个特征融合模块的输入连接;所述第i个第一预设处理模块用于对所述特征提取网络中第i个特征提取模块输出的第二特征图进行第一预设处理,i为大于等于1的整数,且i小于或等于N;第N个特征融合模块用于融合第N个第一预设处理模块输出的目标第二特征图和所述第N个特征提取单元输出的第一重建特征图;当i不等于N时,所述第i个特征融合模块用于融合第i个第一预设处理模块输出的第一预设处理后的第二重建特征图、所述第i个特征提取单元输出的第一重建特征图以及所述第i个上采样模块输出的特征图。
在一个实施例中,处理器110,具体用于:利用目标滤波处理模型对所述第二目标特征图进行滤波处理,得到滤波后的第二目标特征图;根据所述滤波后的第二目标特征图生成滤波后的第一重建图像块;可选地,所述目标滤波处理模型包括根据率失真代价从多个候选模型中选择的目标候选模型,所述多个候选模型中每个候选模型和量化参数存在映射关系。
在一个实施例中,所述目标滤波处理模型包括至少一个处理单元,所述处理单元包括第一处理单元和第二处理单元中的一种或两种;处理器110,具体用于:对至少一个所述第一处理单元处理后的第二目标特征图进行下采样处理,得到下采样后的第二目标特征图;对所述下采样后的第二目标特征图进行上采样处理,得到目标融合重建特征图;利用第二处理单元处理所述目标融合重建特征图,得到滤波后的第二目标特征图。
应当理解,本申请实施例中所描述的移动终端可执行上述任一实施例的方法描述,也可执行上述所对应实施例中对该图像处理装置的描述,在此不再赘述。另外,对采用相同方法的有益效果描述,也 不再进行赘述。
本申请实施例还提供一种计算机可读存储介质,计算机可读存储介质上存储有图像处理程序,图像处理程序被处理器执行时实现上述任一实施例中的图像处理方法的步骤。
在本申请提供的智能终端和计算机可读存储介质的实施例中,可以包含任一上述图像处理方法实施例的全部技术特征,说明书拓展和解释内容与上述方法的各实施例基本相同,在此不再做赘述。
本申请实施例还提供一种计算机程序产品,计算机程序产品包括计算机程序代码,当计算机程序代码在计算机上运行时,使得计算机执行如上各种可能的实施方式中的方法。
本申请实施例还提供一种芯片,包括存储器和处理器,存储器用于存储计算机程序,处理器用于从存储器中调用并运行计算机程序,使得安装有芯片的设备执行如上各种可能的实施方式中的方法。
可以理解,上述场景仅是作为示例,并不构成对于本申请实施例提供的技术方案的应用场景的限定,本申请的技术方案还可应用于其他场景。例如,本领域普通技术人员可知,随着系统架构的演变和新业务场景的出现,本申请实施例提供的技术方案对于类似的技术问题,同样适用。
上述本申请实施例序号仅仅为了描述,不代表实施例的优劣。
本申请实施例方法中的步骤可以根据实际需要进行顺序调整、合并和删减。
本申请实施例设备中的单元可以根据实际需要进行合并、划分和删减。
在本申请中,对于相同或相似的术语概念、技术方案和/或应用场景描述,一般只在第一次出现时进行详细描述,后面再重复出现时,为了简洁,一般未再重复阐述,在理解本申请技术方案等内容时,对于在后未详细描述的相同或相似的术语概念、技术方案和/或应用场景描述等,可以参考其之前的相关详细描述。
在本申请中,对各个实施例的描述都各有侧重,某个实施例中没有详述或记载的部分,可以参见其它实施例的相关描述。
本申请技术方案的各技术特征可以进行任意的组合,为使描述简洁,未对上述实施例中的各个技术特征所有可能的组合都进行描述,然而,只要这些技术特征的组合不存在矛盾,都应当认为是本申请记载的范围。
通过以上的实施方式的描述,本领域的技术人员可以清楚地了解到上述实施例方法可借助软件加必需的通用硬件平台的方式来实现,当然也可以通过硬件,但很多情况下前者是更佳的实施方式。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在如上的一个存储介质(如ROM/RAM、磁碟、光盘)中,包括若干指令用以使得一台终端设备(可以是手机,计算机,服务器,被控终端,或者网络设备等)执行本申请每个实施例的方法。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行计算机程序指令时,全部或部分地产生按照本申请实施例的流程或功能。计算机可以是通用计算机、专用计算机、计算机网络,或者其他可编程装置。计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线)或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。可用介质可以是磁性介质,(例如,软盘、存储盘、磁带)、光介质(例如,DVD),或者半导体介质(例如固态存储盘Solid State Disk(SSD))等。
以上仅为本申请的优选实施例,并非因此限制本申请的专利范围,凡是利用本申请说明书及附图内容所作的等效结构或等效流程变换,或直接或间接运用在其他相关的技术领域,均同理包括在本申请的专利保护范围内。

Claims (30)

  1. 一种图像处理方法,其特征在于,包括以下步骤:
    S1:获取第一辅助信息;
    S2:根据参考图像和所述第一辅助信息,对第一图像块进行处理。
  2. 如权利要求1所述的方法,其特征在于,所述S2步骤之前,还包括:
    获取对应于第一视点的第一图像块;和/或,获取对应于第二视点的参考图像。
  3. 如权利要求1所述的方法,其特征在于,所述S2步骤,包括以下步骤:
    S21:根据所述第一辅助信息从所述参考图像中确定第二图像块;
    S22:确定所述第一图像块对应的第一特征图以及所述第二图像块对应的第二特征图;
    S23:根据所述第一特征图以及所述第二特征图确定或生成对应于所述第一图像块的一处理结果。
  4. 如权利要求3所述的方法,其特征在于,所述S21步骤,包括:
    获取所述第一图像块的第一辅助信息,所述第一辅助信息包括深度信息,所述深度信息根据所述第一图像块对应的深度图像确定;
    获取所述参考图像中各个图像块的第一辅助信息和所述第一图像块的第一辅助信息的相似度;
    将所述参考图像中所述相似度最大的图像块确定为与所述第一图像块匹配的第二图像块。
  5. 如权利要求3所述的方法,其特征在于,所述S22步骤,包括:
    基于特征提取网络和所述第一辅助信息对所述第一图像块和所述第二图像块进行特征提取处理,得到所述第一图像块对应的第一特征图以及所述第二图像块对应的第二特征图。
  6. 如权利要求3所述的方法,其特征在于,所述第一图像块和所述第二图像块为切片;所述S22步骤,包括:
    获取所述第一图像块的第一图像子块和所述第二图像块的第二图像子块;所述第二图像子块的第二辅助信息和所述第一图像子块的第二辅助信息匹配;
    基于特征提取网络和所述第二辅助信息对所述第一图像子块和所述第二图像子块进行特征提取处理,得到所述第一图像子块的第一子特征图和所述第二图像子块的第二子特征图;
    通过所述第一子特征图,确定或生成所述第一图像块对应的第一特征图,以及通过所述第二子特征图,确定或生成所述第二图像块对应的第二特征图。
  7. 如权利要求5所述的方法,其特征在于,所述特征提取网络包括N个级联的特征提取模块,所述N为大于或等于1的整数,前N-1个特征提取模块中每个特征提取模块包括串联的特征提取单元和下采样单元,第N个特征提取模块包括特征提取单元。
  8. 如权利要求3至7中任一项所述的方法,其特征在于,所述S23步骤,包括以下步骤:
    S231:根据所述第一特征图对所述第二特征图进行第一预设处理,得到目标第二特征图;
    S232:根据所述第一特征图和所述目标第二特征图进行第二预设处理,得到目标特征图;
    S233:根据所述目标特征图确定或生成对应于所述第一图像块的所述处理结果。
  9. 如权利要求8所述的方法,其特征在于,所述S231步骤,包括:
    基于所述第一特征图和所述第二特征图确定第一预设处理参数;或者,基于所述第一特征图、所述第二特征图以及所述第一辅助信息确定第一预设处理参数;
    基于第一预设处理模型对所述第二特征图进行第一预设处理,得到目标第二特征图;所述第一预设处理模型包括根据所述第一预设处理参数确定的第一处理模型。
  10. 如权利要求9所述的方法,其特征在于,所述第一预设处理模型包括所述第一处理模型和第二处理模型,所述基于所述第一预设处理模型对所述第二特征图进行第一预设处理,得到目标第二特征图,包括:
    根据所述第一处理模型和所述第二处理模型确定所述第二特征图中的采样点坐标;
    根据所述第二特征图以及采样核函数,确定所述采样点坐标对应的目标像素值;
    根据所述采样点坐标对应的目标像素值生成目标第二特征图。
  11. 如权利要求8所述的方法,其特征在于,所述S232步骤,包括:
    利用特征融合网络对所述第一特征图和所述目标第二特征图进行第二预设处理,得到目标特征图。
  12. 如权利要求8所述的方法,其特征在于,所述S233,包括:
    对所述目标特征图进行滤波处理,得到滤波后的目标特征图;
    根据所述滤波后的目标特征图确定对应于所述第一图像块的所述处理结果。
  13. 如权利要求12所述的方法,其特征在于,所述对所述目标特征图进行滤波处理,得到滤波后的目标特征图,包括:
    利用目标滤波处理模型对所述目标特征图进行滤波处理,得到滤波后的目标特征图。
  14. 如权利要求13所述的方法,其特征在于,所述目标滤波处理模型包括至少一个处理单元,所述处理单元包括第一处理单元和第二处理单元中的一种或两种;所述利用目标滤波处理模型对所述目标特征图进行滤波处理,得到滤波后的目标特征图,包括:
    对至少一个所述第一处理单元处理后的目标特征图进行下采样处理,得到下采样后的目标特征图;
    对所述下采样后的目标特征图进行上采样处理,得到目标融合特征图;
    利用所述第二处理单元处理所述目标融合特征图,得到滤波后的目标特征图。
  15. 一种图像处理方法,其特征在于,包括以下步骤:
    S10:获取第二重建图像块;
    S20:根据所述第二重建图像块、第一重建图像块的属性信息和所述第二重建图像块的属性信息中的至少一种,对第一重建图像块进行滤波,以得到滤波后的第一重建图像块。
  16. 如权利要求15所述的方法,其特征在于,所述S10步骤,还包括:
    获取所述第一重建图像块。
  17. 如权利要求15所述的方法,其特征在于,所述第一重建图像块和所述第二重建图像块对应于相同或不同的重建图像。
  18. 如权利要求15至17中任一项所述的方法,其特征在于,所述S20步骤,包括以下步骤:
    S201:获取第三重建图像块,所述第三重建图像块对应的图像为所述第一重建图像块对应的图像的参考重建图像;
    S202:根据所述第三重建图像块、第二重建图像块和所述第一重建图像块的属性信息中的至少一种,对所述第一重建图像块进行滤波,以得到滤波后的第一重建图像块。
  19. 如权利要求18所述的方法,其特征在于,所述S202步骤,包括以下至少一种:
    根据所述第一重建图像块的深度信息或视差信息,对所述第一重建图像块和所述第二重建图像块进行第三预设处理,得到第一目标特征图,根据所述第一重建图像块的帧间预测信息,对所述第三重建图像块和所述第一目标特征图进行第三预设处理,得到第二目标特征图,对所述第二目标特征图进行滤波处理,以得到滤波后的第一重建图像块;
    根据所述第一重建图像块的深度信息或视差信息,对所述第一重建图像块和所述第二重建图像块进行第三预设处理,得到第一目标特征图,对所述第一目标特征图进行滤波处理,得到滤波后的第一目标特征图,根据所述第一重建图像块的帧间预测信息,对所述滤波后的第一目标特征图和所述第三重建图像块进行第三预设处理,得到第二目标特征图,对所述第二目标特征图进行滤波处理,以得到滤波后的第一重建图像块;
    根据所述第一重建图像块的深度信息或视差信息,对所述第一重建图像块和所述第二重建图像块进行第三预设处理,得到第一目标特征图,根据所述第三重建图像块和所述第一重建图像块的帧间预测信息,对所述第一重建图像块和所述第三重建图像块进行第三预设处理,得到第二目标特征图,根据所述第一目标特征图和所述第二目标特征图确定滤波后的第一重建图像块。
  20. 如权利要求19所述的方法,其特征在于,所述根据所述第一目标特征图和所述第二目标特征图确定滤波后的第一重建图像块,包括:
    对所述第一目标特征图和所述第二目标特征图进行滤波处理,得到滤波后的第一目标特征图和滤波后的第二目标特征图;
    根据所述滤波后的第一目标特征图和所述滤波后的第二目标特征图进行第三预设处理,得到目标融合重建图像块;
    将所述目标融合重建图像块作为滤波后的第一重建图像块。
  21. 如权利要求19所述的方法,其特征在于,所述根据所述第一重建图像块的深度信息或视差信息,对所述第一重建图像块和所述第二重建图像块进行第三预设处理,得到第一目标特征图,包括:
    根据所述第一重建图像块的深度信息或视差信息,确定所述第一重建图像块对应的第一重建特征图和所述第二重建图像块对应的第二重建特征图;
    根据所述第一重建特征图对所述第二重建特征图进行第一预设处理,得到第一预设处理后的第二重建特征图;
    根据第一预设处理后的第二重建特征图和所述第一重建特征图进行第二预设处理,得到第一目标特征图。
  22. 如权利要求21所述的方法,其特征在于,所述根据所述第一重建图像块的深度信息或视差信息,确定所述第一重建图像块对应的第一重建特征图和所述第二重建图像块对应的第二重建特征图,包括:
    基于特征提取网络和所述第一重建图像块的深度信息或视差信息,对所述第一重建图像块和所述第二重建图像块进行特征提取处理,得到所述第一重建图像块的第一重建特征图和所述第二重建图像块的第二重建特征图。
  23. 如权利要求21所述的方法,其特征在于,所述第一重建图像块为切片,所述第二重建图像块对应为切片;所述根据所述第一重建图像块的深度信息或视差信息,确定所述第一重建图像块对应的第一重建特征图和所述第二重建图像块对应的第二重建特征图,包括:
    获取所述第一重建图像块的第一重建图像子块和所述第二重建图像块的第二重建图像子块;
    基于特征提取网络和所述第一重建图像子块的属性信息,对所述第一重建图像子块和所述第二重建图像子块进行特征提取处理,得到所述第一重建图像块对应的第一重建特征图和所述第二重建图像块对应的第二重建特征图。
  24. 如权利要求22所述的方法,其特征在于,所述特征提取网络包括N个级联的特征提取模块,所述N为大于或等于1的整数,前N-1个特征提取模块中每个特征提取模块包括串联的特征提取单元和下采样单元,第N个特征提取模块包括特征提取单元。
  25. 如权利要求21至24中任一项所述的方法,其特征在于,所述根据所述第一重建特征图对所述第二重建特征图进行第一预设处理,得到第一预设处理后的第二重建特征图,包括:
    基于所述第一重建特征图和所述第二重建特征图确定第一预设处理参数;或者,基于所述第一重建特征图、所述第二重建特征图、所述属性信息确定第一预设处理参数;
    基于第一预设处理模型对所述第二重建特征图进行第一预设处理,得到第一预设处理后的第二重建特征图,所述第一预设处理模型包括根据所述第一预设处理参数确定的第一处理模型。
  26. 如权利要求25所述的方法,其特征在于,所述第一预设处理模型包括所述第一处理模型和第二处理模型,所述基于第一预设处理模型对所述第二重建特征图进行第一预设处理,得到第一预设处理后的第二重建特征图,包括:
    根据所述第一处理模型和所述第二处理模型确定所述第二重建特征图的采样点坐标;
    根据所述第二重建特征图以及采样核函数,确定所述采样点坐标对应的目标像素值;
    根据所述采样点坐标对应的目标像素值生成第一预设处理后的第二重建特征图。
  27. 如权利要求19至24中任一项所述的方法,其特征在于,所述对所述第二目标特征图进行滤波处理,以得到滤波后的第一重建图像块,包括:
    利用目标滤波处理模型对所述第二目标特征图进行滤波处理,得到滤波后的第二目标特征图;
    根据所述滤波后的第二目标特征图生成滤波后的第一重建图像块。
  28. 如权利要求27所述的方法,其特征在于,所述目标滤波处理模型包括至少一个处理单元,所述处理单元包括第一处理单元和第二处理单元中的一种或两种;所述利用目标滤波处理模型对所述第二目标特征图进行滤波处理,得到滤波后的第二目标特征图,包括:
    对至少一个所述第一处理单元处理后的第二目标特征图进行下采样处理,得到下采样后的第二目标特征图;
    对所述下采样后的第二目标特征图进行上采样处理,得到目标融合重建特征图;
    利用第二处理单元处理所述目标融合重建特征图,得到滤波后的第二目标特征图。
  29. 一种智能终端,其特征在于,所述智能终端包括:存储器、处理器,其中,所述存储器上存储有图像处理程序,所述图像处理程序被所述处理器执行时实现如权利要求1至28中任一项所述的图像处理方法的步骤。
  30. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质上存储有计算机程序,所述计算机程序被处理器执行时实现如权利要求1至28中任一项所述的图像处理方法的步骤。
PCT/CN2022/144217 2022-01-12 2022-12-30 图像处理方法、智能终端及存储介质 WO2023134482A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210029380.6A CN114079779B (zh) 2022-01-12 2022-01-12 图像处理方法、智能终端及存储介质
CN202210029380.6 2022-01-12

Publications (1)

Publication Number Publication Date
WO2023134482A1 true WO2023134482A1 (zh) 2023-07-20

Family

ID=80284520

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/144217 WO2023134482A1 (zh) 2022-01-12 2022-12-30 图像处理方法、智能终端及存储介质

Country Status (2)

Country Link
CN (1) CN114079779B (zh)
WO (1) WO2023134482A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117496208A (zh) * 2023-12-29 2024-02-02 山东朝辉自动化科技有限责任公司 一种实时获取料场内堆料信息的方法

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114079779B (zh) * 2022-01-12 2022-05-17 深圳传音控股股份有限公司 图像处理方法、智能终端及存储介质
CN114245133A (zh) * 2022-02-23 2022-03-25 北京拙河科技有限公司 视频分块编码方法、编码传输方法、系统和设备
CN115209079B (zh) * 2022-02-23 2023-05-02 北京拙河科技有限公司 一种适用于高速摄像机长时间存储数据的方法和设备
CN114422803B (zh) * 2022-03-30 2022-08-05 浙江智慧视频安防创新中心有限公司 一种视频处理方法、装置及设备
CN117135358A (zh) * 2022-05-20 2023-11-28 海思技术有限公司 视频编码方法、视频解码方法及相关装置
WO2023225854A1 (zh) * 2022-05-24 2023-11-30 Oppo广东移动通信有限公司 一种环路滤波方法、视频编解码方法、装置和系统

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014048242A1 (zh) * 2012-09-29 2014-04-03 中兴通讯股份有限公司 预测图像生成方法和装置
CN104469387A (zh) * 2014-12-15 2015-03-25 哈尔滨工业大学 一种多视点视频编码中分量间的运动参数继承方法
JP5926451B2 (ja) * 2013-04-11 2016-05-25 日本電信電話株式会社 画像符号化方法、画像復号方法、画像符号化装置、画像復号装置、画像符号化プログラム、および画像復号プログラム
US20190297339A1 (en) * 2016-06-30 2019-09-26 Nokia Technologies Oy An Apparatus, A Method and A Computer Program for Video Coding and Decoding
CN113709504A (zh) * 2021-10-27 2021-11-26 深圳传音控股股份有限公司 图像处理方法、智能终端及可读存储介质
CN114079779A (zh) * 2022-01-12 2022-02-22 深圳传音控股股份有限公司 图像处理方法、智能终端及存储介质

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102158712B (zh) * 2011-03-22 2012-10-24 宁波大学 一种基于视觉的多视点视频信号编码方法
CN103108187B (zh) * 2013-02-25 2016-09-28 清华大学 一种三维视频的编码方法、解码方法、编码器和解码器
US11164326B2 (en) * 2018-12-18 2021-11-02 Samsung Electronics Co., Ltd. Method and apparatus for calculating depth map
JP2021196951A (ja) * 2020-06-16 2021-12-27 キヤノン株式会社 画像処理装置、画像処理方法、プログラム、学習済みモデルの製造方法、および画像処理システム
CN113256544B (zh) * 2021-05-10 2023-09-26 中山大学 一种多视点图像合成方法、系统、装置及存储介质

Also Published As

Publication number Publication date
CN114079779A (zh) 2022-02-22
CN114079779B (zh) 2022-05-17

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22920112

Country of ref document: EP

Kind code of ref document: A1