CN113542746A - Video encoding method and apparatus, computer readable medium, and electronic device - Google Patents


Info

Publication number
CN113542746A
CN113542746A (application CN202110789592.XA)
Authority
CN
China
Prior art keywords
denoising
code stream
frame
coding unit
current
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110789592.XA
Other languages
Chinese (zh)
Other versions
CN113542746B (en)
Inventor
李宏伟
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202110789592.XA priority Critical patent/CN113542746B/en
Publication of CN113542746A publication Critical patent/CN113542746A/en
Application granted granted Critical
Publication of CN113542746B publication Critical patent/CN113542746B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146Data rate or code amount at the encoder output
    • H04N19/147Data rate or code amount at the encoder output according to rate distortion criteria
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60Network streaming of media packets
    • H04L65/70Media network packetisation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/80Responding to QoS
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/154Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The disclosure provides a video encoding method, a video encoding apparatus, a computer-readable medium, and an electronic device, relating to the technical field of image processing. The method comprises: denoising a current video frame according to multiple sets of denoising parameters to obtain a denoised frame set corresponding to the current video frame; determining a current coding unit, and encoding the current coding unit in each denoised frame of the denoised frame set to obtain a set of encoded code streams corresponding to the current coding unit; and selecting a target code stream from the set of encoded code streams as the code stream corresponding to the current coding unit. The method makes full use of the encoder's capability by encoding multiple versions of the current coding unit at different denoising strengths and then selecting the code stream that best meets the requirements, thereby achieving a high-quality transcoding effect.

Description

Video encoding method and apparatus, computer readable medium, and electronic device
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a video encoding method, a video encoding apparatus, a computer-readable medium, and an electronic device.
Background
Video has developed into an important tool for entertainment and work. The popularization of formats such as 4K and HDR, together with the growing demand for video transmission, has placed enormous pressure on storage resources and network bandwidth, and the proliferation of applications has produced video of widely varying categories and qualities. These trends create strong demands on, and challenges for, video coding and video enhancement technologies, which are therefore a current research focus in both academia and industry.
Disclosure of Invention
The present disclosure is directed to a video encoding method, a video encoding apparatus, a computer-readable medium, and an electronic device, so as to improve the quality of a code stream obtained by encoding at least to a certain extent, and further improve the quality of a video image reconstructed based on the code stream.
According to a first aspect of the present disclosure, there is provided a video encoding method comprising: denoising a current video frame according to multiple sets of denoising parameters to obtain a denoised frame set corresponding to the current video frame, the denoised frame set comprising denoised frames at multiple different denoising strengths, one obtained from each set of denoising parameters; determining a current coding unit, and encoding the current coding unit in each denoised frame of the denoised frame set to obtain a set of encoded code streams corresponding to the current coding unit; and selecting a target code stream from the set of encoded code streams as the code stream corresponding to the current coding unit.
According to a second aspect of the present disclosure, there is provided a video encoding apparatus comprising: a video frame denoising module, configured to denoise a current video frame according to multiple sets of denoising parameters to obtain a denoised frame set corresponding to the current video frame, the denoised frame set comprising denoised frames at multiple different denoising strengths, one obtained from each set of denoising parameters; a unit encoding module, configured to determine a current coding unit and encode the current coding unit in each denoised frame of the denoised frame set to obtain a set of encoded code streams corresponding to the current coding unit; and a code stream selection module, configured to select a target code stream from the set of encoded code streams as the code stream corresponding to the current coding unit.
According to a third aspect of the present disclosure, a computer-readable medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, is adapted to carry out the above-mentioned method.
According to a fourth aspect of the present disclosure, there is provided an electronic apparatus, comprising: a processor; and memory storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the above-described method.
According to the video encoding method provided by the embodiments of the disclosure, denoising the current video frame with multiple sets of denoising parameters yields multiple denoised frames at different denoising strengths; when the current coding unit is encoded, it can be encoded in each of these denoised frames to obtain a set of encoded code streams, from which a target code stream is then selected as the code stream corresponding to the current coding unit. In this way, the encoder's capability is fully exploited by encoding multiple versions of the current coding unit at different denoising strengths and then selecting the code stream that best meets the requirements, achieving a high-quality transcoding effect.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty. In the drawings:
FIG. 1 illustrates a schematic diagram of an exemplary system architecture to which embodiments of the present disclosure may be applied;
FIG. 2 shows a schematic diagram of an electronic device to which embodiments of the present disclosure may be applied;
fig. 3 schematically illustrates a flow chart of a video encoding method in an exemplary embodiment of the present disclosure;
FIG. 4 schematically illustrates a flow chart of a method of screening a set of denoised frames in an exemplary embodiment of the disclosure;
fig. 5 schematically illustrates a flow chart of another video encoding method in an exemplary embodiment of the present disclosure;
FIG. 6 schematically illustrates a matching block diagram in an exemplary embodiment of the disclosure;
FIG. 7 is a schematic diagram illustrating a principle of spatiotemporal denoising in an exemplary embodiment of the present disclosure;
fig. 8 schematically shows a composition diagram of a video encoding apparatus in an exemplary embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
Fig. 1 is a schematic diagram illustrating a system architecture of an exemplary application environment to which a video encoding method and apparatus according to an embodiment of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include one or more of terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few. The terminal devices 101, 102, 103 may be various electronic devices having an image processing function, including but not limited to desktop computers, portable computers, smart phones, tablet computers, and the like. It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, server 105 may be a server cluster comprised of multiple servers, or the like.
The video encoding method provided by the embodiment of the present disclosure is generally executed by the terminal devices 101, 102, 103, and accordingly, the video encoding apparatus is generally disposed in the terminal devices 101, 102, 103. However, it is easily understood by those skilled in the art that the video encoding method provided in the embodiment of the present disclosure may also be executed by the server 105, and accordingly, the video encoding apparatus may also be disposed in the server 105, which is not particularly limited in the exemplary embodiment. For example, in an exemplary embodiment, a user may acquire a video through a camera module included in the terminal device 101, 102, 103, and encode the video to obtain a code stream; or the user may capture a video through a camera module included in the terminal devices 101, 102, and 103, and send the video to another terminal device or the server 105 through the network, so that the other terminal device or the server 105 can encode the video.
An exemplary embodiment of the present disclosure provides an electronic device for implementing a video encoding method, which may be the terminal device 101, 102, 103 or the server 105 in fig. 1. The electronic device comprises at least a processor and a memory for storing executable instructions of the processor, the processor being configured to perform the video encoding method via execution of the executable instructions.
The following takes the mobile terminal 200 in fig. 2 as an example to illustrate the configuration of the electronic device. It will be appreciated by those skilled in the art that, apart from components specifically intended for mobile use, the configuration of fig. 2 can also be applied to fixed devices. In other embodiments, the mobile terminal 200 may include more or fewer components than shown, combine some components, split some components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of the two. The interfacing relationship between the components is only schematically illustrated and does not constitute a structural limitation of the mobile terminal 200. In other embodiments, the mobile terminal 200 may also adopt an interfacing arrangement different from that shown in fig. 2, or a combination of multiple interfacing arrangements.
As shown in fig. 2, the mobile terminal 200 may specifically include: a processor 210, an internal memory 221, an external memory interface 222, a Universal Serial Bus (USB) interface 230, a charging management module 240, a power management module 241, a battery 242, an antenna 1, an antenna 2, a mobile communication module 250, a wireless communication module 260, an audio module 270, a speaker 271, a receiver 272, a microphone 273, an earphone interface 274, a sensor module 280, a display 290, a camera module 291, an indicator 292, a motor 293, a button 294, and a Subscriber Identity Module (SIM) card interface 295. The sensor module 280 may include a depth sensor 2801, a pressure sensor 2802, a gyroscope sensor 2803, and the like.
Processor 210 may include one or more processing units, such as: the Processor 210 may include an Application Processor (AP), a modem Processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a video codec, a Digital Signal Processor (DSP), a baseband Processor, and/or a Neural-Network Processing Unit (NPU), and the like. The different processing units may be separate devices or may be integrated into one or more processors.
The NPU is a Neural-Network (NN) computing processor, which processes input information quickly by using a biological Neural Network structure, for example, by using a transfer mode between neurons of a human brain, and can also learn by itself continuously. The NPU can implement applications such as intelligent recognition of the mobile terminal 200, for example: image recognition, face recognition, speech recognition, text understanding, and the like.
A memory is provided in the processor 210. The memory may store instructions for implementing six modular functions: detection instructions, connection instructions, information management instructions, analysis instructions, data transmission instructions, and notification instructions, and execution is controlled by processor 210.
The wireless communication function of the mobile terminal 200 may be implemented by the antenna 1, the antenna 2, the mobile communication module 250, the wireless communication module 260, a modem processor, a baseband processor, and the like. Wherein, the antenna 1 and the antenna 2 are used for transmitting and receiving electromagnetic wave signals; the mobile communication module 250 may provide a solution including wireless communication of 2G/3G/4G/5G, etc. applied to the mobile terminal 200; the modem processor may include a modulator and a demodulator; the Wireless communication module 260 may provide a solution for Wireless communication including a Wireless Local Area Network (WLAN) (e.g., a Wireless Fidelity (Wi-Fi) network), Bluetooth (BT), and the like, applied to the mobile terminal 200. In some embodiments, antenna 1 of the mobile terminal 200 is coupled to the mobile communication module 250 and antenna 2 is coupled to the wireless communication module 260, such that the mobile terminal 200 may communicate with networks and other devices via wireless communication techniques.
The mobile terminal 200 implements a display function through the GPU, the display screen 290, the application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display screen 290 and an application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 210 may include one or more GPUs that execute program instructions to generate or alter display information. In an exemplary embodiment, the denoising process of the current video frame and the process of screening the denoised frame may be implemented based on the GPU, the display screen 290, the application processor, and the like.
The mobile terminal 200 may implement a photographing function through the ISP, the camera module 291, the video codec, the GPU, the display screen 290, the application processor, and the like. The ISP is used for processing data fed back by the camera module 291; the camera module 291 is used for capturing still images or videos; the digital signal processor is used for processing digital signals, and can process other digital signals besides digital image signals; the video codec is used to compress or decompress digital video, and the mobile terminal 200 may also support one or more video codecs. In an exemplary embodiment, the encoding process of the current coding unit may be implemented by the above-described video encoder.
Internal memory 221 may be used to store computer-executable program code, which includes instructions. The internal memory 221 may include a program storage area and a data storage area. The storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required by at least one function, and the like. The storage data area may store data (e.g., audio data, a phonebook, etc.) created during use of the mobile terminal 200, and the like. In addition, the internal memory 221 may include a high-speed random access memory, and may further include a nonvolatile memory, such as at least one magnetic disk Storage device, a Flash memory device, a Universal Flash Storage (UFS), and the like. The processor 210 executes various functional applications of the mobile terminal 200 and data processing by executing instructions stored in the internal memory 221 and/or instructions stored in a memory provided in the processor.
The depth sensor 2801 is used to acquire depth information of a scene. In some embodiments, a depth sensor may be disposed in the camera module 291, and may be used to capture video with depth information. The pressure sensor 2802 is used for sensing a pressure signal and converting the pressure signal into an electrical signal; gyroscope sensor 2803 may be used to determine the motion attitude of mobile terminal 200; in addition, other functional sensors, such as an air pressure sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a proximity light sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, etc., may be provided in the sensor module 280 according to actual needs.
Other devices for providing auxiliary functions may also be included in mobile terminal 200. For example, the keys 294 include a power-on key, a volume key, and the like, and a user can generate key signal inputs related to user settings and function control of the mobile terminal 200 through key inputs. Further examples include indicator 292, motor 293, SIM card interface 295, etc.
In a related scheme combining video denoising and coding, each frame of a video captured by an image collector is usually enhanced and denoised directly, and the enhanced, denoised video is then encoded directly to obtain a code stream for transmission or storage. Other schemes also exist. For example, the patent of publication No. CN212231643U provides an image encoder comprising a receiving module, an artificial-intelligence processing module, an encoding module, and an output module. The receiving module receives an image to be encoded through a data input interface; the artificial-intelligence processing module detects a target object in the image to be encoded; and the encoding module encodes a first image region containing the target object at a first compression rate and a second image region not containing the target object at a second compression rate, generating an encoded image, wherein the first compression rate is less than the second compression rate. This image encoder can improve the image quality of image data under stable transmission.
However, in the related art, encoding is usually performed on a video denoised at a single strength, so the encoder's capability cannot be exploited flexibly and fully, and the quality of the finally reconstructed video image cannot be further improved.
In view of one or more of the above problems, the present example embodiment provides a video encoding method. The video encoding method may be applied to the server 105, and may also be applied to one or more of the terminal devices 101, 102, and 103, which is not particularly limited in this exemplary embodiment. Referring to fig. 3, the video encoding method may include the following steps S310 to S330:
in step S310, denoising is performed on the current video frame according to the multiple sets of denoising parameters to obtain a denoising frame set corresponding to the current video frame.
Each of the multiple sets of denoising parameters may include one or more parameters, and by configuring the values of each set, denoising of a certain strength can be achieved based on that set. Therefore, after the current video frame is denoised with each of the multiple sets of denoising parameters, denoised frames corresponding to multiple different denoising strengths are obtained. For example, when a difference-based non-local means denoising method is adopted, denoised frames of different strengths can be obtained by adjusting the weight between the current block and the successfully matched reference blocks, which changes the weighted-fusion result; that is, the multiple sets of denoising parameters may simply be different sets of weights. More generally, in different denoising modes, different denoising parameters can take different values so as to denoise the frame at different strengths.
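The weight adjustment between a current block and its matched reference blocks can be sketched as follows. This is a toy illustration of the weighted-fusion idea only, not the patented method; the single `strength` knob and the block shapes are assumptions.

```python
import numpy as np

def fuse_block(current_block, matched_blocks, strength):
    """Blend a current block with the mean of its matched reference blocks.

    `strength` in [0, 1] is a hypothetical denoising parameter: higher
    values give the matched reference blocks more weight in the fusion,
    i.e. stronger denoising.
    """
    reference_mean = np.mean(matched_blocks, axis=0)
    return (1.0 - strength) * current_block + strength * reference_mean
```

Sweeping `strength` over several pre-configured values is one way a single matching pass could yield denoised blocks at several strengths.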
It should be noted that, to achieve denoising at different strengths, the values of the multiple sets of denoising parameters need to be configured in advance. Thus, when a given video is processed, as long as the denoising parameters are not reset, every video frame is denoised with the same set of strengths. For example, with pre-configured parameter sets, the current video frame can be denoised at six different strengths L0-L5, and the resulting denoised frame set contains six denoised frames at strengths L0-L5; without changing the pre-configured parameters, the denoised frame set obtained for any other frame of the video likewise contains six denoised frames at L0-L5.
Correspondingly, after the current video frame is denoised based on each group of the multiple groups of denoising parameters, a plurality of denoised frames denoised with different denoising strengths can be obtained, and the denoised frames jointly form a denoised frame set. Namely, the denoising frame set comprises a plurality of denoising frames corresponding to different denoising strengths.
In an exemplary embodiment, various denoising methods may be employed for the current video frame, such as spatial-domain, frequency-domain, time-domain, or joint denoising algorithms. It should be noted that, although many denoising algorithms exist, when a video to be encoded is processed, a single denoising algorithm is usually selected for the entire video; that is, once a video is determined to be denoised with a given algorithm, that algorithm is used throughout the encoding process.
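Step S310 can be sketched minimally as follows. The L0-L5 strengths come from the example above; the toy denoiser, the single-weight parameter sets, and all names are illustrative assumptions, since the patent does not fix a particular algorithm.

```python
import numpy as np

# Six hypothetical parameter sets, one per denoising strength L0-L5.
PARAMETER_SETS = [{"weight": w} for w in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)]

def toy_denoise(frame, params):
    """Stand-in denoiser: blend the frame toward its global mean.

    A real implementation would use a spatial/temporal/NLM filter; only
    the structure (one output frame per parameter set) matters here.
    """
    smooth = np.full_like(frame, frame.mean())
    w = params["weight"]
    return (1.0 - w) * frame + w * smooth

def build_denoised_frame_set(frame, parameter_sets=PARAMETER_SETS):
    """Apply every pre-configured parameter set to the same current frame."""
    return [toy_denoise(frame, p) for p in parameter_sets]
```

As long as `PARAMETER_SETS` is not changed, every frame of the video yields a denoised frame set of the same six strengths, matching the L0-L5 example.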
In step S320, a current coding unit is determined, and a current coding unit corresponding to each denoised frame in the denoised frame set is coded to obtain a coded code stream set corresponding to the current coding unit.
In an exemplary embodiment, since current video frames are of different kinds, the coding unit used when encoding a current video frame also differs, so the coding unit needs to be determined before encoding. Here, the coding unit refers to the granularity of an independently encoded unit, which may be frame level, slice level, tile level, or macroblock level.
Specifically, the coding unit type corresponding to the current video frame may be selected based on the type of the current video frame; after the coding unit type is determined, the video encoder may divide the current video frame into coding units of that type and then determine the current coding unit according to a certain coding order. For example, when the coding unit is at the frame level, the determined current coding unit is the current video frame itself; when the coding unit is at the macroblock level, the determined current coding unit may be a particular macroblock of the current video frame.
It should be noted that, when the coding unit of the current video frame is below frame level, after the code stream corresponding to the current coding unit is obtained, the next coding unit can be encoded and its code stream determined directly, following a certain coding order, until the code streams of all coding units in the current video frame have been determined, yielding the code stream corresponding to the current video frame; the next video frame is then denoised.
For example, it can be specified in advance that all coding units of key frames (I frames) are at the frame level, all coding units of forward reference frames (P frames) at the slice level, and all coding units of bidirectional reference frames (B frames) at the macroblock level. In this case, if the current video frame is an I frame, its coding unit is determined to be at the frame level, so the video encoder can directly encode each denoised frame in the denoised frame set corresponding to the current video frame, using the whole frame as the coding unit.
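The coding-unit determination can be sketched as follows. The frame-type mapping follows the I/P/B example just given; the 16x16 macroblock size, raster-scan order, and function names are assumptions for illustration.

```python
# Hypothetical mapping following the example above: I frames use frame-level
# coding units, P frames slice-level, B frames macroblock-level.
CODING_UNIT_LEVEL = {"I": "frame", "P": "slice", "B": "macroblock"}

def macroblock_origins(height, width, mb_size=16):
    """Enumerate macroblock origins in raster-scan order, i.e. the order in
    which successive 'current coding units' would be visited when the
    coding unit is at the macroblock level."""
    return [(r, c)
            for r in range(0, height, mb_size)
            for c in range(0, width, mb_size)]
```

For a B frame, each origin returned here would in turn become the current coding unit, and the same region of every denoised frame would be encoded.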
In an exemplary embodiment, after the current coding unit is determined, the current coding unit in each denoised frame of the denoised frame set may be encoded to obtain the set of encoded code streams corresponding to the current coding unit. Here, the current coding unit in a denoised frame is the region of the denoised frame occupying the same position as the current coding unit occupies in the current video frame.
In step S330, a target code stream is selected from the set of encoded code streams, and the target code stream is used as a code stream corresponding to the current coding unit.
In an exemplary embodiment, after the set of encoded code streams is obtained, a target code stream may be selected as the code stream corresponding to the current coding unit. Then, once the code streams corresponding to all coding units of the current video frame have been determined, they can together be taken as the code stream of the current video frame for further transmission or storage.
Specifically, when the target code stream is selected from the encoded code stream set, different evaluation parameters can be set according to the reconstruction requirements of the video; the evaluation parameter corresponding to each encoded code stream in the set is then calculated through an evaluation formula, and the target code stream is selected from the encoded code streams based on these evaluation parameters.
In an exemplary embodiment, in order to obtain a code stream with high reconstruction quality for transmission or storage, an optimal target code stream may be selected from the encoded code streams corresponding to the denoised frames according to evaluation parameters such as Peak Signal-to-Noise Ratio (PSNR) or Rate Distortion Optimization (RDO), and the target code stream is then transmitted or stored as the code stream corresponding to the current video frame.
In an exemplary embodiment, the RDO may be calculated by the following formula: RDO = SAD + λ·R, where SAD denotes the sum of absolute differences, R denotes the size of the code stream, and λ is a manually set coefficient.
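As a minimal sketch of the RDO-based selection just described (the candidate list, its field names, and the λ value are illustrative assumptions, not from the patent):

```python
# Select the target code stream by minimizing RDO = SAD + lambda * R,
# where SAD is the sum of absolute differences against the source and
# R is the code stream size. All names/values here are assumptions.
def select_target_stream(streams, lam=0.5):
    """streams: list of dicts with 'sad' (distortion) and 'bits' (size R);
    returns the stream with minimal RDO cost."""
    return min(streams, key=lambda s: s["sad"] + lam * s["bits"])

# One encoded code stream per denoising strength of the current coding unit:
candidates = [
    {"strength": "L0", "sad": 1200, "bits": 900},
    {"strength": "L1", "sad": 1000, "bits": 1100},
    {"strength": "L2", "sad": 950,  "bits": 1600},
]
best = select_target_stream(candidates, lam=0.5)
# With lam=0.5 the RDO costs are 1650, 1550, 1750, so L1 is chosen.
```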
It should be noted that, during selection, if the coding unit is at the frame level, the code stream determined for the current video frame is a single encoded code stream corresponding to one denoising strength; if the coding unit is at a non-frame level, such as the macroblock level, a target code stream needs to be selected for each macroblock, so the final code stream of the current video frame consists of the code streams corresponding to a plurality of macroblocks, and the denoising strengths corresponding to these code streams may or may not be equal. For example, assuming that a current video frame includes 20 macroblocks, an encoded code stream with denoising strength L1 may be selected for the first macroblock, one with denoising strength L3 for the second macroblock, one with denoising strength L1 again for the third macroblock, and so on, finally obtaining 20 encoded code streams in total as the target code streams.
In addition, when the number of groups of denoising parameters is large, that is, the number of denoised frame data included in the denoised frame set is large, in order to avoid the waste of computing resources caused by the need of encoding the current coding units corresponding to too many denoised frames, the denoised frames in the denoised frame set may be screened before encoding the current coding unit corresponding to each denoised frame in the denoised frame set.
In an exemplary embodiment, since the previous and subsequent video frames in the encoding order generally have correlation, the filtering process of the above-mentioned denoised frame set can be guided based on the previous video frame of the current video frame in the encoding order.
Specifically, in an exemplary embodiment, when the coding unit of the current video frame is at a frame level, a first denoising strength may be determined based on a target code stream corresponding to a previous video frame, and then a denoising frame corresponding to the current video frame is screened based on a first upper limit parameter and a first lower limit parameter set in advance with the first denoising strength as a center, so as to obtain a screened denoising frame set.
It should be noted that the first upper limit parameter and the first lower limit parameter may take the same value or different values; they are used to determine a range centered on the first denoising strength, after which the denoised frames whose denoising strengths fall within that range are retained and the other denoised frames are deleted, thereby obtaining the filtered denoised frame set. For example, the first upper limit parameter and the first lower limit parameter may be w and v, respectively, with w = v; or they may be w and v, respectively, with w ≠ v.
In an exemplary embodiment, when the coding unit of the previous video frame is also at a frame level and the first denoising strength is determined, the denoising strength of the code stream corresponding to the previous video frame (i.e., the target code stream corresponding to the frame-level coding unit corresponding to the previous video frame) may be directly determined as the first denoising strength.
For example, suppose the coding unit of the current video frame is at the frame level and the coding unit of the previous video frame is also at the frame level; in this case, the first denoising strength determined from the code stream corresponding to the previous video frame is Lx. To limit frame-to-frame fluctuation in playback quality, the denoising strength range corresponding to the current video frame can be determined as [Lx − v, Lx + w], centered on Lx with the first upper limit parameter w and the first lower limit parameter v; the denoised frames whose denoising strengths fall within this range are then retained in the denoised frame set, and those outside the range are deleted.
In an exemplary embodiment, when the coding unit of the previous video frame is at a non-frame level and the first denoising strength is determined, the average denoising strength of the target code stream corresponding to the previous video frame may be calculated by taking the coding unit as a unit, and the average denoising strength is determined as the first denoising strength.
For example, suppose the coding unit of the current video frame is at the frame level while the coding unit of the previous video frame is at a non-frame level; in this case, from the M code streams La to Lb corresponding to the previous video frame, the average denoising strength of the previous video frame may be determined as Lx, and this average is taken as the first denoising strength. Similarly, centered on the first denoising strength Lx, the denoising strength range [Lx − v, Lx + w] corresponding to the current video frame is determined using the first upper limit parameter w and the first lower limit parameter v; the denoised frames within this range are retained in the denoised frame set, and those outside it are deleted.
When the average denoising strength is calculated, the minimum coding unit can be used as a statistical unit for calculation. For example, when the encoding unit is at a macroblock level, it is necessary to determine the denoising strength of each of M encoded code streams La to Lb, and then calculate the average denoising strength of the M denoising strengths.
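A minimal sketch of this frame-level screening (strengths modeled as integer levels, all function and parameter names assumed for illustration):

```python
# Frame-level screening: average the previous frame's per-unit denoising
# strengths to get the first denoising strength, then keep only denoised
# frames within [center - v, center + w]. Names are illustrative assumptions.
def average_strength(unit_strengths):
    """Average denoising strength of the previous frame's target code
    streams, computed per minimum coding unit (e.g. per macroblock)."""
    return sum(unit_strengths) / len(unit_strengths)

def screen_frame_level(levels, center, v, w):
    """Keep only the denoised-frame levels lying in [center - v, center + w]."""
    return [L for L in levels if center - v <= L <= center + w]

# Previous frame was coded at macroblock level with these per-unit strengths:
prev_units = [1, 3, 1, 3]                 # average 2.0 = first denoising strength
Lx = average_strength(prev_units)
kept = screen_frame_level(levels=[0, 1, 2, 3, 4], center=Lx, v=1, w=1)
# kept is [1, 2, 3]: only these denoised frames are encoded for this frame.
```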
Specifically, in an exemplary embodiment, when the coding unit of the current video frame is at a non-frame level, and the filtering is performed on the denoised frame set, as shown in fig. 4, the method may include the following steps S410 to S440:
step S410, determining a second denoising strength based on the target code stream of the current coding unit of the current video frame at the corresponding position in the previous video frame.
In an exemplary embodiment, before encoding the current coding unit, a second denoising strength may be obtained from the target code stream that corresponds, in the previous video frame, to the region occupying the same position as the current coding unit occupies in the current video frame. For example, when the region occupied by a certain coding unit A of the previous video frame contains the region of the current coding unit in the previous video frame, the target code stream corresponding to coding unit A also covers that region, so the denoising strength corresponding to the target code stream of coding unit A may be directly determined as the second denoising strength. For another example, when the region in which the current coding unit is located in the previous video frame is divided among multiple coding units of the previous video frame, the average denoising strength may be calculated from the target code streams corresponding to those coding units, and this average is used as the second denoising strength.
It should be noted that, in an exemplary embodiment, when a region where a current coding unit is located in a previous video frame is divided by coding units corresponding to multiple previous video frames, denoising strengths corresponding to multiple coding units may also be used as second denoising strengths.
Step S420, determining a third denoising strength based on a target code stream corresponding to an adjacent coding unit spatially adjacent to the current coding unit in the current video frame.
In an exemplary embodiment, since the encoding of a current coding unit is generally also correlated with its neighboring coding units, the third denoising strength may be determined based on the target code streams corresponding to the coding units spatially adjacent to the current coding unit in the current video frame. For example, owing to the coding order, among the four neighbors (above, below, left, right) of the current coding unit, only the upper neighbor and the left neighbor have completed encoding; therefore, the third denoising strengths may be directly determined as L_upper, the denoising strength corresponding to the target code stream of the upper neighboring coding unit, and L_left, that corresponding to the target code stream of the left neighboring coding unit.
Step S430: centered on the second denoising strength, screen the denoised frame set based on the second upper limit parameter and the second lower limit parameter to obtain a first subset; and, centered on the third denoising strength, screen the denoised frame set based on the third upper limit parameter and the third lower limit parameter to obtain a second subset.
In an exemplary embodiment, after the second denoising strength and the third denoising strength are obtained, the first subset may be determined based on the second upper limit and second lower limit parameters centered on the second denoising strength, and the second subset may be determined based on the third upper limit and third lower limit parameters centered on the third denoising strength.
It should be noted that, since the second denoising strength may be determined from multiple coding units of the previous video frame, and the third denoising strength may be determined from multiple adjacent coding units, there may be more than one second denoising strength and/or third denoising strength; correspondingly, there may also be more than one first subset and second subset.
Step S440, determining a union of the first subset and the second subset as a filtered de-noised frame set.
In an exemplary embodiment, after the first subset and the second subset are obtained, the union of the first subset and the second subset may be determined as the filtered set of denoised frames.
For example, if the coding unit is at a non-frame level such as the slice, tile, or macroblock level, the denoising strength of the target code stream selected at the same position in the previous video frame may be determined as Lx, and the denoising strengths of the target code streams selected by the coding units spatially adjacent to the current coding unit in the current video frame may be determined as L1, L2, …, Ln. In this case, the filtered denoised frame set corresponding to the current coding unit covers the strength range [Lx − v0, Lx + w0] ∪ [L1 − v1, L1 + w1] ∪ [L2 − v2, L2 + w2] ∪ … ∪ [Ln − vn, Ln + wn], where v0, v1, …, vn and w0, w1, …, wn are all preconfigured parameters and may take equal or unequal values, which the present disclosure does not limit.
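The union-of-intervals screening above (steps S410 to S440) can be sketched as follows; the function signature and the sample strengths are illustrative assumptions:

```python
# Non-frame-level screening: keep a denoised-frame level if it falls inside
# the interval around the previous-frame strength Lx OR inside any interval
# around a neighbor strength L1..Ln (set union). Names are assumptions.
def screen_non_frame_level(levels, Lx, neighbor_Ls, v0, w0, vs, ws):
    intervals = [(Lx - v0, Lx + w0)]
    intervals += [(L - v, L + w) for L, v, w in zip(neighbor_Ls, vs, ws)]
    # A level survives if it lies inside at least one interval.
    return sorted({L for L in levels
                   for lo, hi in intervals if lo <= L <= hi})

kept = screen_non_frame_level(
    levels=range(6),         # denoised frames L0..L5
    Lx=1, neighbor_Ls=[4],   # previous-frame strength and one neighbor strength
    v0=1, w0=1, vs=[0], ws=[0],
)
# Intervals [0, 2] and [4, 4] give kept == [0, 1, 2, 4].
```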
The technical solution of the embodiment of the present disclosure is explained in detail below with reference to fig. 5 to 7:
Step S501: denoise the current video frame to obtain a plurality of denoised frames L0 to Lk corresponding to different denoising strengths.
The denoising method may be spatial-domain, frequency-domain, time-domain, or joint denoising. From L0 to Lk the denoising becomes progressively stronger, where L0 may correspond to no denoising.
Step S503, determining a current coding unit of a current video frame, screening de-noising frames Lx to Ly from L0 to Lk, and acquiring the current coding unit corresponding to the de-noising frames Lx to Ly;
it should be noted that, in an exemplary embodiment, no screening may be performed; in the case of no screening, the current coding units corresponding to the denoised frames of L0 to Lk are directly obtained here.
Step S505, encoding the current coding units corresponding to the de-noising frames from Lx to Ly (or the current coding units corresponding to the de-noising frames from L0 to Lk) to obtain a coding code stream set corresponding to the current coding unit;
Step S507: calculate the RDO corresponding to each encoded code stream in the set according to the RDO formula, determine the code stream with the minimum RDO as the target code stream, and determine the target code stream as the code stream corresponding to the current coding unit; meanwhile, feed back the denoising strength of the target code stream of the current coding unit to guide the encoding of the next coding unit, that is, jump back to step S502.
It should be noted that, when the coding unit corresponding to the current video frame is at the frame level, the coding code stream of one frame can be completed by executing the above steps once; when the coding unit corresponding to the current video frame is at a non-frame level, after the above steps are performed once, the above steps S502 to S507 need to be repeatedly performed until all coding units included in the current video frame are coded, and a coding code stream of one frame can be obtained.
As an example of the above steps, a video denoising method may first be used to denoise the current video frame into 3 levels of denoised frames L0 to L2. Assuming the coding unit corresponding to the current video frame is at the frame level, the denoised frames L0 to L2 can each be encoded, and the optimal encoded code stream among them selected according to the RDO formula as the code stream corresponding to the current video frame. If the code stream corresponding to the current video frame is the one corresponding to denoised frame L0 and the next video frame is also at the frame level, then when the next video frame screens its denoised frames, L0 and L1 may be retained (assuming w = v = 1); if instead the selected code stream corresponds to denoised frame L2, then L1 and L2 may be retained (assuming w = v = 1); otherwise, the coding unit of the next frame selects among all the candidate levels L0 to L2 (assuming w = v = 1).
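The generation of denoised frames at increasing strengths in step S501 can be sketched as follows. This is only an illustrative stand-in: a real implementation would use the spatial/temporal methods described below, whereas the 1-D moving average here merely shows the "one input frame, k + 1 outputs of increasing strength" structure; all names are assumptions.

```python
# Build denoised frames L0..Lk by applying a smoothing filter with a
# growing radius; radius 0 (L0) leaves the input unchanged (no denoising).
def smooth(pixels, radius):
    """Simple 1-D moving average over a window of +/- radius samples."""
    if radius == 0:
        return list(pixels)
    n = len(pixels)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(pixels[lo:hi]) / (hi - lo))
    return out

def build_denoised_set(pixels, k):
    """Return denoised frames L0..Lk, strength increasing with the index."""
    return [smooth(pixels, radius) for radius in range(k + 1)]

frames = build_denoised_set([10, 0, 10, 0, 10], k=2)
# frames[0] is the original frame (L0); frames[2] is the most smoothed (L2).
```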
In an exemplary embodiment, when denoising the current video frame to obtain a plurality of denoised frames L0 to Lk corresponding to different denoising strengths, the current video frame may be denoised by a block-difference-based non-local means denoising method.
Specifically, an image block of size n × m may be slid across the whole frame to obtain a plurality of blocks, and the gradient of the current block (cur) is calculated; if the gradient is greater than the gradient threshold, the block is not denoised; if the gradient is less than or equal to the gradient threshold, the reference blocks (ref) surrounding the current block are matched against the current block.
When matching, referring to fig. 6, if the SAD between the current block and a reference block is smaller than the SAD threshold, the reference block is successfully matched with the current block; after traversing the 8 surrounding reference blocks, all successfully matched ref blocks are weight-fused with cur, finally obtaining the output after spatial-domain denoising. The specific formula is as follows:
(weighted-fusion formula, presented only as an image in the original: Figure BDA0003160351170000141)
where n is the number of matched ref blocks and k is a preset variable.
It should be noted that the gradient threshold and the SAD threshold are set according to the noise magnitude: the larger the noise, the larger the gradient threshold and the SAD threshold may be set; conversely, the smaller the noise, the smaller they may be set. The block size can likewise be set according to the noise form: as the noise transitions from fine low-frequency noise to speckled high-frequency noise, a larger block size may be used. The step size of the horizontal and vertical traversal can range from 1 to n; it becomes smaller as the noise transitions from fine low-frequency noise to speckled high-frequency noise, and the larger the noise, the smaller the step size. The block size, traversal step size, gradient threshold, and SAD threshold may further be set separately for the spatial-domain denoising of luminance and of chrominance. By varying the gradient threshold and the SAD threshold, denoised images at the different levels L0 to Ln can be generated.
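A 1-D toy sketch of this block-matching spatial denoising follows. Since the patent's fusion formula appears only as an image, the fusion rule used here, out = (k·cur + Σ ref_i) / (k + n), is an assumed form consistent with the stated roles of n (number of matched ref blocks) and k (preset variable); all names are assumptions.

```python
# Gradient-gated block matching with SAD threshold, then weighted fusion
# of the current block with its matched reference blocks (assumed rule).
def sad(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def gradient(block):
    return max(abs(block[i + 1] - block[i]) for i in range(len(block) - 1))

def denoise_block(cur, refs, grad_thr, sad_thr, k=1):
    if gradient(cur) > grad_thr:          # edge/texture block: skip denoising
        return list(cur)
    matched = [r for r in refs if sad(cur, r) < sad_thr]
    n = len(matched)
    # Assumed fusion: out = (k * cur + sum of matched refs) / (k + n), per sample.
    return [(k * c + sum(r[i] for r in matched)) / (k + n)
            for i, c in enumerate(cur)]

out = denoise_block(cur=[4, 4, 4], refs=[[4, 4, 6], [40, 0, 4]],
                    grad_thr=2, sad_thr=5, k=1)
# Only [4, 4, 6] matches (SAD = 2 < 5), so out = [4.0, 4.0, 5.0].
```

Raising `sad_thr` admits more reference blocks into the fusion, which is exactly how the different levels L0 to Ln of increasing denoising strength are generated.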
In an exemplary embodiment, when denoising the current video frame to obtain a plurality of denoised frames L0 to Lk corresponding to different denoising strengths, the current video frame may also be denoised based on a spatio-temporal joint denoising method.
Specifically, referring to fig. 7, the current video frame may be converted into the YUV format and then split into a Y component currfrmY and a C component currfrmC, which are denoised separately. For currfrmY, spatial noise reduction (SNR) is first applied to obtain snrfrmY; then temporal noise reduction (TNR) is performed jointly on currfrmY, currfrmC, snrfrmY, and the reference frame reffrmY obtained from the processing of the previous frame, yielding tnrfrmY; tnrfrmY is recorded in the reference frame list as the reference frame reffrmY for the next video frame; meanwhile, tnrfrmY and currfrmY are fused to obtain OutfrmY. For currfrmC, spatial-domain denoising is applied through SNR to obtain OutfrmC. Finally, OutfrmY and OutfrmC are combined to obtain the final denoised frame.
It should be noted that, in the above process, denoised frames corresponding to different denoising strengths can be generated by setting the fusion weight of tnrfrmY and currfrmY: the larger the proportion of tnrfrmY in the fused result relative to currfrmY, the stronger the resulting denoising strength.
In summary, in the combined video denoising and encoding scheme of this exemplary embodiment, denoised frames at different levels are generated by denoising the obtained video at multiple levels; a fixed number of denoised frames is then selected from them according to temporal and spatial correlation and encoded according to a certain coding unit; finally, an optimal code stream can be selected as the final output code stream according to objective and subjective measurement methods, while the denoising strength information selected for the coding unit is fed back to guide the encoding of the next coding unit.
It is noted that the above-mentioned figures are merely schematic illustrations of processes involved in methods according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
Further, referring to fig. 8, the present exemplary embodiment further provides a video encoding apparatus 800, which includes a video frame denoising module 810, a unit encoding module 820, and a code stream selection module 830. Wherein:
the video frame denoising module 810 may be configured to denoise a current video frame according to multiple sets of denoising parameters to obtain a denoising frame set corresponding to the current video frame; the denoising frame set comprises denoising frames which are obtained according to a plurality of groups of denoising parameters and correspond to a plurality of different denoising strengths.
The unit encoding module 820 may be configured to determine a current encoding unit, and encode the current encoding unit corresponding to each denoised frame in the denoised frame set to obtain an encoding code stream set corresponding to the current encoding unit.
The code stream selection module 830 may be configured to select a target code stream from the encoded code stream set, and use the target code stream as a code stream corresponding to the current coding unit.
In an exemplary embodiment, the unit encoding module 820 may further be configured to screen the denoising frame set to obtain a screened denoising frame set.
In an exemplary embodiment, the unit encoding module 820 may be further configured to determine a first denoising strength based on a target code stream corresponding to a previous video frame; and taking the first denoising strength as a center, and screening a denoising frame set corresponding to the current video frame based on the first upper limit parameter and the first lower limit parameter to obtain a screened denoising frame set.
In an exemplary embodiment, the unit encoding module 820 may be further configured to determine a denoising strength corresponding to the target code stream as a first denoising strength.
In an exemplary embodiment, the unit encoding module 820 may be further configured to calculate an average denoising strength of a target code stream corresponding to a previous video frame by using an encoding unit as a unit, and determine the average denoising strength as a first denoising strength.
In an exemplary embodiment, the unit encoding module 820 may be further configured to determine a second denoising strength based on a target code stream of a corresponding position of a current encoding unit of a current video frame in a previous video frame; determining a third denoising strength based on a target code stream corresponding to an adjacent coding unit adjacent to the current coding unit space in the current video frame; taking the second denoising degree as a center, screening the denoising frame set based on a second upper limit parameter and a second lower limit parameter to obtain a first subset, and taking the third denoising degree as a center, screening the denoising frame set based on a third upper limit parameter and a third lower limit parameter to obtain a second subset; and determining the union of the first subset and the second subset as a screened denoising frame set.
In an exemplary embodiment, the code stream selection module 830 may be further configured to calculate, based on an evaluation formula, an evaluation parameter corresponding to each encoded code stream in the encoded code stream set; and selecting a target code stream from the coding code streams according to the evaluation parameters corresponding to the coding units.
The specific details of each module in the above apparatus have been described in detail in the method section, and details that are not disclosed may refer to the method section, and thus are not described again.
As will be appreciated by one skilled in the art, aspects of the present disclosure may be embodied as a system, method, or program product. Accordingly, various aspects of the present disclosure may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module," or "system."
Exemplary embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, various aspects of the disclosure may also be implemented in the form of a program product including program code for causing a terminal device to perform the steps according to various exemplary embodiments of the disclosure described in the above-mentioned "exemplary methods" section of this specification, when the program product is run on the terminal device, for example, any one or more of the steps in fig. 3 to 5 may be performed.
It should be noted that the computer readable media shown in the present disclosure may be computer readable signal media or computer readable storage media or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Furthermore, program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java or C++, as well as conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is to be limited only by the terms of the appended claims.

Claims (10)

1. A video encoding method, comprising:
denoising a current video frame according to a plurality of groups of denoising parameters to obtain a denoising frame set corresponding to the current video frame; the denoising frame set comprises denoising frames which are obtained according to the multiple groups of denoising parameters and correspond to multiple different denoising strengths;
determining a current coding unit, and coding the current coding unit corresponding to each de-noised frame in the de-noised frame set to obtain a coding code stream set corresponding to the current coding unit;
and selecting a target code stream from the coding code stream set, and taking the target code stream as a code stream corresponding to the current coding unit.
2. The method of claim 1, wherein prior to said encoding a current coding unit of each of said denoised frames in said set of denoised frames, the method further comprises:
and screening the denoising frame set to obtain a screened denoising frame set.
3. The method of claim 2, wherein when the coding unit of the current video frame is at a frame level, the screening the denoised frame set to obtain a screened denoised frame set comprises:
determining a first denoising strength based on a target code stream corresponding to a previous video frame;
and taking the first denoising strength as a center, and screening a denoising frame set corresponding to the current video frame based on a first upper limit parameter and a first lower limit parameter to obtain a screened denoising frame set.
4. The method of claim 3, wherein when the coding unit of the previous video frame is at a frame level, the determining a first denoising strength based on a target code stream corresponding to the previous video frame comprises:
and determining the denoising strength corresponding to the target code stream as the first denoising strength.
5. The method of claim 3, wherein when the coding unit of the previous video frame is at a non-frame level, the determining a first denoising strength based on a target code stream corresponding to the previous video frame comprises:
and calculating the average denoising strength of the target code stream corresponding to the previous video frame by taking a coding unit as a unit, and determining the average denoising strength as the first denoising strength.
6. The method of claim 2, wherein when the coding unit of the current video frame is at a non-frame level, the screening of the denoised frame set to obtain the screened denoised frame set comprises:
determining a second denoising strength based on a target code stream of a position, in a previous video frame, corresponding to a current coding unit of the current video frame;
determining a third denoising strength based on a target code stream corresponding to an adjacent coding unit that is spatially adjacent to the current coding unit in the current video frame;
screening, with the second denoising strength as a center, the denoised frame set based on a second upper limit parameter and a second lower limit parameter to obtain a first subset, and screening, with the third denoising strength as a center, the denoised frame set based on a third upper limit parameter and a third lower limit parameter to obtain a second subset; and
determining a union of the first subset and the second subset as the screened denoised frame set.
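At non-frame level, claim 6 combines two such windows: one around the co-located coding unit's strength in the previous frame, one around a spatial neighbour's strength in the current frame. A sketch, reusing the same window reading as above (the `(lower, upper)` pairing of the limit parameters is an assumption):

```python
def screen_non_frame_level(denoised, strength_of, s2, s3, lim2, lim3):
    """Claim 6, sketched: union of a window around the co-located coding
    unit's strength (s2) and a window around the spatially adjacent coding
    unit's strength (s3). lim2 and lim3 are (lower, upper) limit pairs."""
    def in_window(s, center, lo, hi):
        return center - lo <= s <= center + hi
    first = {f for f in denoised if in_window(strength_of(f), s2, *lim2)}
    second = {f for f in denoised if in_window(strength_of(f), s3, *lim3)}
    # The screened denoised frame set is the union of the two subsets.
    return first | second
```

Taking the union rather than the intersection means a candidate strength survives if it is plausible under either the temporal or the spatial predictor.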
7. The method of claim 1, wherein the selecting of the target code stream from the encoded code stream set comprises:
calculating, based on an evaluation formula, an evaluation parameter corresponding to each encoded code stream in the encoded code stream set; and
selecting the target code stream from the encoded code streams according to the evaluation parameters corresponding to the coding unit.
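The claims leave the evaluation formula unspecified. A common choice in video encoders is a rate-distortion cost, J = D + lambda * R; the sketch below assumes that form and that each candidate code stream exposes a (distortion, bits) pair, neither of which the patent text fixes.

```python
def select_target_stream(streams, lam=0.5):
    """Claim 7, sketched with a rate-distortion cost as the evaluation
    formula (an assumption; the patent does not fix the formula).
    Each stream is a (distortion, bits) pair; the lowest cost wins."""
    return min(streams, key=lambda s: s[0] + lam * s[1])
```

With this formula, a heavily denoised frame that costs fewer bits can beat a lightly denoised one even at slightly higher distortion, which is exactly the trade-off the multi-strength candidate set exposes.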
8. A video encoding apparatus, comprising:
a video frame denoising module, configured to denoise a current video frame according to a plurality of groups of denoising parameters to obtain a denoised frame set corresponding to the current video frame, the denoised frame set comprising denoised frames that are obtained according to the plurality of groups of denoising parameters and correspond to a plurality of different denoising strengths;
a unit encoding module, configured to determine a current coding unit and encode the current coding unit of each denoised frame in the denoised frame set to obtain an encoded code stream set corresponding to the current coding unit; and
a code stream selection module, configured to select a target code stream from the encoded code stream set and use the target code stream as the code stream corresponding to the current coding unit.
9. A computer-readable medium having stored thereon a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 7.
10. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the method according to any one of claims 1 to 7 by executing the executable instructions.
CN202110789592.XA 2021-07-13 2021-07-13 Video encoding method and device, computer readable medium and electronic equipment Active CN113542746B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110789592.XA CN113542746B (en) 2021-07-13 2021-07-13 Video encoding method and device, computer readable medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN113542746A 2021-10-22
CN113542746B 2024-04-12

Family

ID=78127693

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110789592.XA Active CN113542746B (en) 2021-07-13 2021-07-13 Video encoding method and device, computer readable medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN113542746B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008075261A1 (en) * 2006-12-18 2008-06-26 Koninklijke Philips Electronics N.V. Automatic noise reduction improvement in video sequences
US20090086816A1 (en) * 2007-09-28 2009-04-02 Dolby Laboratories Licensing Corporation Video Compression and Transmission Techniques
CN104956683A (en) * 2013-01-24 2015-09-30 微软技术许可有限责任公司 Adaptive noise reduction engine for streaming video
CN110891177A (en) * 2018-09-07 2020-03-17 腾讯科技(深圳)有限公司 Denoising processing method, device and machine equipment in video denoising and video transcoding
CN112422989A (en) * 2020-11-17 2021-02-26 杭州师范大学 Video coding method


Similar Documents

Publication Publication Date Title
CN111066326B (en) Machine learning video processing system and method
CN111598776B (en) Image processing method, image processing device, storage medium and electronic apparatus
CN114079779B (en) Image processing method, intelligent terminal and storage medium
WO2023016155A1 (en) Image processing method and apparatus, medium, and electronic device
CN103460250A (en) Object of interest based image processing
CN104378635B (en) The coding method of video interested region based on microphone array auxiliary
CN111445392A (en) Image processing method and device, computer readable storage medium and electronic device
CN113920010A (en) Super-resolution implementation method and device for image frame
WO2023005740A1 (en) Image encoding, decoding, reconstruction, and analysis methods, system, and electronic device
KR20210092588A (en) Image processing apparatus and method thereof
TW201933277A (en) Perception-based image processing apparatus and associated method
CN110225340A (en) A kind of control method and device of Video coding calculate equipment and storage medium
CN110677649A (en) Artifact removing method based on machine learning, artifact removing model training method and device
CN113313776A (en) Image processing method, image processing apparatus, storage medium, and electronic device
WO2022253249A1 (en) Feature data encoding method and apparatus and feature data decoding method and apparatus
CN113542746B (en) Video encoding method and device, computer readable medium and electronic equipment
CN116847087A (en) Video processing method and device, storage medium and electronic equipment
CN114900717B (en) Video data transmission method, device, medium and computing equipment
CN115983349A (en) Method and device for quantizing convolutional neural network, electronic device and storage medium
CN115567712A (en) Screen content video coding perception code rate control method and device based on just noticeable distortion by human eyes
CN115330633A (en) Image tone mapping method and device, electronic equipment and storage medium
CN115187488A (en) Image processing method and device, electronic device and storage medium
CN114554212A (en) Video processing apparatus and method, and computer storage medium
CN113542741B (en) Image code stream denoising method and device, computer readable medium and electronic equipment
WO2022155818A1 (en) Image encoding method and device, image decoding method and device, and codec

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant