CN111556227B - Video denoising method and device, mobile terminal and storage medium - Google Patents

Info

Publication number
CN111556227B
CN111556227B
Authority
CN
China
Prior art keywords
image data
original image
reference image
target
motion
Prior art date
Legal status
Active
Application number
CN202010425645.5A
Other languages
Chinese (zh)
Other versions
CN111556227A (en)
Inventor
杨敏
杜凌霄
Current Assignee
Bigo Technology Pte Ltd
Original Assignee
Guangzhou Baiguoyuan Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Baiguoyuan Information Technology Co Ltd
Priority to CN202010425645.5A
Publication of CN111556227A
Priority to PCT/CN2021/085280
Application granted
Publication of CN111556227B


Classifications

    • H ELECTRICITY
      • H04 ELECTRIC COMMUNICATION TECHNIQUE
        • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
          • H04N 5/00 Details of television systems
            • H04N 5/14 Picture signal circuitry for video frequency region
              • H04N 5/21 Circuitry for suppressing or minimising disturbance, e.g. moiré or halo
          • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
            • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
              • H04N 21/21 Server components or server architectures
                • H04N 21/218 Source of audio or video content, e.g. local disk arrays
                  • H04N 21/2187 Live feed
            • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
              • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
                • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
              • H04N 21/47 End-user applications
                • H04N 21/478 Supplemental services, e.g. displaying phone caller identification, shopping application
                  • H04N 21/4788 Supplemental services communicating with other users, e.g. chatting
          • H04N 7/00 Television systems
            • H04N 7/14 Systems for two-way working
              • H04N 7/15 Conference systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Image Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The embodiment of the invention discloses a video denoising method and device, a mobile terminal and a storage medium, wherein the method comprises the following steps: acquiring original image data and reference image data in the video data, wherein the original image data is the image data of the current frame to be denoised and the reference image data is the denoised image data of the previous frame; dividing the original image data into original image blocks and the reference image data into reference image blocks at multiple types of scales; calculating the target motion probability between the original image data and the reference image data according to the original image blocks and reference image blocks at the various types of scales; and performing three-dimensional denoising on the original image data according to the target motion probability and the reference image data to obtain target image data. The embodiment balances the performance and the effect of the three-dimensional denoising, and can realize real-time three-dimensional denoising even when the performance of the mobile terminal is limited.

Description

Video denoising method and device, mobile terminal and storage medium
Technical Field
The embodiment of the invention relates to a video processing technology, in particular to a video denoising method, a video denoising device, a mobile terminal and a storage medium.
Background
With the rapid development of the mobile internet and mobile terminals, video data on mobile terminals has become a common information carrier in human activities, such as live broadcast and video calls; it carries a large amount of information about objects and has become one of the ways people obtain original information about the outside world.
Due to factors such as sensors, transmission and storage, noise often appears in captured video data, and it is especially obvious in dim-light environments, lowering users' subjective evaluation of the video quality.
Noise can be understood as a factor that hinders human sense organs from understanding the received information; it appears as random variation of the brightness or color of pixels in the video data.
Therefore, the video data is usually subjected to Noise Reduction (NR) to remove useless information from the video data while maintaining the integrity (i.e., main features) of the original information as much as possible.
The 3DNR (3D Noise Reduction, three-dimensional denoising) algorithm is a commonly used denoising method. 3DNR exploits both temporal and spatial information and performs single-point, frame-by-frame superposition denoising; that is, for the t-th frame (t is a positive integer), the information of frames 1 through t−1 is accumulated and used as a reference for noise suppression.
Because the frames in video data are continuous on the time axis, this maintains the definition of details and the consistency between frames; however, artifacts (ghosts) are easily superimposed when the motion of the camera or of an object is large.
For artifacts, artifact removal (deghosting) is usually implemented with motion estimation or motion compensation methods such as optical flow, but these methods are computationally expensive and cannot be applied on a mobile terminal whose performance is limited.
Disclosure of Invention
Embodiments of the present invention provide a video denoising method, apparatus, mobile terminal, and storage medium, to solve the problem of how to slow down or remove artifacts when performing 3DNR on video data under the condition of limited performance.
In a first aspect, an embodiment of the present invention provides a video denoising method, including:
acquiring original image data and reference image data in video data, wherein the original image data is image data to be denoised of a current frame, and the reference image data is image data denoised of a previous frame;
dividing the original image data into original image blocks and dividing the reference image data into reference image blocks respectively according to multiple types of scales;
calculating the target motion probability between the original image data and the reference image data according to the original image blocks and the reference image blocks under various types of scales;
and carrying out three-dimensional denoising processing on the original image data according to the target motion probability and the reference image data to obtain target image data.
In a second aspect, an embodiment of the present invention further provides a video denoising apparatus, including:
the image data acquisition module is used for acquiring original image data and reference image data in the video data, wherein the original image data is image data to be denoised of a current frame, and the reference image data is image data denoised of a previous frame;
the image data partitioning module is used for respectively partitioning the original image data into original image blocks and the reference image data into reference image blocks in various scales;
the target motion probability calculation module is used for calculating the target motion probability between the original image data and the reference image data according to the original image blocks and the reference image blocks under various types of scales;
and the three-dimensional denoising processing module is used for carrying out three-dimensional denoising processing on the original image data according to the target motion probability and the reference image data to obtain target image data.
In a third aspect, an embodiment of the present invention further provides a mobile terminal, where the mobile terminal includes:
one or more processors;
a memory for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the video denoising method of the first aspect.
In a fourth aspect, the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the video denoising method according to the first aspect.
In this embodiment, original image data and reference image data in the video data are obtained, where the original image data is the image data of the current frame to be denoised and the reference image data is the denoised image data of the previous frame; the original image data is divided into original image blocks and the reference image data into reference image blocks at multiple types of scales; the target motion probability between the original image data and the reference image data is calculated from the original image blocks and reference image blocks at the various types of scales; and the original image data is subjected to three-dimensional denoising according to the target motion probability and the reference image data to obtain the target image data. On the one hand, blocking based on multiple types of scales involves little computation, which ensures calculation speed; on the other hand, calculating the target motion probability by fusing blocks at multiple types of scales enriches the dimensions of the calculation and improves its accuracy, so that whether corresponding pixel points should be superimposed can be judged accurately, the quality of the three-dimensional denoising is improved, and artifacts are alleviated. The performance and the effect of the three-dimensional denoising are thus both taken into account, and real-time three-dimensional denoising can be realized even when the performance of the mobile terminal or the like is limited.
Drawings
Fig. 1 is a flowchart of a video denoising method according to an embodiment of the present invention;
fig. 2 is an exemplary diagram of a scene for denoising video data according to an embodiment of the present invention;
fig. 3 is a flowchart of a video denoising method according to a second embodiment of the present invention;
fig. 4 is a diagram illustrating a blocking process according to a second embodiment of the present invention;
fig. 5A to 5C are exemplary diagrams of a feature region according to a second embodiment of the present invention;
fig. 6 is a schematic diagram of a 3DNR stacking process according to a second embodiment of the present invention;
FIGS. 7A to 7B are graphs comparing the effect of a 3DNR according to the second embodiment of the present invention;
fig. 8 is a schematic structural diagram of a video denoising device according to a third embodiment of the present invention;
fig. 9 is a schematic structural diagram of a mobile terminal according to a fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of a video denoising method according to an embodiment of the present invention. This embodiment is applicable to the case of dividing video data into blocks at multiple types of scales and computing the motion probability of each single point in order to perform 3DNR. The method may be executed by a video denoising device, which may be implemented in software and/or hardware and may be configured in a mobile terminal, for example a mobile phone, a tablet computer, or a smart wearable device (such as a smart watch). The method specifically includes the following steps:
s101, acquiring original image data and reference image data in the video data.
In this embodiment, the acquired video data is video data to be denoised, and the video data to be denoised generally refers to video data generated, transmitted or played in a service scene with real-time performance.
In general, the mobile terminal that generates the video data performs denoising processing on the video data, and at this time, as shown in fig. 2, in S201, a camera of the mobile terminal is turned on, and in S202, the camera collects the video data.
3DNR assumes that the randomly generated noise in the video data varies with time, which is expressed as follows:
F(t)=F+N(t)
where F(t) is the image data with noise, F is the original image data, and N(t) is the time-varying noise; the noise follows a Gaussian distribution with mean 0, so according to the law of large numbers, the longer N(t) is accumulated over time, the closer the accumulated noise gets to zero.
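This premise can be checked with a minimal sketch (Python with NumPy is assumed, and all values are illustrative): averaging repeated noisy observations of a static scene drives the zero-mean Gaussian noise toward zero.

```python
import numpy as np

rng = np.random.default_rng(0)
F = np.full((4, 4), 128.0)                      # true signal F of a static scene

# F(t) = F + N(t): noisy observations, N(t) ~ Gaussian with mean 0
frames = [F + rng.normal(0.0, 10.0, F.shape) for _ in range(100)]

# accumulating over time averages N(t) toward zero (law of large numbers)
accumulated = np.mean(frames, axis=0)
print(float(np.abs(accumulated - F).mean()))    # residual noise well below sigma = 10
```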
For 3DNR, the video data in S202 is original video data, that is, video data without other processing (such as white balance, brightness adjustment, etc.), and is usually in YUV format, and at this time, noise carried by the video data most conforms to gaussian distribution with an average value of 0, so that the effect of 3DNR can be ensured.
Of course, if the operating system in the mobile terminal is limited, the original video data cannot be obtained, and the video data subjected to other processing (such as white balance, brightness adjustment, and the like) may also be subjected to denoising processing, which is not limited in this embodiment.
In addition, in addition to performing denoising processing on the video data in the mobile terminal that generates the video data, denoising processing may also be performed on the video data in the mobile terminal that plays the video data, which is not limited in this embodiment.
For example, in a live service scenario, video data to be subjected to denoising processing may refer to video data for carrying live content, a mobile terminal logged by a host user generates video data, and the video data is distributed to devices logged by each audience user through a live platform to be played, at this time, denoising processing is usually performed on the video data by the mobile terminal logged by the host user.
For another example, in a service scenario of a video call, video data waiting for denoising processing may refer to video data for carrying call content, a mobile terminal logged by a user initiating the call generates video data, and the video data is sent to a device logged by each user invited to call for playing, where the video data is generally denoised by the mobile terminal logged by the user initiating the call.
For another example, in a service scenario of a video conference, video data waiting for denoising processing may refer to video data for carrying conference content, a mobile terminal logged by a speaking user generates video data, and the video data is transmitted to a device logged by each user participating in the conference for playing, at this time, the mobile terminal logged by the speaking user typically performs denoising processing on the video data.
In addition to the video data that requires real-time performance, such as live broadcast and video call, the video data waiting for denoising may also refer to video data generated in a service scene that requires low real-time performance, such as a short video.
Further, the video data includes multiple frames of image data, denoted in order of generation as P_1, P_2, …, P_{t-1}, P_t, P_{t+1}, …, P_n, where t and n are positive integers and t+1 < n. Because the video data is generated in real time, n keeps increasing until the generation of the video data ends.
In this embodiment, each frame of image data in the video data is sequentially traversed to perform three-dimensional denoising, for convenience of description, the image data to be denoised in the current frame is referred to as original image data, and the image data denoised in the previous frame is referred to as reference image data.
For example, as shown in fig. 2, in S203 the currently input frame (frame t) is set as the original image data, and in S204 the previous frame (frame t−1) is set as the reference image data to assist the three-dimensional denoising of the currently input frame.
It should be noted that, in the process of traversing the video data, the role of a given frame of image data changes: when that frame is traversed it is the original image data, and when the next frame is traversed it becomes the reference image data.
For example, when P_t is subjected to three-dimensional denoising, the original image data of the current frame to be denoised is P_t and the reference image data of the already-denoised previous frame is P_{t-1}; when P_{t+1} is subjected to three-dimensional denoising, the original image data of the current frame to be denoised is P_{t+1} and the reference image data of the already-denoised previous frame is P_t.
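This role change can be sketched as a simple loop, where denoise_frame is a hypothetical stand-in for steps S102 to S104, and passing the first frame through unchanged is an assumption (the patent does not specify how the first frame is handled):

```python
def denoise_video(frames, denoise_frame):
    """frames: image arrays P_1..P_n in generation order;
    denoise_frame: hypothetical stand-in for steps S102-S104."""
    reference = None                    # no denoised previous frame yet
    for original in frames:             # current frame to be denoised
        if reference is None:
            target = original           # assumption: P_1 passes through unchanged
        else:
            target = denoise_frame(original, reference)
        yield target
        reference = target              # denoised frame becomes the next reference
```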
And S102, dividing the original image data into original image blocks and dividing the reference image data into reference image blocks respectively in multiple types of scales.
In this embodiment, multiple (two or more) types of scales may be preset, where the type denotes a manner of dividing the image data into multiple (two or more) blocks, and the scale is the parameter used to divide the image data into blocks under that type. For example, one type may be number, that is, the image data is divided into a specified number (the scale) of blocks; another type may be size (width and height), that is, the image data is divided into blocks of a specified size (the scale); and so on.
It should be noted that one or more scales may be set under the same type; for example, for the number type the scale may be 8 × 16, 16 × 32, and the like, and for the size type the scale may be 5, 9, and the like, which is not limited in this embodiment.
In a specific implementation, as shown in fig. 2, in S203 and S204, the original image data is subjected to blocking processing in multiple types of scales, that is, the original image data is divided into blocks as original image blocks, and the reference image data is subjected to blocking processing in multiple types of scales, that is, the reference image data is divided into blocks as reference image blocks, where the types and scales of the divided original image blocks are the same as the types and scales of the divided reference image blocks.
And S103, calculating the target motion probability between the original image data and the reference image data according to the original image blocks and the reference image blocks under various scales.
The original image blocks and reference image blocks divided at different types of scales differ, so the motion represented by pixels at the same position between an original image block and a reference image block also differs. By comparing the original image blocks and reference image blocks separately at each type of scale and comprehensively evaluating the motion under all types of scales, the motion of the pixels can be reflected more truly, and the target motion probability of pixels at the same position between the original image data and the reference image data can be calculated, where the target motion probability represents the probability that a pixel is in a motion state.
And S104, carrying out three-dimensional denoising processing on the original image data according to the target motion probability and the reference image data to obtain target image data.
In this embodiment, as shown in fig. 2, in S205, 3DNR is applied, and the target motion probability and the reference image data are used to perform three-dimensional denoising processing on the original image data in the video data, and for the video data after denoising processing, subsequent processing may be performed according to a service scene, which is not limited in this embodiment.
For example, as shown in fig. 2, the Video data after the three-dimensional denoising process is displayed on a screen in S206, and the Video data after the denoising process is encoded in S207, for example, in the h.264 format, packaged in the FLV (Flash Video) format, and waiting to be sent to a device playing the Video data.
In this embodiment, original image data and reference image data in the video data are obtained, where the original image data is the image data of the current frame to be denoised and the reference image data is the denoised image data of the previous frame; the original image data is divided into original image blocks and the reference image data into reference image blocks at multiple types of scales; the target motion probability between the original image data and the reference image data is calculated from the original image blocks and reference image blocks at the various types of scales; and the original image data is subjected to three-dimensional denoising according to the target motion probability and the reference image data to obtain the target image data. On the one hand, blocking based on multiple types of scales involves little computation, which ensures calculation speed; on the other hand, calculating the target motion probability by fusing blocks at multiple types of scales enriches the dimensions of the calculation and improves its accuracy, so that whether corresponding pixel points should be superimposed can be judged accurately, the quality of the three-dimensional denoising is improved, and artifacts are alleviated. The performance and the effect of the three-dimensional denoising are thus both taken into account, and real-time three-dimensional denoising can be realized even when the performance of the mobile terminal or the like is limited.
Example two
Fig. 3 is a flowchart of a video denoising method according to a second embodiment of the present invention, where the present embodiment further details processing operations of block processing, target motion probability calculation, and 3DNR based on the foregoing embodiment, and the method specifically includes the following steps:
s301, original image data and reference image data in the video data are obtained.
The original image data is image data of a current frame to be denoised, and the reference image data is image data of a previous frame which has been denoised.
S302, one or more first target values are determined for the scales whose type is number.
In the present embodiment, when the type is number, the image data (original image data, reference image data) is divided into a specified number of blocks (original image blocks, reference image blocks), and one or more first target values may be determined as the scales under this type.
Further, the first target value may be expressed in a × b manner, where a and b are both positive integers, and when the image data is divided into blocks, the width is divided into a equal parts and the height is divided into b equal parts, thereby dividing the image data into a × b blocks.
In the case where the size of the image data is constant, the larger a × b is, the smaller the divided blocks are; conversely, the smaller a × b is, the larger the divided blocks are.
For example, as shown in fig. 4, when the image data 410 is partitioned into two different a × b blocks, a plurality of blocks 421 and a plurality of blocks 422 are obtained, respectively, and the blocks 421 are smaller than the blocks 422.
In addition, the first target value may be a preset value or a value set to adapt to the current video data, which is not limited in this embodiment.
In a mode of adapting video data, the resolution of the video data can be inquired, so that a numerical value which is in positive correlation with the resolution is set as a first target value, the size of a segmented block is in a proper range, the accuracy of motion information between the blocks is ensured, and the accuracy of calculating the motion probability is ensured.
The positive correlation may mean that the larger the resolution, the larger the first target value is, so that the more divided blocks are, whereas the smaller the resolution, the smaller the first target value is, so that the fewer divided blocks are.
For example, if the resolution of the video data is 1920 × 1080, 16 × 32 and 8 × 16 may be set as the first target values so that the divided block sizes are about 120 × 34 and 240 × 68, and if the resolution of the video data is 1280 × 720, 8 × 16 and 4 × 8 may be set as the first target values so that the divided block sizes are about 160 × 45 and 320 × 90.
Of course, other parameters may be adopted to adapt the video data besides the resolution, for example, the first target value is set based on the connected regions of the video data, so that the first target value is positively correlated with the number of the connected regions, and so on, which is not limited in this embodiment.
S303, divide the original image data into areas of which the number is the first target value as original image blocks.
For each first target value, the original image data may be divided into a number of regions of the first target value, and the regions may be regarded as original image blocks if the regions are equal or similar in size.
Note that, in some cases, the original image data cannot be equally divided, and in this case, the width and height of the resolution may be divided by the first target value, the integer after the division may be regarded as the width and height of each block, and the remainder after the division may be assigned to the width and height of each block.
For example, if the resolution of the video data is 1920 × 1080 and the first target value is 16 × 32 or 8 × 16, the width of the original image data is divided into 16 equal parts (the width of each block is 120), the height of the original image data is divided into 32 equal parts (the height of 8 blocks is 33, and the height of 24 blocks is 34), and each of the divided blocks is an original image block; further, the width of the original image data is divided into 8 equal parts (the width of each block is 240), the height of the original image data is divided into 16 equal parts (the height of 8 blocks is 67, the height of 8 blocks is 68), and each block after division is an original image block.
And S304, dividing the reference image data into areas with the number of the first target values as reference image blocks.
For each first target value, the reference image data may be divided into a number of regions of the first target value, and the regions may be regarded as the reference image blocks if the regions are equal or similar in size.
It should be noted that, in some cases, the reference image data cannot be equally divided, and in this case, the width and height of the resolution may be divided by the first target value, the integer after the division may be regarded as the width and height of each block, and the remainder after the division may be assigned to the width and height of each block.
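A minimal sketch of this number-type division (Python with NumPy; np.array_split spreads the remainder over the leading parts, which is one way, consistent with the 1920 × 1080 example above, of distributing it):

```python
import numpy as np

def split_into_blocks(image, a, b):
    """Divide `image` (height x width) into a x b regions: the width into a
    parts and the height into b parts; uneven remainders are spread over
    the leading parts."""
    rows = np.array_split(image, b, axis=0)                  # b parts along the height
    return [np.array_split(row, a, axis=1) for row in rows]  # a parts along the width

image = np.zeros((1080, 1920))                  # a 1920 x 1080 video frame
blocks = split_into_blocks(image, 16, 32)       # first target value 16 x 32
print(blocks[0][0].shape, blocks[-1][0].shape)  # (34, 120) and (33, 120)
```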
And S305, when the type is size, sequentially setting each position in the video data as a target position.
In this embodiment, the size is used as the basis of division, that is, blocks (original image blocks, reference image blocks) of a specified size are divided from the image data (original image data, reference image data); therefore, the position of each pixel point in the video data can be traversed and each position set as the target position in turn.
The target position may be represented by (X, Y), where X represents an X coordinate and Y represents a Y coordinate, and when dividing the image data into blocks, a corresponding range (e.g., a rectangle, a circle, etc.) is divided into blocks by using the target position as a reference point (e.g., a center point, an angular point, a midpoint of a certain edge, etc.).
For example, as shown in fig. 4, each position in the traversal image data 410 is a target position, and a range of 5 × 5 may be divided as a block 431 with the target position a as a center point for the target position a, and a range of 5 × 5 may be divided as a block 432 with the target position B as a center point for the target position B.
And S306, sequentially dividing the area which contains the target position and has the size of the second target value in the original image data to be used as the original image block.
For each target position, the original image data may be divided into an area that includes the target position and has a size of a second target value, and the area may be regarded as an original image block.
The second target value may be preset data, or may also be a value set to adapt to the current video data, for example, the resolution of the video data may be queried, so as to set the value having a positive correlation with the resolution as the second target value, and so on, which is not limited in this embodiment.
It should be noted that, in some cases, the range of the area may exceed the boundary of the original image data, and in this case, the range may be truncated by the original image data as the original image block.
And S307, sequentially dividing the area which contains the target position and has the size of the second target value in the reference image data to be used as the reference image block.
For each target position, the reference image data may be divided into an area including the target position and having a size of a second target value, and the area may be regarded as a reference image block.
It should be noted that, in some cases, the range of the area may exceed the boundary of the reference image data, and in this case, the range may be truncated by the reference image data to serve as the reference image block.
Of course, the manner of dividing the blocks is only an example, and when the embodiment is implemented, other manners of dividing the blocks may be set according to actual situations, which is not limited by the embodiment. In addition, besides the above-mentioned manner of dividing the blocks, a person skilled in the art may also adopt other manners of dividing the blocks according to actual needs, and this embodiment also does not limit this.
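A minimal sketch of the size-type division with boundary truncation, using the 5 × 5 window and centre-point reference of fig. 4 (NumPy assumed):

```python
import numpy as np

def window_at(image, x, y, size=5):
    """Divide a size x size region containing target position (x, y), used
    here as the centre point; ranges beyond the image boundary are
    truncated by the image, as described above."""
    h, w = image.shape[:2]
    half = size // 2
    top, left = max(0, y - half), max(0, x - half)
    bottom, right = min(h, y + half + 1), min(w, x + half + 1)
    return image[top:bottom, left:right]

image = np.arange(100).reshape(10, 10)
print(window_at(image, 0, 0).shape)   # (3, 3): truncated at the corner
print(window_at(image, 5, 5).shape)   # (5, 5): full window
```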
According to the method and the device, the original image data and the reference image data are subjected to block processing according to the types of the number and the size, and blocks are divided in different scales, so that the dimensionality of the motion information is richer, the accuracy of the motion information is improved, and the accuracy of the subsequent calculation of the target motion probability is improved.
And S308, respectively calculating motion information between the original image block and the reference image block under each type of scale.
In this embodiment, each scale in each type may be traversed, the original image block and the reference image block in the same region are respectively matched for the same scale of the same type, the motion information between the original image block and the reference image block is calculated, and each pixel point in the original image block uses the motion information together.
For example, dividing the original image data into a plurality of original image blocks a and dividing the reference image data into a plurality of reference image blocks a based on a number a (e.g., 8 × 16) of divided blocks, the motion information a may be calculated using the original image blocks a and the reference image blocks a; dividing the original image data into a plurality of original image blocks B and dividing the reference image data into a plurality of reference image blocks B based on the number B (e.g., 16 × 32) of divided blocks, the motion information B may be calculated using the original image blocks B and the reference image blocks B; dividing blocks based on size C (e.g., 5 × 5), dividing original image data into a plurality of original image blocks C, and dividing reference image data into a plurality of reference image blocks C, then motion information C may be calculated using the original image blocks C and the reference image blocks C; at this time, motion information a, motion information B, and motion information C are accumulated for the pixel points in the original image data.
In a specific implementation, for the blocks divided at each scale of each type, the Sum of Absolute Differences (SAD) between the original image block and the reference image block may be calculated as the motion information, thereby speeding up the calculation.
For the case where the type is a number, the motion information may be expressed as:
SAD_i = Σ_{k=1..m} |F_k(t) − B_k(t−1)|
where i denotes a block (i.e., a pair of original image block and reference image block), m denotes the number of pixels in the block, k indexes the pixels in the block, F(t) denotes the original image data, and B(t−1) denotes the reference image data.
For the case where the type is a size, the motion information may be expressed as:
SAD_j = Σ_{k=1..n×r} |F_k(t) − B_k(t−1)|
where j denotes a block (i.e., a pair of original image block and reference image block), n × r denotes the size of the block and hence the number of pixels in it, k indexes the pixels in the block, F(t) denotes the original image data, and B(t−1) denotes the reference image data.
Of course, parameters other than SAD may be used as the motion information, for example SATD (Sum of Absolute Transformed Differences, computed after a Hadamard transform), SSD (Sum of Squared Differences), MAD (Mean Absolute Difference), MSD (Mean Squared Difference), and the like, which is not limited in this embodiment.
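A direct sketch of the SAD computation above (NumPy assumed; the block shape is taken from the earlier 1920 × 1080 example):

```python
import numpy as np

def block_sad(original_block, reference_block):
    """SAD = sum over pixels k of |F_k(t) - B_k(t-1)| for one block pair;
    widening to int64 avoids overflow on 8-bit frames."""
    return int(np.abs(original_block.astype(np.int64)
                      - reference_block.astype(np.int64)).sum())

rng = np.random.default_rng(0)
f_t  = rng.integers(0, 256, (34, 120))   # one original image block
b_t1 = rng.integers(0, 256, (34, 120))   # the co-located reference image block
print(block_sad(f_t, b_t1))              # shared by every pixel in the block
```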
S309, calculating candidate motion probabilities of pixel points at the same position between the original image data and the reference image data respectively based on the motion information under each type of scale.
In this embodiment, a probability mapping function for mapping the motion information to a probability representing the state of motion may be set in advance.
Taking SAD as an example, the probability is expressed as:
P_{ij} = f_j(SAD_i)
where i denotes the block, j denotes the type of division and its scale, P denotes the probability of being in a motion state, P ∈ [0, 1], and f_j(·) denotes the probability mapping function under that type and scale.
For example, the probability mapping function may be a linear function, such as f(x) = cx + d, where x is the motion information and c and d are hyper-parameters whose values differ across types and scales.
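A minimal sketch of such a mapping, with made-up placeholder values for c and d, and with clipping added to keep P within [0, 1]:

```python
def sad_to_probability(sad, c=2.5e-4, d=0.0):
    """Linear probability mapping f(x) = c*x + d, clipped so that P stays
    in [0, 1]; c and d are per-type, per-scale hyper-parameters, and the
    defaults here are illustrative assumptions only."""
    return min(1.0, max(0.0, c * sad + d))
```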
In a way of calculating candidate motion probability, if there is a scale in the same type, the motion information corresponding to each pixel point in the original image data can be substituted into the probability mapping function, so as to map to obtain the probability representing the motion state as the candidate motion probability.
In another way of calculating the candidate motion probability, if a plurality of scales exist in the same type, the probability of representing the motion state in each scale can be calculated, the blocking effect is removed, and different probabilities are fused, so that the candidate motion probability is obtained.
In this embodiment, S309 includes the steps of:
s3091, respectively mapping the motion information under the multiple scales to first intermediate motion probabilities of pixel points at the same positions between the original image data and the reference image data.
And substituting the motion information corresponding to each pixel point in the original image data into a probability mapping function, thereby mapping to obtain the probability representing the motion state as a first intermediate motion probability.
S3092, for the same scale, forming a connected characteristic region by partial regions in at least two original image blocks.
Segmenting the image data into blocks (original image blocks and reference image blocks) can introduce a blocking effect: a block may contain different objects, some in a motion state and some in a static state, yet all the pixel points in the block apply the same probability of being in a motion state. For this reason, this embodiment slows down or eliminates the blocking effect by means of overlapping.
In a specific implementation, partial areas in at least two original image blocks can be taken out and combined into a connected area as a feature area.
A connected region means that any two points in the region can be joined by a polyline lying entirely within the region.
In one example, for two adjacent original image blocks, half of the area is taken out, constituting the feature area.
For example, as shown in fig. 5A, the original image block 511 takes the lower half area and the original image block 512 takes the upper half area, and constitutes a feature area 513.
For another example, as shown in fig. 5B, the original image block 521 has a right half area and the original image block 522 has a left half area, and the feature area 523 is formed.
In another example, for four adjacent original image blocks, one quarter of the area is taken out to constitute the feature area.
For example, as shown in fig. 5C, the original image block 531 takes the area of the lower right portion, the original image block 532 takes the area of the lower left portion, the original image block 533 takes the area of the upper right portion, and the original image block 534 takes the area of the upper left portion, which constitute the feature area 535.
Of course, the above-mentioned manner of combining connected regions is only an example, and when implementing the embodiment of the present invention, other manners of combining connected regions may be set according to actual situations, for example, randomly determining a position in the original image data, determining a connected region based on the position (for example, determining a circular region with a specified radius by taking the position as a center), as a feature region, and the like, which is not limited in this embodiment of the present invention. In addition, besides the above-mentioned manner of combining the communication areas, a person skilled in the art may also use other manners of combining the communication areas according to actual needs, and the embodiment of the present invention is not limited thereto.
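One concrete reading of this overlapping scheme, sketched below: the feature regions of figs. 5A to 5C are treated as a block grid shifted by half a block, and the shifted map is linearly fused with the original block-grid map; the equal fusion weights and the mean-based block value are assumptions, not the patent's stated choices.

```python
import numpy as np

def blockwise(prob, bh, bw, oy=0, ox=0):
    """Give every pixel of each (bh x bw) block the block's value (its
    mean here), with the grid optionally offset by (oy, ox)."""
    out = prob.copy()
    h, w = prob.shape
    for y in range(oy, h, bh):
        for x in range(ox, w, bw):
            out[y:y + bh, x:x + bw] = prob[y:y + bh, x:x + bw].mean()
    return out

def second_intermediate(prob, bh, bw):
    """Fuse the block-grid map with a half-block-shifted map whose regions
    straddle the original block boundaries, smoothing the blocking effect."""
    p_grid = blockwise(prob, bh, bw)
    p_shift = blockwise(prob, bh, bw, bh // 2, bw // 2)
    return 0.5 * p_grid + 0.5 * p_shift     # equal-weight linear fusion
```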
S3093, calculating second intermediate motion probability for the pixel points in the feature region based on all the first intermediate motion probabilities in the feature region aiming at the same scale.
The pixel points in the feature region have at least two first intermediate motion probabilities, and the probability that each pixel point in the feature region is in a motion state can be calculated by fusing the at least two first intermediate motion probabilities to serve as a second intermediate motion probability.
In one example, the second intermediate motion probability may be calculated by way of linear fusion, as follows:
P_2 = Σ_{k=1..u} w_k · P1_k
where P1_k denotes the first intermediate motion probabilities, u is their number, w_k denotes the weight configured for P1_k, and P_2 denotes the second intermediate motion probability.
Of course, the above manner of calculating the second intermediate motion probability is only an example, and when the embodiment of the present invention is implemented, other manners of calculating the second intermediate motion probability may be set according to actual situations, which is not limited in this embodiment of the present invention. In addition, besides the above-mentioned manner of calculating the second intermediate motion probability, a person skilled in the art may also adopt other manners of calculating the second intermediate motion probability according to actual needs, and the embodiment of the present invention is not limited to this.
S3094, calculating a third intermediate motion probability of the pixel point under each scale based on all the first intermediate motion probabilities and the second intermediate motion probabilities corresponding to the pixel point aiming at the same scale.
After overlapping, a pixel point in the original image data has a first intermediate motion probability and a second intermediate motion probability under the same scale, and the probability that the pixel point is in a motion state under the scale can be calculated by combining the first intermediate motion probability and the second intermediate motion probability to serve as a third intermediate motion probability.
In one example, the third intermediate motion probability may be calculated by way of linear fusion, as follows:
P_3 = Σ_{k=1..v} w_k · P0_k
where P0_k denotes the first and second intermediate motion probabilities, v is their total number, w_k denotes the weight configured for P0_k, and P_3 denotes the third intermediate motion probability.
In another example, the first and second intermediate motion probabilities may be fused directly, as represented below:
P_3 = Σ_{k=1..v} P0_k
where P0_k denotes the first and second intermediate motion probabilities, v is their total number, and P_3 denotes the third intermediate motion probability.
Of course, the above-mentioned manner of calculating the third intermediate motion probability is only an example, and when implementing the embodiment of the present invention, other manners of calculating the third intermediate motion probability may be set according to actual situations, which is not limited in this embodiment of the present invention. In addition, besides the above-mentioned manner of calculating the third intermediate motion probability, a person skilled in the art may also adopt other manners of calculating the third intermediate motion probability according to actual needs, and the embodiment of the present invention is not limited to this.
S3095, calculating candidate motion probabilities of the pixel points based on all the third intermediate motion probabilities corresponding to the pixel points according to all the scales.
The pixel point in the original image data has a plurality of (two or more) third intermediate motion probabilities under different scales, and the probability that the pixel point is in a motion state under the scale can be calculated by fusing the third intermediate motion probabilities to serve as candidate motion probabilities.
In one example, the candidate motion probabilities may be computed by way of linear fusion, as follows:
P_4 = Σ_{k=1..v} w_k · P3_k
where P3_k denotes the third intermediate motion probabilities, v is their number, w_k denotes the weight configured for P3_k, and P_4 denotes the candidate motion probability.
In another example, the third intermediate motion probability may be directly fused, as represented below:
P_4 = Σ_{k=1..v} P3_k
where P3_k denotes the third intermediate motion probabilities, v is their number, and P_4 denotes the candidate motion probability.
Of course, the above-mentioned manner of calculating the candidate motion probability is only an example, and when implementing the embodiment of the present invention, other manners of calculating the candidate motion probability may be set according to actual situations, which is not limited in the embodiment of the present invention. In addition, besides the above-mentioned manner of calculating the candidate motion probability, a person skilled in the art may also adopt other manners of calculating the candidate motion probability according to actual needs, and the embodiment of the present invention is not limited to this.
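The weighted and direct fusions of S3093 to S3095 (and of S310 below) share one pattern; a minimal NumPy sketch over per-pixel probability maps of equal shape, with the clipping at 1 matching the normalization described in the next step:

```python
import numpy as np

def fuse(probs, weights=None):
    """probs: list of per-pixel probability maps of equal shape.
    weights given  -> linear fusion  P = sum_k w_k * P_k;
    weights absent -> direct fusion  P = sum_k P_k, which may exceed 1,
    so the result is clipped to 1."""
    stacked = np.stack(probs)
    if weights is None:
        fused = stacked.sum(axis=0)                 # direct fusion
    else:
        w = np.asarray(weights, dtype=float)
        fused = np.tensordot(w, stacked, axes=1)    # weighted linear fusion
    return np.minimum(fused, 1.0)
```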
In addition, for the case where multiple scales exist in the same type, in order to further remove the blocking effect, filtering processing such as mean filtering, median filtering, and the like may be performed on the candidate motion probabilities.
S310, calculating the target motion probability of the pixel point at the same position between the original image data and the reference image data by combining all the candidate motion probabilities under all types of scales.
The pixel point in the original image data has a plurality of (two or more) candidate motion probabilities under different types, and the probability that the pixel point is finally represented in a motion state can be calculated by fusing the candidate motion probabilities to serve as a target motion probability.
In an example, the candidate motion probabilities may be calculated in a linear fusion manner, specifically, weights are configured for the candidate motion probabilities corresponding to pixel points at the same position between the original image data and the reference image data under scales of various types, products between the candidate motion probabilities and the weights are calculated, a sum of all the products is calculated, and the sum is used as a target motion probability of a pixel point, where the linear fusion manner is expressed as follows:
P_5 = Σ_{k=1..l} w_k · P4_k
where P4_k denotes the candidate motion probabilities, l is their number (i.e., the number of types), w_k denotes the weight configured for P4_k, and P_5 denotes the target motion probability.
In another example, candidate motion probabilities may be fused directly, as represented below:
P_5 = Σ_{k=1..l} P4_k
where P4_k denotes the candidate motion probabilities, l is their number (i.e., the number of types), and P_5 denotes the target motion probability.
Of course, the above manner of calculating the target motion probability is only an example, and when the embodiment of the present invention is implemented, other manners of calculating the target motion probability may be set according to actual situations, which is not limited in the embodiment of the present invention. In addition, besides the above-mentioned manner of calculating the target motion probability, a person skilled in the art may also adopt other manners of calculating the target motion probability according to actual needs, and the embodiment of the present invention is not limited to this.
It should be noted that, when the candidate motion probability and the target motion probability are calculated by using the direct fusion equation, the target motion probability may be greater than 1, and therefore, the present embodiment may normalize the target motion probability, thereby ensuring that the target motion probability is reasonable, and enabling the three-dimensional denoising process to be normally performed.
In one normalization manner, the target motion probability may be compared with 1, and if it is greater than 1 it is set to 1; this preserves the distribution of the lower-valued target motion probabilities and thereby improves the accuracy of the target motion probability.
Besides the above clipping, other normalization methods can also be used, such as min-max normalization, log-function transformation, atan-function transformation, z-score standardization (zero-mean normalization), fuzzy quantization, and the like, which is not limited in this embodiment.
And S311, mapping the target motion probability into a first mixing coefficient, and configuring the first mixing coefficient to the reference image data.
In the present embodiment, a coefficient mapping function for mapping the target motion probability to a coefficient suitable for performing 3DNR may be set in advance.
Substituting the target motion probability into the coefficient mapping function, setting the output coefficient as a first mixing coefficient, and configuring the first mixing coefficient to the reference image data, wherein the first mixing coefficient can be expressed as:
w_t = f_e(P_5)
where w_t denotes the first mixing coefficient, w_t ∈ [0, 1], f_e(·) denotes the coefficient mapping function, and P_5 denotes the target motion probability.
Illustratively, the coefficient mapping function may be a linear function, such as f(x) = gx + h, where x is the target motion probability and g and h are both hyper-parameters.
Of course, the coefficient mapping function may be a non-linear function instead of a linear function, which is not limited in this embodiment.
And S312, calculating a second mixing coefficient based on the first mixing coefficient, and configuring the second mixing coefficient to the original image data.
In this embodiment, the first blending coefficient and the second blending coefficient have a conversion relationship therebetween, and the second blending coefficient can be calculated by the conversion relationship and configured to the original image data.
For example, the first mixing coefficient may be subtracted from 1 to obtain the second mixing coefficient.
S313, on the basis of the reference image data configured with the first mixing coefficient, superposing the original image data configured with the second mixing coefficient to obtain target image data.
In this embodiment, after the first mixing coefficient and the second mixing coefficient are arranged, the reference image data and the original image data may be superimposed, and the image data after the superimposition is the image data after 3DNR, which is referred to as target image data.
At this time, 3DNR may be represented as:
B_t = w_t · B_{t−1} + (1 − w_t) · F_t
where B_t is the target image data, w_t is the first mixing coefficient, (1 − w_t) is the second mixing coefficient, B_{t−1} is the reference image data, and F_t is the original image data.
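A hedged sketch putting S311 to S313 together; the coefficient mapping is linear as stated above, but the hyper-parameter values, the sign convention (a higher motion probability lowering the weight on the reference frame), and the use of same-shape single-channel arrays (e.g., the Y plane) are assumptions:

```python
import numpy as np

def blend_3dnr(original, reference, p_target, g=-0.9, h=0.95):
    """B_t = w_t * B_{t-1} + (1 - w_t) * F_t with w_t = f_e(P_5) = g*P_5 + h.
    g < 0 is an assumption: a higher motion probability should reduce the
    weight on the reference frame so moving pixels lean on the current
    frame, which suppresses ghosting; the g, h values are illustrative."""
    w_t = np.clip(g * p_target + h, 0.0, 1.0)        # first mixing coefficient
    return w_t * reference + (1.0 - w_t) * original  # second coefficient = 1 - w_t
```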
As shown in fig. 6, F(m, n; t) denotes the pixel with coordinates (m, n) in the image data of the t-th frame, and this pixel moves across the sequence of image data as indicated by the arrow. In ordinary 3DNR denoising, the pixel refers to the denoising result of the image data of the (t−1)-th frame, and ordinary 3DNR is expressed as:
B_t(m, n) = W_t(m, n) · B_{t−1}(m, n) + [1 − W_t(m, n)] · F_t(m, n)
where t is the time corresponding to the current frame of the video data, B_{t−1} is the denoising result of the (t−1)-th frame of image data, F_t is the current frame of the video data, B_t is the denoising result of the current frame, and W_t(m, n) is a coefficient obtained from a single pixel point. Therefore, in an actual use scene, ordinary 3DNR easily superimposes artifacts when it encounters a dynamic scene.
As shown in fig. 7A, the left image data 701 is original image data, the middle image data 702 is image data after applying ordinary 3DNR denoising, and the right image data 703 is image data after applying 3DNR denoising according to this embodiment.
For ease of observation, the brightness and contrast of the image data 701, 702, and 703 are adjusted to obtain image data 701′, 702′, and 703′, respectively, as shown in fig. 7B.
As shown in fig. 7A and 7B, when an object moves in the direction of the arrow in the image data 701 (image data 701′), an obvious artifact appears along the arrow in the image data 702 (image data 702′) after ordinary 3DNR denoising, whereas after the 3DNR denoising of this embodiment the image data 703 (image data 703′) retains the denoising effect in the sky region in the upper half while the artifact along the arrow is markedly improved.
In a mobile terminal configured with a Helio a22 chip, the 3DNR of this embodiment is applied to perform denoising processing on video data, and the speed can reach over 100FPS (Frames Per Second).
The Helio A22 chip is manufactured on a 12 nm process, uses four A53 cores (maximum clock frequency 2.0 GHz), integrates a PowerVR graphics processor, and supports screens of up to 1600 × 720 resolution with a 20:9 aspect ratio; among processors currently on the market it is a lower-performing one.
EXAMPLE III
Fig. 8 is a schematic structural diagram of a video denoising device according to a third embodiment of the present invention, where the device specifically includes the following modules:
an image data obtaining module 801, configured to obtain original image data and reference image data in video data, where the original image data is image data to be denoised in a current frame, and the reference image data is image data that has been denoised in a previous frame;
an image data partitioning module 802, configured to divide the original image data into original image blocks and divide the reference image data into reference image blocks according to multiple types of scales;
a target motion probability calculation module 803, configured to calculate a target motion probability between the original image data and the reference image data according to the original image block and the reference image block under multiple types of scales;
and the three-dimensional denoising module 804 is configured to perform three-dimensional denoising on the original image data according to the target motion probability and the reference image data to obtain target image data.
In one embodiment of the present invention, the image data blocking module 802 comprises:
a target value determining submodule, configured to determine, for the scale whose type is a number, one or more first target values;
a first number dividing submodule configured to divide the original image data into regions, the number of which is the first target value, as original image blocks;
and the second number dividing submodule is used for dividing the reference image data into areas with the number of the first target value as the reference image blocks.
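One plausible reading of this count-based partitioning is a near-square grid whose cell count equals the first target value; the Python sketch below illustrates that reading (the grid layout itself is an assumption, since the embodiment only fixes the number of regions):

```python
import numpy as np

def split_into_blocks(image, num_blocks):
    """Split an H x W image into roughly num_blocks rectangular regions.

    A near-square grid is assumed; when num_blocks is not a product of two
    close integers, the grid rounds the block count up slightly.
    """
    rows = max(1, int(np.floor(np.sqrt(num_blocks))))
    cols = int(np.ceil(num_blocks / rows))
    h_edges = np.linspace(0, image.shape[0], rows + 1, dtype=int)
    w_edges = np.linspace(0, image.shape[1], cols + 1, dtype=int)
    return [image[h_edges[i]:h_edges[i + 1], w_edges[j]:w_edges[j + 1]]
            for i in range(rows) for j in range(cols)]
```

Applying the same call to the original image data and to the reference image data yields the two block sets that the later modules compare.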
In one embodiment of the present invention, the target value determination submodule includes:
a resolution query unit configured to query a resolution of the video data;
a target value setting unit for setting a first target value positively correlated with the resolution.
In one embodiment of the present invention, the image data blocking module 802 comprises:
a target position setting submodule, configured to sequentially set each position in the video data as a target position, for the scale whose type is a size;
a first size division submodule, configured to sequentially divide, in the original image data, an area that includes the target position and has a size equal to a second target value, as an original image block;
and the second size division submodule is used for sequentially dividing an area which contains the target position and has a size of a second target value in the reference image data to be used as a reference image block.
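The size-based partitioning can likewise be illustrated with a sliding window whose side length equals the second target value (the stride below is an assumption; the embodiment only states that each position is visited in sequence):

```python
def sliding_blocks(image, block_size, stride=1):
    """Yield fixed-size square blocks, one per target position.

    block_size is the second target value; a stride of 1 mirrors visiting
    every position in sequence, while larger strides are a practical
    shortcut.
    """
    h, w = image.shape[:2]
    for y in range(0, h - block_size + 1, stride):
        for x in range(0, w - block_size + 1, stride):
            yield image[y:y + block_size, x:x + block_size]
```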
In one embodiment of the present invention, the object motion probability calculation module 803 includes:
the motion information calculation sub-module is used for respectively calculating motion information between the original image block and the reference image block under each type of scale;
a candidate motion probability calculation submodule for calculating candidate motion probabilities of pixel points located at the same position between the original image data and the reference image data based on the motion information under each type of scale;
and the scale fusion calculation submodule is used for calculating the target motion probability of the pixel point at the same position between the original image data and the reference image data by combining all the candidate motion probabilities under all types of scales.
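As a hypothetical example of the motion information computed for each pair of co-located blocks (the embodiment does not fix the statistic at this point), a mean absolute difference could be used:

```python
import numpy as np

def block_motion_info(orig_block, ref_block):
    """Mean absolute difference between co-located blocks, used here as an
    assumed stand-in for the motion information between an original image
    block and a reference image block."""
    return float(np.mean(np.abs(orig_block.astype(np.float32) -
                                ref_block.astype(np.float32))))
```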
In one embodiment of the present invention, the candidate motion probability calculation sub-module includes:
the first intermediate motion probability calculation unit is used for mapping the motion information under the multiple scales into first intermediate motion probabilities of pixel points at the same position between the original image data and the reference image data if the multiple scales exist under the same type;
a feature region forming unit, configured to form, for the same scale, a connected feature region from partial regions in at least two original image blocks;
the second intermediate motion probability calculation unit is used for calculating second intermediate motion probabilities for pixel points in the feature region based on all the first intermediate motion probabilities in the feature region aiming at the same scale;
a third intermediate motion probability calculation unit, configured to calculate, for the same scale, a third intermediate motion probability of the pixel point in each scale based on all the first intermediate motion probabilities and the second intermediate motion probabilities corresponding to the pixel point;
and the probability fusion calculation unit is used for calculating the candidate motion probability of the pixel point based on all the third intermediate motion probabilities corresponding to the pixel point aiming at all the scales.
In one embodiment of the present invention, the candidate motion probability calculation sub-module further includes:
and the filtering processing unit is used for carrying out filtering processing on the candidate motion probability.
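The filter type is not named in this embodiment; a mean filter over the candidate motion probability map, sketched below with SciPy, would be one plausible choice (the window size is an assumption):

```python
from scipy.ndimage import uniform_filter

def smooth_candidate_probability(prob, size=5):
    """Smooth the per-pixel candidate motion probability map with a
    uniform (mean) filter of an assumed window size."""
    return uniform_filter(prob, size=size)
```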
In one embodiment of the present invention, the scale fusion computation submodule includes:
the weight configuration unit is used for respectively configuring weights for the candidate motion probabilities corresponding to the pixel points at the same position between the original image data and the reference image data under the scales of various types;
a product calculation unit for calculating products between the candidate motion probabilities and the weights, respectively;
and the product sum calculating unit is used for calculating the sum value of all the products as the target motion probability of the pixel point.
In an embodiment of the present invention, the scale fusion computation submodule further includes:
and the normalization processing unit is used for setting the target motion probability to be 1 if the target motion probability is greater than 1.
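Taken together with the normalization unit above, the scale fusion can be sketched as follows (the weight values are left open by the embodiment and are assumptions here):

```python
import numpy as np

def fuse_scales(candidate_probs, weights):
    """Weighted fusion of per-scale candidate motion probabilities into
    the target motion probability, clipped to at most 1.

    candidate_probs -- list of H x W probability maps, one per scale type
    weights         -- one assumed weight per scale type
    """
    target = sum(w * p for w, p in zip(weights, candidate_probs))
    return np.minimum(target, 1.0)  # normalization: values above 1 become 1
```

Because the per-scale maps are probabilities in [0, 1], any weights summing to at most 1 keep the fused value in range; the explicit clip covers the remaining cases.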
In an embodiment of the present invention, the three-dimensional denoising module 804 includes:
a first mixing coefficient mapping sub-module for mapping the target motion probability to a first mixing coefficient;
a first mixing coefficient configuration submodule configured to configure the first mixing coefficient to reference image data;
a second mixing coefficient calculation sub-module for calculating a second mixing coefficient based on the first mixing coefficient;
a second mixing coefficient configuration submodule for configuring the second mixing coefficient to the original image data;
and the image data superposition submodule is used for superposing and configuring the original image data of the second mixing coefficient on the basis of the reference image data configured with the first mixing coefficient to obtain target image data.
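A minimal sketch of this mixing step follows; the mapping from target motion probability to the first mixing coefficient is not specified in this section, so a simple complement mapping is assumed, with the second coefficient assumed to complement the first so that the two weights sum to 1:

```python
def blend_denoise(original, reference, target_motion_prob):
    """Three-dimensional denoising as a two-coefficient mix of the
    reference (previous denoised) frame and the original frame."""
    first = 1.0 - target_motion_prob  # assumed coefficient on the reference image data
    second = 1.0 - first              # assumed coefficient on the original image data
    return first * reference + second * original
```

With this mapping, a static pixel (target motion probability near 0) is drawn mostly from the temporally filtered reference, while a moving pixel falls back to the original frame, matching the artifact behavior described for fig. 7A and 7B.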
The video denoising device provided by the embodiment of the invention can execute the video denoising method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
EXAMPLE IV
Fig. 9 is a schematic structural diagram of a mobile terminal according to a fourth embodiment of the present invention. As shown in fig. 9, the mobile terminal includes a processor 900, a memory 901, a communication module 902, an input device 903, and an output device 904; the number of the processors 900 in the mobile terminal may be one or more, and one processor 900 is taken as an example in fig. 9; the processor 900, the memory 901, the communication module 902, the input device 903 and the output device 904 in the mobile terminal may be connected by a bus or other means, and fig. 9 illustrates an example of connection by a bus.
The memory 901 is a computer-readable storage medium, and can be used for storing software programs, computer-executable programs, and modules, such as modules corresponding to the video denoising method in the present embodiment (for example, an image data acquisition module 801, an image data blocking module 802, a target motion probability calculation module 803, and a three-dimensional denoising processing module 804 in the video denoising apparatus shown in fig. 8). The processor 900 executes various functional applications and data processing of the mobile terminal by running software programs, instructions and modules stored in the memory 901, that is, implements the video denoising method described above.
The memory 901 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the mobile terminal, and the like. Further, the memory 901 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, the memory 901 may further include memory located remotely from the processor 900, which may be connected to the mobile terminal through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
And the communication module 902 is configured to establish a connection with the display screen and implement data interaction with the display screen.
The input device 903 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the mobile terminal; it may also include a camera for capturing images and a sound pickup device for capturing audio data.
The output device 904 may include an audio device such as a speaker.
It should be noted that the specific composition of the input device 903 and the output device 904 can be set according to actual conditions.
The processor 900 executes various functional applications and data processing of the device by running the software programs, instructions and modules stored in the memory 901, that is, implements the video denoising method described above.
The mobile terminal provided by the embodiment of the invention can execute the video denoising method provided by any embodiment of the invention, and has corresponding functions and beneficial effects.
EXAMPLE V
An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements a video denoising method, and the method includes:
acquiring original image data and reference image data in video data, wherein the original image data is image data to be denoised of a current frame, and the reference image data is image data denoised of a previous frame;
dividing the original image data into original image blocks and dividing the reference image data into reference image blocks respectively according to multiple types of scales;
calculating the target motion probability between the original image data and the reference image data according to the original image blocks and the reference image blocks under various types of scales;
and carrying out three-dimensional denoising processing on the original image data according to the target motion probability and the reference image data to obtain target image data.
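Putting the four steps together, an end-to-end sketch of the method might look as follows; the grid sizes, the exponential mapping from block difference to motion probability, and the equal scale weights are illustrative assumptions rather than values fixed by the claims:

```python
import numpy as np

def denoise_frame(original, reference, grid_sizes=(4, 8), weights=(0.5, 0.5)):
    """End-to-end sketch on single-channel float frames in [0, 1].

    Each grid size acts as one count-type scale; per-block mean absolute
    difference serves as the motion information, and an assumed exponential
    curve turns it into a motion probability.
    """
    h, w = original.shape
    candidates = []
    for g in grid_sizes:
        prob = np.empty_like(original)
        ys = np.linspace(0, h, g + 1, dtype=int)
        xs = np.linspace(0, w, g + 1, dtype=int)
        for i in range(g):
            for j in range(g):
                sl = (slice(ys[i], ys[i + 1]), slice(xs[j], xs[j + 1]))
                diff = np.mean(np.abs(original[sl] - reference[sl]))
                prob[sl] = 1.0 - np.exp(-10.0 * diff)  # block stat -> probability
        candidates.append(prob)
    target = np.minimum(sum(wt * p for wt, p in zip(weights, candidates)), 1.0)
    first = 1.0 - target  # assumed mapping to the first mixing coefficient
    return first * reference + (1.0 - first) * original
```

On this sketch, denoise_frame(f_t, b_prev) would produce B_t, which then serves as the reference image data for the next frame.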
Of course, the computer program of the computer-readable storage medium provided by the embodiments of the present invention is not limited to the method operations described above, and may also perform related operations in the video denoising method provided by any embodiments of the present invention.
From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.
It should be noted that, in the embodiment of the video denoising apparatus, the included units and modules are only divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (13)

1. A method for denoising a video, comprising:
acquiring original image data and reference image data in video data, wherein the original image data is image data to be denoised of a current frame, and the reference image data is image data denoised of a previous frame;
dividing the original image data into original image blocks and dividing the reference image data into reference image blocks respectively according to multiple types of scales;
calculating the target motion probability between the original image data and the reference image data according to the original image blocks and the reference image blocks under various types of scales;
and carrying out three-dimensional denoising processing on the original image data according to the target motion probability and the reference image data to obtain target image data.
2. The method according to claim 1, wherein the dividing the original image data into original image blocks and the dividing the reference image data into reference image blocks at multiple types of scales respectively comprises:
determining, for the scale whose type is a number, one or more first target values;
dividing the original image data into areas with the number of the first target values as original image blocks;
and dividing the reference image data into areas with the number of the first target values as reference image blocks.
3. The method of claim 2, wherein determining one or more first target values comprises:
querying a resolution of the video data;
a first target value is set that is positively correlated with the resolution.
4. The method according to claim 1, wherein the dividing the original image data into original image blocks and the dividing the reference image data into reference image blocks at multiple types of scales respectively comprises:
for the scale whose type is a size, sequentially setting each position in the video data as a target position;
sequentially dividing, in the original image data, areas which contain the target position and have a size equal to a second target value, as original image blocks;
and sequentially dividing, in the reference image data, areas which contain the target position and have a size equal to the second target value, as reference image blocks.
5. The method according to any of claims 1-4, wherein said calculating a target motion probability between said original image data and reference image data from said original image block and said reference image block at a plurality of types of scales comprises:
respectively calculating motion information between the original image block and the reference image block under each type of scale;
calculating the candidate motion probability of pixel points at the same position between the original image data and the reference image data based on the motion information under each type of scale;
and calculating the target motion probability of the pixel points at the same position between the original image data and the reference image data by combining all the candidate motion probabilities under all types of scales.
6. The method of claim 5, wherein the calculating the candidate motion probability of the co-located pixel point between the original image data and the reference image data based on the motion information at each type of scale comprises:
if multiple scales exist in the same type, mapping the motion information in the multiple scales to be first intermediate motion probabilities of pixel points located at the same position between the original image data and the reference image data respectively;
for the same scale, forming a connected feature region from partial regions in at least two original image blocks;
calculating a second intermediate motion probability for the pixel points in the feature region based on all the first intermediate motion probabilities in the feature region for the same scale;
aiming at the same scale, calculating a third intermediate motion probability of the pixel point under each scale based on all the first intermediate motion probabilities and the second intermediate motion probabilities corresponding to the pixel point;
and calculating the candidate motion probability of the pixel point based on all the third intermediate motion probabilities corresponding to the pixel point according to all the scales.
7. The method of claim 6, wherein the calculating the candidate motion probability of the co-located pixel point between the original image data and the reference image data based on the motion information at each type of scale further comprises:
and carrying out filtering processing on the candidate motion probability.
8. The method of claim 5, wherein the calculating the target motion probability of the co-located pixel point between the original image data and the reference image data in combination with all the candidate motion probabilities at all types of scales comprises:
respectively configuring weights for the candidate motion probabilities corresponding to the pixel points at the same position between the original image data and the reference image data under the scales of various types;
calculating products between the candidate motion probabilities and the weights, respectively;
and calculating the sum value of all the products as the target motion probability of the pixel point.
9. The method of claim 5, wherein calculating the probability of target motion between the original image data and the reference image data from the original image block and the reference image block at multiple types of scales further comprises:
and if the target motion probability is greater than 1, setting the target motion probability to be 1.
10. The method as claimed in claim 1, 2, 3, 4, 6, 7, 8 or 9, wherein the three-dimensional de-noising processing is performed on the original image data according to the target motion probability and the reference image data to obtain target image data, and comprises:
mapping the target motion probability into a first mixing coefficient, and configuring the first mixing coefficient to reference image data;
calculating a second mixing coefficient based on the first mixing coefficient, and configuring the second mixing coefficient to original image data;
and on the basis of configuring the reference image data of the first mixing coefficient, overlapping and configuring the original image data of the second mixing coefficient to obtain target image data.
11. A video denoising apparatus, comprising:
the image data acquisition module is used for acquiring original image data and reference image data in the video data, wherein the original image data is image data to be denoised of a current frame, and the reference image data is image data denoised of a previous frame;
the image data partitioning module is used for respectively partitioning the original image data into original image blocks and the reference image data into reference image blocks in various scales;
the target motion probability calculation module is used for calculating the target motion probability between the original image data and the reference image data according to the original image blocks and the reference image blocks under various types of scales;
and the three-dimensional denoising processing module is used for carrying out three-dimensional denoising processing on the original image data according to the target motion probability and the reference image data to obtain target image data.
12. A mobile terminal, characterized in that the mobile terminal comprises:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the video denoising method of any one of claims 1-10.
13. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out a video denoising method according to any one of claims 1-10.
CN202010425645.5A 2020-05-19 2020-05-19 Video denoising method and device, mobile terminal and storage medium Active CN111556227B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010425645.5A CN111556227B (en) 2020-05-19 2020-05-19 Video denoising method and device, mobile terminal and storage medium
PCT/CN2021/085280 WO2021232965A1 (en) 2020-05-19 2021-04-02 Video noise reduction method and apparatus, mobile terminal and storage medium

Publications (2)

Publication Number Publication Date
CN111556227A CN111556227A (en) 2020-08-18
CN111556227B true CN111556227B (en) 2022-04-15

Family

ID=72002016

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010425645.5A Active CN111556227B (en) 2020-05-19 2020-05-19 Video denoising method and device, mobile terminal and storage medium

Country Status (2)

Country Link
CN (1) CN111556227B (en)
WO (1) WO2021232965A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111556227B (en) * 2020-05-19 2022-04-15 Guangzhou Baiguoyuan Information Technology Co., Ltd. Video denoising method and device, mobile terminal and storage medium
WO2022088022A1 (en) * 2020-10-30 2022-05-05 SZ DJI Technology Co., Ltd. Three-dimensional image processing method and apparatus, and movable platform and storage medium
CN116132714B (en) * 2023-04-17 2023-06-30 Shenzhen Lutong Network Technology Co., Ltd. Video data transmission method for network television system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101980284A (en) * 2010-10-26 2011-02-23 北京理工大学 Two-scale sparse representation-based color image noise reduction method
CN104680483A (en) * 2013-11-25 2015-06-03 浙江大华技术股份有限公司 Image noise estimating method, video image de-noising method, image noise estimating device, and video image de-noising device

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8059207B2 (en) * 2007-02-23 2011-11-15 Samsung Electronics Co., Ltd. System and method for video noise reduction using an adaptive temporal method with motion detection and motion compensation
CN101262559B (en) * 2008-03-28 2010-09-29 北京中星微电子有限公司 A method and device for eliminating sequential image noise
CN102693426B (en) * 2012-05-21 2014-01-08 清华大学深圳研究生院 Method for detecting image salient regions
CN107437238B (en) * 2016-05-25 2020-11-24 上海联影医疗科技股份有限公司 Image block self-adaptive recursive noise reduction method and device
TW201742001A (en) * 2016-05-30 2017-12-01 聯詠科技股份有限公司 Method and device for image noise estimation and image capture apparatus
CN106412385B (en) * 2016-10-17 2019-06-07 湖南国科微电子股份有限公司 A kind of video image 3 D noise-reduction method and device
CN106803865B (en) * 2016-12-23 2019-10-22 中国科学院自动化研究所 The denoising method and system of video time domain
CN111539895B (en) * 2020-04-30 2023-04-18 广州市百果园信息技术有限公司 Video denoising method and device, mobile terminal and storage medium
CN111556227B (en) * 2020-05-19 2022-04-15 广州市百果园信息技术有限公司 Video denoising method and device, mobile terminal and storage medium

Also Published As

Publication number Publication date
WO2021232965A1 (en) 2021-11-25
CN111556227A (en) 2020-08-18


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20221208

Address after: #15-31A, Mapletree Business City, 30 Pasir Panjang Road, Singapore

Patentee after: Baiguoyuan Technology (Singapore) Co.,Ltd.

Address before: 5-13 / F, West Tower, building C, 274 Xingtai Road, Shiqiao street, Panyu District, Guangzhou, Guangdong 510000

Patentee before: GUANGZHOU BAIGUOYUAN INFORMATION TECHNOLOGY Co.,Ltd.