CN111901606A

CN111901606A - Video coding method for improving caption coding quality

Info

Publication number: CN111901606A
Application number: CN202010759205.3A
Authority: CN
Inventors: 廖义; 李日; 谢亚光; 孙彦龙
Original assignee: Hangzhou Arcvideo Technology Co ltd
Current assignee: Hangzhou Arcvideo Technology Co ltd
Priority date: 2020-07-31
Filing date: 2020-07-31
Publication date: 2020-11-06

Abstract

The invention discloses a video coding method for improving subtitle coding quality. The method comprises the following steps: in the spatial domain, the variance of the luminance values of each coding block in a frame is computed, and if the variance is greater than a set threshold, the coding block is judged to be a subtitle candidate block; in the time domain, the number of times each subtitle candidate block in a frame is referenced by subsequent frames is counted, and if the reference motion vector is zero and the reference count exceeds a set threshold, the subtitle candidate block is further judged to be a subtitle block, otherwise it is judged to be a non-subtitle block; during coding-block-level rate control, if a block is a subtitle block, its quantization parameter QP is reduced before the block is encoded. The beneficial effects of the invention are that the existing x265 encoding framework is fully utilized and the temporal and spatial characteristics of subtitles are combined, so that the subtitle region is extracted quickly and its coding quality is improved with essentially no reduction in encoding performance.

Description

Video coding method for improving caption coding quality

Technical Field

The invention relates to the technical field of video coding, and in particular to a video coding method for improving subtitle coding quality.

Background

With the rapid development of communication and multimedia technology, video has become an important medium for disseminating information. Subtitles in a video usually carry story content and key information and help viewers understand what is being played, so subtitle detection and subtitle enhancement have been studied extensively.

Extracting video subtitles efficiently and accurately is difficult, and the difficulty is increased by continual changes in the size and alignment of subtitle characters and by changes in the texture of the video background. Existing subtitle detection methods mainly include edge-based, stroke-based, texture-based and connected-domain-based methods. These methods usually operate on individual pixels, so their computational cost is high, which makes them unsuitable for real-time live video, and they are prone to misdetecting non-subtitle regions as subtitle regions.

x265 is an open-source HEVC (High Efficiency Video Coding) video encoder. It adopts techniques such as scene pre-analysis, Lookahead-VBV-based rate control, inter-frame parallel coding, instruction set optimization and fast macroblock mode decision, so that the encoder can better meet the requirements of high-efficiency, real-time coding. In the pre-analysis stage, the matching blocks and corresponding motion vectors, in the subsequent N frames, of the coding blocks of a reference frame are analyzed, so that the motion complexity of each frame can be judged and a basis is provided for allocating bits to subsequent frames. In the rate control stage, the control algorithm can be divided into two layers: frame-level rate control and coding-block-level rate control. Frame-level rate control distributes the target bit rate among the frames and calculates a quantization parameter (QP) for each frame according to the complexity of the frame, the weight factor of the frame and the buffer fullness. Coding-block-level rate control calculates the QP of each coding block on the basis of the frame-level QP according to the importance of the block: an important coding block is given a smaller QP to reduce its coding distortion, while an unimportant coding block is given a larger QP, which increases distortion but saves a certain number of bits in that area.

Disclosure of Invention

To overcome the above-mentioned disadvantages of the prior art, the present invention provides a video coding method that improves subtitle coding quality.

In order to achieve this purpose, the invention adopts the following technical solution:

A video coding method for improving subtitle coding quality, comprising the following steps:

(1) in the spatial domain, the variance of the luminance values of each coding block in a frame is computed, and if the variance is greater than a set threshold, the coding block is judged to be a subtitle candidate block;

(2) in the time domain, the number of times each subtitle candidate block in a frame is referenced by subsequent frames is counted; if the reference motion vector is zero and the reference count exceeds a set threshold, the subtitle candidate block is further judged to be a subtitle block, otherwise it is judged to be a non-subtitle block;

(3) during coding-block-level rate control, if a block is a subtitle block, its quantization parameter QP is reduced before the block is encoded.

To address the shortcomings of existing subtitle detection methods, the invention provides a video coding method for improving subtitle coding quality that uses the x265 video encoder as its platform. Because subtitles have high edge strength and a given group of subtitles usually appears continuously over several frames, the method makes full use of the existing x265 encoding framework, combines the temporal and spatial characteristics of subtitles, and works in units of coding blocks to extract the subtitle region quickly and adjust the quantization parameter for that region. The coding quality of the subtitle region is thereby improved with essentially no reduction in encoding performance.

Preferably, step (1) is specifically: computation is performed in units of coding blocks; in combination with the x265 video encoder, and using the matching-block and motion-vector information provided by the existing pre-analysis module, the variance var of the luminance values of each coding block in the current frame is computed in the pre-analysis stage, the maximum value var_max and the minimum value var_min of var in the current frame are recorded, and the texture complexity tex of each coding block in the current frame is calculated:

If tex is greater than a threshold TH1, the coding block is judged to be a subtitle candidate block, otherwise it is judged to be a non-subtitle block, where the threshold TH1 takes a value in the range [100, 255].
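The formula for tex is not reproduced in this text. As a minimal sketch, the Python below assumes one normalization that is consistent with the quantities named above: min-max scaling of var to [0, 255] using var_min and var_max, which would match a TH1 range of [100, 255]. The 16x16 block size, the function names and the default TH1 value are likewise assumptions made for illustration, and this is not x265 code.

```python
from statistics import pvariance


def block_variances(luma, block=16):
    """Variance of the luma samples in each block x block tile.

    `luma` is a list of rows of 8-bit luma samples. The 16x16 block
    size is an assumption of this sketch.
    """
    height, width = len(luma), len(luma[0])
    variances = {}
    for by in range(0, height - block + 1, block):
        for bx in range(0, width - block + 1, block):
            samples = [luma[y][x]
                       for y in range(by, by + block)
                       for x in range(bx, bx + block)]
            variances[(by // block, bx // block)] = pvariance(samples)
    return variances


def texture_complexity(variances):
    """Scale each variance to [0, 255] (assumed normalization)."""
    var_min = min(variances.values())
    var_max = max(variances.values())
    span = var_max - var_min
    if span == 0:  # perfectly uniform frame: no block stands out
        return {pos: 0.0 for pos in variances}
    return {pos: 255.0 * (var - var_min) / span
            for pos, var in variances.items()}


def subtitle_candidates(luma, th1=100, block=16):
    """Positions (row, col) of blocks whose tex exceeds TH1."""
    tex = texture_complexity(block_variances(luma, block))
    return {pos for pos, value in tex.items() if value > th1}
```

Because the scaling is relative to the current frame, the block with the strongest texture always scores 255 under this assumption, so the temporal check in step (2) is what removes textured background blocks that are not subtitles.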

Preferably, step (2) is specifically: the temporal duration of each subtitle candidate block in the current frame is recorded as n; whenever the current subtitle candidate block is referenced by a coding block in a subsequent frame with a zero motion vector, n is set to n + 1; after the subsequent N frames have been analyzed, if the temporal duration n of the subtitle candidate block exceeds a threshold TH2, the candidate block is further judged to be a subtitle block, otherwise it is judged to be a non-subtitle block, where N takes a value in the range [10, 50] and the threshold TH2 takes a value in the range [3, N].
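The temporal test can be sketched as follows. The duration counter is written n here to keep it apart from the lookahead length N. The input structure `lookahead_refs` is hypothetical: it stands in for whatever the pre-analysis stage exposes and is not an x265 data structure. Counting at most one reference per subsequent frame is also an assumption, since the text does not say how several references from the same frame are treated.

```python
def temporal_duration(candidate, lookahead_refs):
    """Count subsequent frames that reference `candidate` with a zero MV.

    `lookahead_refs` (hypothetical) holds one dict per subsequent frame,
    mapping a referenced block position to the list of motion vectors
    with which blocks of that frame reference it.
    """
    n = 0
    for frame_refs in lookahead_refs:
        motion_vectors = frame_refs.get(candidate, [])
        if any(mv == (0, 0) for mv in motion_vectors):
            n += 1  # assumed: at most one count per subsequent frame
    return n


def confirm_subtitle_blocks(candidates, lookahead_refs, th2=3):
    """Keep the candidates whose temporal duration exceeds TH2."""
    return {pos for pos in candidates
            if temporal_duration(pos, lookahead_refs) > th2}
```

With N = 10 lookahead frames and TH2 = 3, a candidate has to be referenced with a zero motion vector in at least four of those frames to be confirmed as a subtitle block.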

Preferably, step (3) is specifically: in coding-block-level rate control, if a coding block is a subtitle block, the quantization parameter of the current coding block is defined as QP_mb = QP_fm - dQp, where QP_fm is the frame-level quantization parameter and dQp takes a value in the range [0, 10], and the coding block is encoded using QP_mb.
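A minimal sketch of this adjustment is given below. The default dQp of 4, the clamp to the 8-bit HEVC QP range of 0 to 51, and leaving non-subtitle blocks at the frame-level QP are assumptions added for illustration; the text only specifies QP_mb = QP_fm - dQp for subtitle blocks.

```python
def block_qp(qp_frame, is_subtitle, d_qp=4, qp_min=0, qp_max=51):
    """Return QP_mb = QP_fm - dQp for subtitle blocks, QP_fm otherwise.

    The result is clamped to [qp_min, qp_max]; the clamp is an addition
    of this sketch and is not stated in the text.
    """
    if not 0 <= d_qp <= 10:
        raise ValueError("dQp is expected to lie in [0, 10]")
    qp = qp_frame - d_qp if is_subtitle else qp_frame
    return max(qp_min, min(qp_max, qp))
```

For example, with a frame-level QP of 30 and dQp = 4, a subtitle block would be encoded at QP 26 while a non-subtitle block would stay at 30 in this sketch.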

The beneficial effects of the invention are that the existing x265 encoding framework is fully utilized and the temporal and spatial characteristics of subtitles are combined, so that the subtitle region is extracted quickly and its coding quality is improved with essentially no reduction in encoding performance.

Drawings

FIG. 1 shows the original video;

FIG. 2 shows the subtitle extraction result of the present invention;

FIG. 3 shows the subtitle coding result of x265 encoding;

FIG. 4 shows the subtitle coding result of the present invention.

Detailed Description

The invention is further described with reference to the following figures and detailed description.

In the embodiment shown in FIG. 1, a video coding method for improving subtitle coding quality comprises the following steps:

(1) In the spatial domain, the variance of the luminance values of each coding block in a frame is computed, and if the variance is greater than a set threshold, the coding block is judged to be a subtitle candidate block. Specifically: computation is performed in units of coding blocks; in combination with the x265 video encoder, and using the matching-block and motion-vector information provided by the existing pre-analysis module, the variance var of the luminance values of each coding block in the current frame is computed in the pre-analysis stage, the maximum value var_max and the minimum value var_min of var in the current frame are recorded, and the texture complexity tex of each coding block in the current frame is calculated:

(2) In the time domain, the number of times each subtitle candidate block in a frame is referenced by subsequent frames is counted; if the reference motion vector is zero and the reference count exceeds a set threshold, the subtitle candidate block is further judged to be a subtitle block, otherwise it is judged to be a non-subtitle block. Specifically: the temporal duration of each subtitle candidate block in the current frame is recorded as n; whenever the current subtitle candidate block is referenced by a coding block in a subsequent frame with a zero motion vector, n is set to n + 1; after the subsequent N frames have been analyzed, if the temporal duration n of the subtitle candidate block exceeds a threshold TH2, the candidate block is further judged to be a subtitle block, otherwise it is judged to be a non-subtitle block, where N takes a value in the range [10, 50] and the threshold TH2 takes a value in the range [3, N]. Here, the temporal duration of a subtitle candidate block refers to the number of times the candidate block appears in the subsequent frames.

(3) During coding-block-level rate control, if a block is a subtitle block, its quantization parameter QP is reduced before the block is encoded. Specifically: if a coding block is a subtitle block, the quantization parameter of the current coding block is defined as QP_mb = QP_fm - dQp, where QP_fm is the frame-level quantization parameter and dQp takes a value in the range [0, 10], and the coding block is encoded using QP_mb. Here, QP_mb is the quantization value used to quantize and compress the macroblock: a larger QP_mb means coarser quantization and greater distortion in the encoded macroblock, while a smaller QP_mb means finer quantization and less distortion.
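Steps (1) to (3) can be strung together as in the following sketch, which takes per-block variances and zero-motion-vector reference counts as given and returns a per-block QP map. The input dictionaries, the default thresholds and the min-max scaling used for tex are assumptions; in particular, the tex formula is not reproduced in this text, so the scaling shown is only one form consistent with var_min, var_max and the TH1 range.

```python
def subtitle_qp_map(variances, zero_mv_counts, qp_frame,
                    th1=100, th2=3, d_qp=4):
    """Map each block position to the QP it would be encoded with.

    `variances` maps block position -> luma variance (step 1 input).
    `zero_mv_counts` maps block position -> number of subsequent frames
    that reference the block with a zero motion vector (step 2 input).
    Both structures are hypothetical stand-ins for pre-analysis output.
    """
    var_min = min(variances.values())
    var_max = max(variances.values())
    span = (var_max - var_min) or 1  # avoid dividing by zero on a flat frame
    qp_map = {}
    for pos, var in variances.items():
        tex = 255 * (var - var_min) / span           # assumed normalization
        is_candidate = tex > th1                     # step (1)
        is_subtitle = (is_candidate
                       and zero_mv_counts.get(pos, 0) > th2)  # step (2)
        qp_map[pos] = qp_frame - d_qp if is_subtitle else qp_frame  # step (3)
    return qp_map
```

The sketch touches each block a constant number of times and reuses data that the pre-analysis stage is described as already producing, which is in keeping with the statement that the method adds little computational cost.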

To address the shortcomings of existing subtitle detection methods, the invention provides a video coding method for improving subtitle coding quality that uses the x265 video encoder as its platform. Because subtitles have high edge strength and a given group of subtitles usually appears continuously over several frames, subtitle extraction in this method consists mainly of spatial-domain extraction and time-domain extraction. In the spatial domain, the variance of the luminance values of each coding block in a frame is computed, and if the variance is greater than a certain threshold, the coding block is judged to be a subtitle candidate block. In the time domain, the number of times each subtitle candidate block in a frame is referenced by subsequent frames is counted; if the reference motion vector is zero and the reference count exceeds a certain threshold, the subtitle candidate block is further judged to be a subtitle block, otherwise it is judged to be a non-subtitle block. During coding-block-level rate control, if a block is a subtitle block, its quantization parameter QP is reduced, so that the coding quality of the subtitle region is improved. In addition, the method has low computational complexity, and the encoding speed is essentially not reduced compared with the original x265 encoder.

The invention optimizes subtitle coding quality by means of an efficient subtitle extraction method, and can be applied to video compression standards such as H.264, HEVC, AVS2 and AVS3. FIG. 1 and FIG. 2 show the original video and the subtitle extraction result of the method of the invention, respectively. It can be seen that the method extracts the subtitle region completely and does not misdetect other regions as subtitle regions, which indicates that its extraction accuracy is high. FIG. 3 and FIG. 4 show the subtitle coding results of the original x265 encoding method and of the method of the invention, respectively. It can be seen that the area around the subtitles is blurred with the original x265 encoding method but clear with the method of the invention, which better preserves the information of the original video.

Claims

1. A video coding method for improving caption coding quality is characterized by comprising the following steps:

2. The video coding method for improving subtitle coding quality according to claim 1, wherein step (1) is specifically: computation is performed in units of coding blocks; in combination with the x265 video encoder, and using the matching-block and motion-vector information provided by the existing pre-analysis module, the variance var of the luminance values of each coding block in the current frame is computed in the pre-analysis stage, the maximum value var_max and the minimum value var_min of var in the current frame are recorded, and the texture complexity tex of each coding block in the current frame is calculated:

3. The video coding method for improving subtitle coding quality according to claim 1, wherein step (2) is specifically: the temporal duration of each subtitle candidate block in the current frame is recorded as n; whenever the current subtitle candidate block is referenced by a coding block in a subsequent frame with a zero motion vector, n is set to n + 1; after the subsequent N frames have been analyzed, if the temporal duration n of the subtitle candidate block exceeds a threshold TH2, the candidate block is further judged to be a subtitle block, otherwise it is judged to be a non-subtitle block, where N takes a value in the range [10, 50] and the threshold TH2 takes a value in the range [3, N].

4. The video coding method for improving subtitle coding quality according to claim 1, wherein step (3) is specifically: in coding-block-level rate control, if a coding block is a subtitle block, the quantization parameter of the current coding block is defined as QP_mb = QP_fm - dQp, where QP_fm is the frame-level quantization parameter and dQp takes a value in the range [0, 10], and the coding block is encoded using QP_mb.