Embodiment
An embodiment of the invention provides a video caption information acquisition method. As shown in Figure 1, the method performs wavelet-based caption detection on the luminance component image of a data frame, obtains the attribute information of the detected captions, and extracts the detected captions according to the attribute information, thereby accurately obtaining the caption information in the data frame. Because the wavelet-based caption detection does not require the region where the captions are located to be restricted, the embodiment of the invention can accurately obtain the caption information in the video data without limiting the caption position region.
A specific embodiment of the video caption information acquisition method provided by the embodiment of the invention is shown in Figure 2. This embodiment may specifically comprise:
Step 21: obtain the luminance component image of a specified data frame from the video data stream.
In order to speed up the acquisition of caption information, the embodiment of the invention may decode only specified data frames from the video data stream and obtain the luminance component images of those specified data frames.
For example, only the intra-coded frames (I frames) with odd (or even) frame numbers are decoded (other types of video frames, such as predictive-coded frames, i.e. P frames, may also be used), the luminance component image of each such I frame is obtained, and the chrominance components of the I frame as well as the other frames are quickly skipped, thereby speeding up the acquisition of caption information.
It should be noted that the embodiment of the invention does not limit the compression format of the video data stream.
Step 22: perform wavelet-based caption detection on the luminance component image of the selected data frame.
Specifically, wavelet-based caption detection is applied in this step to the luminance component image of the selected data frame.
In a specific embodiment, a concrete implementation of this step is shown in Figure 3 and may comprise:
Step 221: perform a wavelet transform on the luminance component image of the data frame to obtain a horizontal high-frequency sub-band texture map, a vertical high-frequency sub-band texture map and a diagonal high-frequency sub-band texture map.
The wavelet transform involved in the embodiment of the invention may specifically be the Haar wavelet transform, the Mexican hat wavelet transform, the 9-7 wavelet transform, the 5-3 wavelet transform, and so on.
In this step, the wavelet transform is applied to the luminance component image of the selected data frame to obtain one low-frequency sub-band and high-frequency sub-bands in three directions, horizontal, vertical and diagonal, where the horizontal high-frequency sub-band may be denoted H, the vertical high-frequency sub-band V, and the diagonal high-frequency sub-band D.
The absolute values of the coefficients of the three high-frequency sub-bands H, V and D generated by the wavelet transform are taken respectively, yielding the horizontal high-frequency sub-band texture map (CH), the vertical high-frequency sub-band texture map (CV) and the diagonal high-frequency sub-band texture map (CD).
In this step, the three high-frequency sub-band texture maps (CH, CV, CD) may also be combined to obtain a comprehensive high-frequency sub-band texture map (CS).
The value of each point in the comprehensive high-frequency sub-band texture map can be obtained by the following formula:
CS(i,j)=CH(i,j)+CV(i,j)+CD(i,j)
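For illustration only (it does not form part of the claimed method), the following sketch shows how the texture maps described above could be computed in Python, assuming the PyWavelets library and a single-level Haar decomposition; the function name subband_texture_maps is introduced only for this example:

import numpy as np
import pywt

def subband_texture_maps(luma):
    # single-level 2-D wavelet decomposition of the luminance component image;
    # 'haar' is one of the wavelets named above; other wavelets supported by the
    # library could be substituted
    _low, (h_band, v_band, d_band) = pywt.dwt2(np.asarray(luma, dtype=np.float64), 'haar')
    ch = np.abs(h_band)   # horizontal high-frequency sub-band texture map CH
    cv = np.abs(v_band)   # vertical high-frequency sub-band texture map CV
    cd = np.abs(d_band)   # diagonal high-frequency sub-band texture map CD
    cs = ch + cv + cd     # comprehensive texture map, CS(i,j) = CH(i,j) + CV(i,j) + CD(i,j)
    return ch, cv, cd, cs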
Step 222: obtain the caption point image (TextPnt) of the data frame from the horizontal, vertical and diagonal high-frequency sub-band texture maps.
In a specific embodiment, this step may comprise the following stages:
First, an initial caption point image is generated from each high-frequency sub-band texture map.
Taking the horizontal high-frequency sub-band texture map as an example, caption point detection is performed on it to obtain the initial caption point image of the horizontal high-frequency sub-band (MAPH_ORG).
The value of this initial caption point image at coordinate (i, j) is obtained by comparing the texture value at that point with a threshold TH: a value of "0" denotes background and a value of "1" denotes an initial caption point.
The threshold TH is derived from MH, the mean texture strength of the horizontal high-frequency sub-band texture map.
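The exact expression for the threshold TH is not reproduced above; purely as an illustration, the sketch below assumes TH is proportional to the mean texture strength MH, with a hypothetical factor beta introduced only for this example:

import numpy as np

def initial_caption_points(ch, beta=1.5):
    # MH: mean texture strength of the horizontal high-frequency sub-band texture map
    mh = ch.mean()
    th = beta * mh                        # assumed form of TH (illustrative only)
    # "1" marks an initial caption point, "0" marks background
    return (ch > th).astype(np.uint8)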
Then, noise removal is applied to the initial caption point image of the horizontal high-frequency sub-band to obtain the final caption point image in the horizontal direction (MAPH).
The noise removal involved in the embodiment of the invention may adopt mature processing schemes such as overlapping sliding-block filtering; the embodiment of the invention does not limit this.
Next, similar processing is applied to the vertical and diagonal high-frequency sub-band texture maps to obtain the initial caption point image of the vertical sub-band (MAPV_ORG) and that of the diagonal sub-band (MAPD_ORG), and noise removal is applied to each of them to obtain the final caption point image in the vertical direction (MAPV) and that in the diagonal direction (MAPD).
Finally, the intersection of the final caption point images in the three directions (MAPH, MAPV, MAPD) is taken to obtain the caption point image (TextPnt) of the data frame.
It should be noted that, in the embodiment of the invention, the removal of noise points from an initial caption point image (MAP_ORG) in the course of obtaining the caption region may be implemented by the following program:
% h and w denote the height and width of the sub-band image, respectively
block = 4;                  % side length of the square block
dis = 3;                    % offset (stride) of each block shift
h_num = floor(h / dis);     % number of block shifts in the vertical direction
w_num = floor(w / dis);     % number of block shifts in the horizontal direction
MAP = MAPH_ORG;
for k = 1:h_num
    for l = 1:w_num
        if ((k-1)*dis + 1 + block > h) || ((l-1)*dis + 1 + block > w)
            continue;       % the block has moved beyond the image border, skip it
        else
            startH = (k-1)*dis + 1;
            endH   = startH + block - 1;
            startW = (l-1)*dis + 1;
            endW   = startW + block - 1;
            % count the number of caption points inside the current block
            num = sum(sum(MAPH_ORG(startH:endH, startW:endW)));
            if num < (block*block/2)
                % fewer than half of the block's pixels are caption points:
                % they are regarded as noise points and the block is cleared
                MAP(startH:endH, startW:endW) = 0;
            else
                % otherwise the caption points in this block are genuine caption points
                MAP(startH:endH, startW:endW) = MAPH_ORG(startH:endH, startW:endW);
            end
        end
    end
end
It should be understood that the above example is merely illustrative and does not limit the protection scope of the embodiment of the invention in any way.
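For reference only, the same overlapping sliding-block noise removal and the final intersection of the three directional maps can be sketched in Python/NumPy as follows; the block size, stride and majority criterion follow the program above, and this is merely one possible re-implementation:

import numpy as np

def remove_noise_points(map_org, block=4, dis=3):
    h, w = map_org.shape
    cleaned = np.zeros_like(map_org)
    for top in range(0, h - block + 1, dis):
        for left in range(0, w - block + 1, dis):
            sub = map_org[top:top + block, left:left + block]
            # keep the block only if at least half of its pixels are caption points
            if sub.sum() >= (block * block) / 2:
                cleaned[top:top + block, left:left + block] = sub
    return cleaned

def caption_point_image(maph_org, mapv_org, mapd_org):
    maph = remove_noise_points(maph_org)
    mapv = remove_noise_points(mapv_org)
    mapd = remove_noise_points(mapd_org)
    # TextPnt: intersection of the final caption point images of the three directions
    return maph & mapv & mapd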
Step 223: generate a caption region image (TextArea) from the caption point image of the data frame.
In a specific embodiment, this step may comprise the following stages:
First, a closing operation and an opening operation in the horizontal direction are applied to the obtained caption point image to obtain the horizontal image (VerImg).
The structuring element of the closing operation may be an all-"1" matrix of size 20*1, and that of the opening operation an all-"1" matrix of size 1*2; of course, the structuring elements used for the closing and opening operations may be set flexibly according to actual needs.
Then, a closing operation and an opening operation in the vertical direction are applied to the caption point image to obtain the vertical image (HorImg).
Likewise, the structuring element of the closing operation may be an all-"1" matrix of size 1*20, and that of the opening operation an all-"1" matrix of size 2*1.
Then, the union of the obtained horizontal image and vertical image is taken, i.e. a point-by-point logical OR of the two images, to obtain a maximum point set image (Img) containing all caption regions.
Next, a closing operation is applied to the maximum point set image to obtain the caption region image.
The structuring element of this closing operation may be a 6*6 all-"1" matrix, or another matrix.
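A minimal sketch of step 223, assuming SciPy's binary morphology routines and the structuring-element sizes given above (which, as noted, may be adjusted):

import numpy as np
from scipy import ndimage

def caption_area_image(text_pnt):
    pts = text_pnt.astype(bool)
    # closing then opening in the horizontal direction -> horizontal image (VerImg)
    ver_img = ndimage.binary_opening(
        ndimage.binary_closing(pts, structure=np.ones((20, 1))),
        structure=np.ones((1, 2)))
    # closing then opening in the vertical direction -> vertical image (HorImg)
    hor_img = ndimage.binary_opening(
        ndimage.binary_closing(pts, structure=np.ones((1, 20))),
        structure=np.ones((2, 1)))
    # union of the two images: the maximum point set image Img
    img = ver_img | hor_img
    # a final closing yields the caption region image TextArea
    return ndimage.binary_closing(img, structure=np.ones((6, 6)))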
Step 224: determine the number of captions and the caption region position information in the caption region image.
In a specific embodiment, this step may comprise the following stages:
First, each caption region in the caption region image is examined to determine whether its captions are arranged horizontally or vertically.
The distinction is made according to the relative size of the height and width of the caption region: if the caption region is wider than it is high, the captions in the region are arranged horizontally; if its width is less than its height, the captions are arranged vertically.
It should be noted that the caption regions in the caption region image may be identified using the labeling method in morphology or other mature methods; the embodiment of the invention does not limit this.
For a caption region whose captions are arranged horizontally, the region corresponding to this caption region in the horizontal image is determined, and the positions of the top, bottom, left and right borders of the caption region in the horizontal image are determined from the coordinates of its uppermost, lowermost, leftmost and rightmost pixels.
For a caption region whose captions are arranged vertically, the region corresponding to this caption region in the vertical image is determined, and the positions of the top, bottom, left and right borders of the caption region in the vertical image are obtained in the same way as for horizontally arranged captions.
Then, a horizontal projection is computed over the region of the comprehensive sub-band texture map (CS) corresponding to the caption region bounding box, and from the peak-valley information of the projection curve the number of captions and the top and bottom border positions of each horizontal caption line are determined.
Specifically, the number of captions in a caption region can be determined from the number of valleys in the projection curve; this process may comprise:
A threshold is obtained by dividing the mean texture value of the comprehensive sub-band texture map by a parameter (alfa); a point of the projection curve whose value is less than this threshold is a valley. Because a valley lies exactly at the middle position between two caption lines, the number of captions in the caption region is determined by counting the valleys, namely the number of valleys plus 1. It should be noted that, in the embodiment of the invention, the value range of the parameter (alfa) may be [2, 3]; after practical verification, the recommended value is alfa = 2.6.
In addition, since the top and bottom border positions of the caption lines separated by a valley correspond respectively to the starting and ending coordinates of that valley, the top and bottom borders of each horizontal caption line in the caption region can be determined by locating the valleys.
For vertically arranged captions, a vertical projection is computed over the corresponding region of the comprehensive sub-band texture map within the caption region bounding box, and the number of captions as well as the left and right border positions of each vertical caption line are determined from the peak-valley relationship of the projection curve; the concrete implementation is the same as for horizontal captions.
Through the above operations, information such as the positions at which captions appear in the video stream can be determined.
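The valley-based analysis of step 224 can be sketched as follows for a horizontally arranged caption region; here the projection curve is taken as the row mean of the comprehensive texture map, an assumption made only so that comparing it with the mean value divided by alfa is dimensionally consistent, and the helper name split_horizontal_captions is hypothetical:

import numpy as np

def split_horizontal_captions(cs, top, bottom, left, right, alfa=2.6):
    region = cs[top:bottom, left:right]
    proj = region.mean(axis=1)            # horizontal projection curve (row means)
    th = region.mean() / alfa             # threshold: mean texture value divided by alfa
    valley = proj < th                    # rows below the threshold form valleys
    valleys, start = [], None
    for y, v in enumerate(valley):
        if v and start is None:
            start = y
        elif not v and start is not None:
            valleys.append((start, y - 1))
            start = None
    if start is not None:
        valleys.append((start, len(valley) - 1))
    num_captions = len(valleys) + 1       # number of caption lines = number of valleys + 1
    # the start/end coordinates of each valley give the borders separating caption lines
    return num_captions, [(top + s, top + e) for s, e in valleys]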
Optionally, in one embodiment, in order to improve detection accuracy, the method may further comprise:
Step 225: check whether a caption region is a genuine caption region.
Since false detections may occur in caption detection, i.e. regions that are not captions may be detected as caption regions, the confirmed caption regions need to be verified for authenticity, which can effectively improve the performance of caption detection.
Specifically, whether a detected region is a genuine caption region can be determined according to the distribution of the caption texture, the gray-level distribution and the distribution of the number of edge points.
When a caption region is a genuine caption region, the valleys in the projection on the corresponding comprehensive sub-band texture map, and the valleys in the projection of the low-frequency component image after the wavelet transform, are evenly distributed. The valleys are detected in the same way as described in step 224; the uniformity measure is that the length scale of the valleys does not exceed that of the peaks and that the variance of the valleys is small.
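The authenticity criterion is stated only qualitatively above; the sketch below encodes one possible reading of it (valley runs no longer than peak runs, and valley lengths with small variance), where the variance bound var_th is a hypothetical parameter:

import numpy as np

def looks_like_real_caption(proj, th, var_th=4.0):
    flags = proj < th                     # True inside a valley
    valley_runs, peak_runs = [], []
    run, prev = 1, flags[0]
    for v in flags[1:]:
        if v == prev:
            run += 1
        else:
            (valley_runs if prev else peak_runs).append(run)
            run, prev = 1, v
    (valley_runs if prev else peak_runs).append(run)
    if not valley_runs or not peak_runs:
        return False
    # valleys should be no longer than the peaks and of roughly uniform length
    return max(valley_runs) <= max(peak_runs) and float(np.var(valley_runs)) < var_th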
Step 23: obtain the attribute information of the detected captions.
Specifically, in this step, matching and tracking operations can be performed on the detected captions to determine the caption information.
The caption matching operation judges, from the caption detection results of the previous I frame and the current I frame, whether the detected captions match; matching captions belong to the same caption, while non-matching captions belong to different captions.
Whether two adjacent I frames on which caption detection is performed need caption matching and tracking is judged according to the numbers of caption strips detected in the two frames, distinguishing the following possible cases:
1) If the numbers of caption strips in both the previous I frame and the current I frame are 0, no matching or tracking operation is needed.
2) If the number of caption strips in the previous I frame is 0 and that in the current I frame is not 0, all the captions in the current I frame can be determined to be newly appearing captions, so matching and tracking operations are needed to determine the start frames of the captions in the current I frame.
When judging the start frame, the captions determined in the current I frame and the next I frame first need to be processed according to their matching condition. If the next I frame contains no captions, or contains captions but none of them matches the captions detected in the current I frame, the captions detected in the current I frame are rejected as false detections; otherwise caption tracking is performed on the newly appearing caption strips detected in the current I frame.
3) If the number of caption strips in the previous I frame is not 0 and that in the current I frame is 0, the caption strips concerned are disappearing caption strips, so matching and tracking operations are needed to determine their end frames.
4) If the numbers of caption strips in the previous I frame and the current I frame are both not 0, the captions in the previous I frame and the current I frame need to be matched and tracked to determine which captions in the previous I frame are matched and which have disappeared, and which captions in the current I frame are matched and which are newly appearing. For the captions that disappear between the previous I frame and the current I frame, their end frames need to be determined between the previous I frame and the current I frame; for the newly appearing caption strips in the current I frame, their appearance frames need to be determined between the previous I frame and the current I frame.
It can thus be seen that, as long as the caption number of either the previous I frame or the current I frame is non-zero, matching and tracking operations are needed.
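The four cases above reduce to a simple decision; the following purely illustrative sketch returns which processing the pair of I frames requires:

def classify_caption_transition(num_prev, num_cur):
    # caption-strip counts detected in the previous and the current I frame
    if num_prev == 0 and num_cur == 0:
        return 'no matching or tracking needed'                         # case 1)
    if num_prev == 0:
        return 'all current captions are new: determine start frames'   # case 2)
    if num_cur == 0:
        return 'captions are disappearing: determine end frames'        # case 3)
    return 'match captions, then determine start/end frames of the rest'  # case 4)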
In the embodiment of the invention, the caption matching operation can be realized by sampling matching: the minimum mean absolute difference (MAD: Mean Absolute Difference) under sliding matching is calculated between each caption to be matched in the current I frame and each caption of the next I frame (1 ≤ q ≤ n) that has not yet been matched; the caption with the smallest MAD value is then chosen from the n candidate matches as the best-matching caption, and it is further judged whether this minimum MAD satisfies a minimum constraint threshold.
Specifically, for caption q of the current I frame and caption p of the next I frame, the positions of the top, bottom, left and right borders of the captions are denoted U_ICq, D_ICq, L_ICq, R_ICq and U_IPp, D_IPp, L_IPp, R_IPp, respectively.
If both are arranged horizontally, then, within their common extent in the horizontal direction, the maximum Lpq of the left borders of caption q of the current I frame and caption p of the next I frame and the minimum Rpq of their right borders are taken. If Rpq - Lpq is less than or equal to a threshold (which may specifically be 10), the captions are considered not to match; if it is greater than the threshold, the pixel line IP(cy, Lpq:Rpq) at the center row cy of caption p of the next I frame is extracted (Round[·] denotes the rounding used to obtain cy), the matching error MAD(y, q) between it and the pixel line IC(y, Lpq:Rpq) of caption q of the current I frame at height y is obtained by sliding matching, and the best match position y0 is the position with the smallest matching error. If MAD(q, y0) ≤ MAD_Th at the best match position y0, the captions are considered to match. In the embodiment of the invention, a preferred value of the threshold is MAD_Th = 20.
If both are arranged vertically, then caption q of the current I frame and caption p of the next I frame are extracted and, within their common extent in the vertical direction, the maximum Upq of the top borders and the minimum Dpq of the bottom borders are taken. If Dpq - Upq ≤ 10, the captions are considered not to match; if it is greater than the threshold, the center pixel column IP(Upq:Dpq, cx) at the center column cx of caption p of the next I frame is extracted, the matching error MAD(x, q) between it and the pixel column IC(Upq:Dpq, x) of caption q at width x is obtained by sliding matching, and the best match position x0 is determined in a way similar to that for horizontal captions; the caption with the minimum MAD value is then selected as the best match, and if MAD(q, x0) ≤ MAD_Th at the best match position x0, the captions are considered to match.
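A minimal sketch of the sliding matching for horizontally arranged captions; ic and ip are the luminance images of the current and the next I frame, the border tuples follow the notation above, the thresholds 10 and MAD_Th = 20 are the values given in the embodiment, and taking cy as the rounded vertical center of caption p is an assumption of this example:

import numpy as np

def match_horizontal_caption(ic, ip, q_borders, p_borders, mad_th=20.0):
    # borders are (U, D, L, R) of caption q in the current I frame and of caption p
    # in the next I frame
    uq, dq, lq, rq = q_borders
    up, dp, lp, rp = p_borders
    lpq, rpq = max(lq, lp), min(rq, rp)        # common extent in the horizontal direction
    if rpq - lpq <= 10:                        # too little overlap: no match
        return False, None
    cy = int(round((up + dp) / 2.0))           # assumed: rounded center row of caption p
    ref = ip[cy, lpq:rpq].astype(np.float64)   # pixel line of caption p
    best_y, best_mad = None, np.inf
    for y in range(uq, dq + 1):                # slide over the rows of caption q
        mad = np.abs(ic[y, lpq:rpq].astype(np.float64) - ref).mean()
        if mad < best_mad:
            best_y, best_mad = y, mad
    return best_mad <= mad_th, best_y          # matched if the minimum MAD is small enough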
For matched captions, a tracking operation can be performed on them to determine the positions of the start frame and the end frame of the captions.
Specifically, according to the matching velocity calculated from the relative position difference of the caption match, the captions can be divided into two types: static captions and rolling captions. If the positions of the matched captions are unchanged in the two frames on which caption detection is performed, they are judged to be static captions; otherwise they are judged to be rolling captions.
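A sketch of this classification; the positions are the border tuples of the matched caption in the two detection frames, and the jitter tolerance tol is a hypothetical allowance not taken from the embodiment:

def classify_motion(pos_prev, pos_cur, tol=0):
    # pos_*: (U, D, L, R) borders of the matched caption in the two frames
    dy = pos_cur[0] - pos_prev[0]
    dx = pos_cur[2] - pos_prev[2]
    if abs(dx) <= tol and abs(dy) <= tol:
        return 'static', (0, 0)
    return 'rolling', (dy, dx)                 # matching velocity between the two frames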
For rolling captions, according to the matching velocity and the position of the rolling caption in the current frame, the frame a certain number of frames before the current frame at which the caption just enters the picture, and the frame a certain number of frames after the current frame at which the caption just moves out of the picture, are determined as the appearance frame and the end frame respectively.
For static captions, the group of pictures (GOP: group of pictures) of the video stream containing the previous frame is accessed, a decoding operation is performed on the luminance component image of every frame in it, the direct-current (DC) image of its caption region is obtained at the same time, the mean absolute difference (MAD) values of the caption region DC images within this GOP are calculated, and the appearance frame and the end frame of the static captions are determined from the MAD values.
In the tracking of static caption strips in the above step, the mean absolute difference of the caption region DC images within a GOP is obtained by extracting and matching the DC lines in this region, specifically as follows:
First, the frames between the previous frame and the current frame are partially decoded to obtain their DC images.
Then, the corresponding coordinate positions in the DC images are derived from the caption border positions obtained for the current frame, and the DC lines at the central block of the caption region are extracted from each of these DC images.
Next, the DC line difference value between a given frame i and the current frame is calculated.
When the DC lines are extracted, the orientation of the captions is taken into account. For horizontal captions, the DC line difference value MADDC(i) between the i-th frame and the current frame is the mean absolute difference between their corresponding DC lines, where DC(y, x, i) denotes the DC image corresponding to the i-th frame and dcy denotes the vertical center of the caption region in the DC image.
The calculation for vertically arranged captions is similar to the above.
The appearance frame or the end frame is determined by searching for an abrupt-change point on the MADDC curve, using two constraint thresholds th1 and th2 for judging the abrupt-change point; the preferred constraint thresholds selected in the embodiment of the invention are th1 = 3.5 and th2 = 9.
If, with the current frame as the center, no abrupt-change point is found within a search radius of 2 GOP lengths, this caption strip is rejected as a false detection; otherwise the data frame closest to the current frame, before or after it, is taken as the appearance frame or the end frame.
The above describes the difference value calculated for horizontal captions; the calculation for vertically arranged captions is similar.
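Neither the MADDC formula nor the abrupt-change criterion is reproduced above; the sketch below uses one assumed reading of them (MADDC(i) as the mean absolute difference of the DC lines at the caption's vertical center dcy, and an abrupt change where the curve jumps from below th1 to above th2), with th1 = 3.5 and th2 = 9 as in the embodiment:

import numpy as np

def maddc(dc_frames, cur_idx, i, dcy, left, right):
    # dc_frames[k]: DC image of frame k; dcy: vertical center of the caption region
    line_i = dc_frames[i][dcy, left:right].astype(np.float64)
    line_c = dc_frames[cur_idx][dcy, left:right].astype(np.float64)
    return np.abs(line_i - line_c).mean()      # assumed definition of MADDC(i)

def find_abrupt_point(curve, th1=3.5, th2=9.0):
    # assumed criterion: a jump from a value below th1 to a value above th2
    for i in range(1, len(curve)):
        if curve[i - 1] < th1 and curve[i] > th2:
            return i
    return None                                # no abrupt change found in this range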
Step 24: extract the detected captions according to the attribute information of the captions.
It should be noted that, in the video caption information acquisition method provided by the embodiment of the invention, the obtained caption information can be recorded in real time.
The caption information may specifically include the basic information, scene information and matching information of the captions, etc.
The basic information may specifically include the basic attribute information of the caption, detection information, etc.;
The scene information may specifically include the start frame and end frame of the caption, a flag indicating whether the caption crosses a shot, etc.;
The matching information may specifically include a flag indicating whether a match exists, the position information of the match, etc.
As for the method of judging whether a caption crosses a shot, the embodiment of the invention may adopt mature methods such as shot change detection within the interval between the data frame before the recorded start frame and the data frame after the end frame; the embodiment of the invention does not limit this.
The caption information involved in the embodiment of the invention may specifically be as shown in Table 1:
Table 1
/* structure describing the attributes of the currently active caption strip */
typedef struct ActiveTextLine {
    /* basic information */
    int frameIndex;       // current frame number
    int textPos[4];       // position information, 4-element array: left, top, right, bottom boundaries
    int rollingFlag;      // rolling/static flag: 0 - static, 1 - vertical rolling, 2 - horizontal rolling
    int verVel;           // vertical velocity: positive downwards, negative upwards
    int horVel;           // horizontal velocity: positive to the right, negative to the left
    bool direction;       // arrangement direction: 0 - horizontal, 1 - vertical
    /* scene information */
    int startFrame;       // start frame
    int startGOP;         // start GOP
    int endFrame;         // end frame
    int duration;         // length (number of frames) for which the caption appears
    bool startAbrupt;     // whether an abrupt change appears at the start frame: 0 - no, 1 - yes
    bool endAbrupt;       // whether an abrupt change appears at the end frame: 0 - no, 1 - yes
    bool crossScene;      // whether the caption crosses a shot: 0 - no, 1 - yes
    int crossPos[10];     // frame numbers at which the caption crosses shots
    /* matching information */
    bool matchFlag;       // caption match flag: 1 - a match appears in the next I frame, 0 - no match
    int matchTextPos[4];  // position of the matched caption
} ATL;
In addition, the embodiment of the invention may also save the caption information obtained in real time in the form of a text record. The text saved as the record may specifically be as shown in Table 2:
Table 2
/* attribute record file format of caption strips */
TextNumIndex: #n    // the n-th caption in the video
startFrame;         // start frame
endFrame;           // end frame
rollingFlag;        // rolling/static flag
direction;          // arrangement direction: 0 - horizontal, 1 - vertical
textPos[4];         // position information, 4-element array: left, top, right, bottom boundaries
RollingMV[2];       // caption rolling velocity, 2-element array: vertical velocity, horizontal velocity
OCRString;          // recognition result after caption extraction
Thus, in this step, according to the recorded caption information, including the start frame and end frame of the captions and information such as the appearance position, the caption frames used for segmentation are extracted, caption extraction merging multiple frames is then performed, and the segmentation result is recognized; this may specifically comprise:
From the recorded caption information, it is judged whether the captions are static or rolling.
For static captions, the caption region images at the same position in all I frames and P frames between the start frame and the end frame are extracted directly;
For rolling captions, the corresponding image regions of the captions in all I frames and P frames are extracted according to the rolling velocity.
On the basis of the determined regions, adaptive-threshold binarization segmentation is first applied to the caption region parts of all the I frames within the duration of the captions, yielding binary images whose pixel values are only 0 and 255; the segmented caption region images of all the I frames are then combined by an AND operation on the pixel values at the same positions, yielding an "I-frame AND image"; next, the caption region images of all the I frames and P frames within the duration of the captions are averaged pixel by pixel at the same positions, i.e. the average image of these images is computed, and binarization segmentation is applied to this average image, yielding an "I-P-frame average image"; finally, the obtained "I-frame AND image" and "I-P-frame average image" are combined by an AND operation, and the resulting image is taken as the final segmentation result.
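A sketch of the multi-frame fusion; OpenCV's Otsu thresholding stands in here for the adaptive-threshold binarization, which the embodiment does not fix, and i_regions and p_regions are assumed to be the 8-bit grayscale caption region crops of all I frames and P frames within the caption's duration:

import cv2
import numpy as np

def fuse_caption_frames(i_regions, p_regions):
    def binarize(img):
        _, bw = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        return bw
    # "I-frame AND image": AND of the binarized caption regions of all I frames
    i_and = binarize(i_regions[0])
    for region in i_regions[1:]:
        i_and = cv2.bitwise_and(i_and, binarize(region))
    # "I-P-frame average image": binarized average of all I- and P-frame regions
    stack = np.stack(list(i_regions) + list(p_regions)).astype(np.float64)
    avg_bw = binarize(stack.mean(axis=0).astype(np.uint8))
    # final segmentation result: AND of the two images
    return cv2.bitwise_and(i_and, avg_bw)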
For the segmentation result, optical character recognition (OCR: Optical Character Recognition) software may be used in the caption recognition process to recognize the segmented binary images.
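Any OCR engine may be used; as one possibility (assuming the pytesseract wrapper and an installed Chinese language pack, neither of which is required by the embodiment):

import pytesseract
from PIL import Image

def recognize_caption(binary_image):
    # binary_image: the fused segmentation result as a 0/255 array
    return pytesseract.image_to_string(Image.fromarray(binary_image), lang='chi_sim')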
From the above description it can be seen that the embodiment of the invention provides a caption information acquisition method which performs wavelet-based caption detection on the luminance component image of a data frame in the video stream, obtains the attribute information of the detected captions, and extracts the detected captions according to the attribute information, thereby accurately obtaining the caption information in the data frame. Because the wavelet-based caption detection does not require the region where the captions are located to be restricted, the caption information acquisition method provided by the embodiment of the invention can obtain the caption information in the video data without limiting the caption position region. Moreover, because only the luminance component images of specified data frames are obtained, the method can obtain the caption information more efficiently. Furthermore, the method can also verify the authenticity of the caption regions of the obtained captions and perform matching and tracking operations, so that it can obtain the caption information more accurately and effectively improve the performance of caption detection. In addition, the caption information acquisition method provided by the embodiment of the invention can also perform a segmentation operation on the obtained captions, which makes it more convenient for users to use.
The embodiment of the invention also provides a caption information acquisition apparatus. As shown in Figure 4, the apparatus comprises a detection module 410, a first acquisition module 420 and an extraction module 430, wherein:
The detection module 410 is used for performing wavelet-based caption detection on the luminance component image of a data frame in the video stream.
The first acquisition module 420 is used for obtaining the attribute information of the captions detected by the detection module 410.
The caption information obtained by the first acquisition module 420 may specifically include the basic information, scene information and matching information of the captions, etc.
The basic information may specifically include the basic attribute information of the caption, detection information, etc.;
The scene information may specifically include the start frame and end frame of the caption, a flag indicating whether the caption crosses a shot, etc.;
The matching information may specifically include a flag indicating whether a match exists, the position information of the match, etc.
As for the method of judging whether a caption crosses a shot, the embodiment of the invention may adopt mature methods such as shot change detection within the interval between the data frame before the recorded start frame and the data frame after the end frame; the embodiment of the invention does not limit this.
The caption information involved in the embodiment of the invention may specifically be as shown in Table 1.
In addition, the embodiment of the invention may also save the caption information obtained in real time in the form of a text record. The text saved as the record may specifically be as shown in Table 2.
The extraction module 430 is used for extracting the captions detected by the detection module 410 according to the caption attribute information obtained by the first acquisition module 420.
In a specific embodiment of the caption information acquisition apparatus provided by the embodiment of the invention, as shown in Figure 5, the apparatus may further comprise a second acquisition module 440 for obtaining the luminance component image of a specified data frame.
In order to speed up the acquisition of caption information, the embodiment of the invention may decode only specified data frames from the video data stream and obtain the luminance component images of those specified data frames.
For example, only the intra-coded frames (I frames) with odd (or even) frame numbers are decoded (other types of video frames, such as predictive-coded frames, i.e. P frames, may also be used), the luminance component image of each such I frame is obtained, and the chrominance components of the I frame as well as the other frames are quickly skipped, thereby speeding up the acquisition of caption information.
It should be noted that the embodiment of the invention does not limit the compression format of the video data stream.
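The way the modules cooperate can be pictured with the following skeleton; the class and method names are hypothetical and only mirror the reference numerals of Figures 4 and 5, without implementing the modules themselves:

class CaptionInfoDevice:
    def __init__(self, detection_module, first_acquisition, extraction, second_acquisition):
        self.detection_module = detection_module      # 410: wavelet-based caption detection
        self.first_acquisition = first_acquisition    # 420: attribute information of detected captions
        self.extraction = extraction                  # 430: extracts captions from their attributes
        self.second_acquisition = second_acquisition  # 440: luminance images of specified frames

    def process(self, video_stream):
        for luma in self.second_acquisition.luminance_images(video_stream):
            captions = self.detection_module.detect(luma)
            attributes = self.first_acquisition.attributes(captions)
            yield self.extraction.extract(captions, attributes)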
The detection module 410 involved in the embodiment of the invention may specifically comprise, as shown in Figure 6, a first acquiring unit 411, a second acquiring unit 412, a generating unit 413 and a determining unit 414, wherein:
The first acquiring unit 411 is used for performing a wavelet transform on the luminance component image obtained by the second acquisition module 440 to obtain the high-frequency sub-band texture maps in the horizontal, vertical and diagonal directions.
The wavelet transform involved in the embodiment of the invention may specifically be the Haar wavelet transform, the Mexican hat wavelet transform, the 9-7 wavelet transform, the 5-3 wavelet transform, and so on.
Specifically, the first acquiring unit 411 applies the wavelet transform to the luminance component image of the selected data frame to obtain one low-frequency sub-band and high-frequency sub-bands in the horizontal, vertical and diagonal directions, where the horizontal high-frequency sub-band is denoted H, the vertical high-frequency sub-band V, and the diagonal high-frequency sub-band D.
Then, the absolute values of the coefficients of the obtained horizontal, vertical and diagonal high-frequency sub-bands are taken respectively to obtain the horizontal, vertical and diagonal high-frequency sub-band texture maps.
The first acquiring unit 411 may also combine the three obtained high-frequency sub-band texture maps to obtain a comprehensive high-frequency sub-band texture map (CS).
The value of each point in the comprehensive high-frequency sub-band texture map can be obtained by the following formula:
CS(i,j)=CH(i,j)+CV(i,j)+CD(i,j)
The second acquiring unit 412 is used for obtaining the caption point image (TextPnt) of the data frame from the horizontal, vertical and diagonal high-frequency sub-band texture maps obtained by the first acquiring unit 411.
The second acquiring unit 412 obtains the caption point image of the data frame specifically through the following operations:
First, an initial caption point image is generated from each high-frequency sub-band texture map.
Taking the horizontal high-frequency sub-band texture map as an example, caption point detection is performed on it to obtain the initial caption point image of the horizontal high-frequency sub-band (MAPH_ORG).
The value of this initial caption point image at coordinate (i, j) is obtained by comparing the texture value at that point with a threshold: a value of "0" denotes background and a value of "1" denotes an initial caption point; the threshold TH is derived from MH, the mean texture strength of the horizontal high-frequency sub-band texture map.
Then, noise removal is applied to the initial caption point image of the horizontal high-frequency sub-band to obtain the final caption point image in the horizontal direction (MAPH).
The noise removal involved in the embodiment of the invention may adopt mature processing schemes such as overlapping sliding-block filtering; the embodiment of the invention does not limit this.
Next, similar processing is applied to the vertical and diagonal high-frequency sub-band texture maps to obtain the initial caption point image of the vertical sub-band (MAPV_ORG) and that of the diagonal sub-band (MAPD_ORG), and noise removal is applied to each of them to obtain the final caption point image in the vertical direction (MAPV) and that in the diagonal direction (MAPD).
Finally, the intersection of the final caption point images in the three directions (MAPH, MAPV, MAPD) is taken to obtain the caption point image (TextPnt) of the data frame.
The generating unit 413 is used for generating the caption region image from the caption point image obtained by the second acquiring unit 412.
The generating unit 413 may specifically generate the caption region image through the following operations:
First, a closing operation and an opening operation in the horizontal direction are applied to the generated caption point image to obtain the horizontal image (VerImg).
The structuring element of the closing operation may be an all-"1" matrix of size 20*1, and that of the opening operation an all-"1" matrix of size 1*2; of course, the structuring elements used for the closing and opening operations may be set flexibly according to actual needs;
Then, a closing operation and an opening operation in the vertical direction are applied to the caption point image to obtain the vertical image (HorImg).
Likewise, the structuring element of the closing operation may be an all-"1" matrix of size 1*20, and that of the opening operation an all-"1" matrix of size 2*1;
Then, the union of the obtained horizontal image and vertical image is taken, i.e. a point-by-point logical OR of the two images, to obtain a maximum point set image (Img) containing all caption regions.
Next, a closing operation is applied to the maximum point set image to obtain the caption region image.
The structuring element of this closing operation may be a 6*6 all-"1" matrix, or another matrix.
The determining unit 414 is used for determining the number of captions and the caption region position information in the caption region image generated by the generating unit 413.
The determining unit 414 may specifically determine the number of captions and the caption region position information in the caption region image through the following operations:
First, each caption region in the caption region image is examined to determine whether its captions are arranged horizontally or vertically.
The distinction is made according to the relative size of the height and width of the caption region: if the caption region is wider than it is high, the captions in the region are arranged horizontally; if its width is less than its height, the captions are arranged vertically.
It should be noted that the caption regions in the caption region image may be identified using the labeling method in morphology or other mature methods; the embodiment of the invention does not limit this.
For a caption region whose captions are arranged horizontally, the region corresponding to this caption region in the horizontal image is determined, and the positions of the top, bottom, left and right borders of the caption region in the horizontal image are determined from the coordinates of its uppermost, lowermost, leftmost and rightmost pixels.
For a caption region whose captions are arranged vertically, the region corresponding to this caption region in the vertical image is determined, and the positions of the top, bottom, left and right borders of the caption region in the vertical image are obtained in the same way as for horizontally arranged captions.
Then, a horizontal projection is computed over the region of the comprehensive sub-band texture map (CS) corresponding to the caption region bounding box, and from the peak-valley information of the projection curve the number of captions and the top and bottom border positions of each horizontal caption line are determined.
Specifically, the number of captions in a caption region can be determined from the number of valleys in the projection curve; this process may comprise:
A threshold is obtained by dividing the mean texture value of the comprehensive sub-band texture map by a parameter (alfa); a point of the projection curve whose value is less than this threshold is a valley. Because a valley lies exactly at the middle position between two caption lines, the number of captions in the caption region is determined by counting the valleys, namely the number of valleys plus 1. It should be noted that, in the embodiment of the invention, the value range of the parameter (alfa) may be [2, 3]; after practical verification, the recommended value is alfa = 2.6.
In addition, since the top and bottom border positions of the caption lines separated by a valley correspond respectively to the starting and ending coordinates of that valley, the top and bottom borders of each horizontal caption line in the caption region can be determined by locating the valleys.
For vertically arranged captions, a vertical projection is computed over the corresponding region of the comprehensive sub-band texture map within the caption region bounding box, and the number of captions as well as the left and right border positions of each vertical caption line are determined from the peak-valley relationship of the projection curve; the concrete implementation is the same as for horizontal captions.
Through the above operations, information such as the positions at which captions appear in the video stream can be determined.
In another specific embodiment of the detection module 410 provided by the embodiment of the invention, as shown in Figure 7, the detection module 410 may further comprise a detecting unit 415 for checking whether the caption regions determined by the determining unit 414 are genuine caption regions.
Since false detections may occur in caption detection, i.e. regions that are not captions may be detected as caption regions, the confirmed caption regions need to be verified for authenticity, which can effectively improve the performance of caption detection.
Specifically, whether a detected region is a genuine caption region can be determined according to the distribution of the caption texture, the gray-level distribution and the distribution of the number of edge points.
When a caption region is a genuine caption region, the valleys in the projection on the corresponding comprehensive sub-band texture map, and the valleys in the projection of the low-frequency component image after the wavelet transform, are evenly distributed; the uniformity measure is that the length scale of the valleys does not exceed that of the peaks and that the variance of the valleys is small.
The first acquisition module 420 provided by the embodiment of the invention may specifically comprise, as shown in Figure 8, a judging unit 421, a first determining unit 422 and a second determining unit 423, wherein:
The judging unit 421 is used for judging whether the captions detected by the detection module 410 in the current I frame match those in the I frame preceding the current I frame.
The condition under which the judging unit 421 performs this judgment may specifically include: whether the caption numbers in the previous I frame and the current I frame are zero.
If the caption number of either the previous I frame or the current I frame is non-zero, the judging unit 421 needs to perform the matching decision operation.
It should be noted that the judging conditions of the judging unit 421 are not limited to the above and can be supplemented and adjusted according to the needs of practical applications.
The judging unit 421 may use a sampling matching method to judge whether the captions detected by the detection module 410 in the current I frame match those in the I frame preceding the current I frame.
That is, the minimum mean absolute difference (MAD: Mean Absolute Difference) under sliding matching is calculated between each caption to be matched in the current I frame and each caption of the next I frame (1 ≤ q ≤ n) that has not yet been matched; the caption with the smallest MAD value is then chosen from the n candidate matches as the best-matching caption, and it is further judged whether this minimum MAD satisfies a minimum constraint threshold.
Specifically, for caption q of the current I frame and caption p of the next I frame, the positions of the top, bottom, left and right borders of the captions are denoted U_ICq, D_ICq, L_ICq, R_ICq and U_IPp, D_IPp, L_IPp, R_IPp, respectively.
If both are arranged horizontally, then, within their common extent in the horizontal direction, the maximum Lpq of the left borders of caption q of the current I frame and caption p of the next I frame and the minimum Rpq of their right borders are taken. If Rpq - Lpq is less than or equal to a threshold (which may specifically be 10), the captions are considered not to match; if it is greater than the threshold, the pixel line IP(cy, Lpq:Rpq) at the center row cy of caption p of the next I frame is extracted (Round[·] denotes the rounding used to obtain cy), the matching error MAD(y, q) between it and the pixel line IC(y, Lpq:Rpq) of caption q of the current I frame at height y is obtained by sliding matching, and the best match position y0 is the position with the smallest matching error. If MAD(q, y0) ≤ MAD_Th at the best match position y0, the captions are considered to match. In the embodiment of the invention, a preferred value of the threshold is MAD_Th = 20.
If both are arranged vertically, then caption q of the current I frame and caption p of the next I frame are extracted and, within their common extent in the vertical direction, the maximum Upq of the top borders and the minimum Dpq of the bottom borders are taken. If Dpq - Upq ≤ 10, the captions are considered not to match; if it is greater than the threshold, the center pixel column IP(Upq:Dpq, cx) at the center column cx of caption p of the next I frame is extracted, the matching error MAD(x, q) between it and the pixel column IC(Upq:Dpq, x) of caption q at width x is obtained by sliding matching, and the best match position x0 is determined in a way similar to that for horizontal captions; the caption with the minimum MAD value is then selected as the best match, and if MAD(q, x0) ≤ MAD_Th at the best match position x0, the captions are considered to match.
After determining a match, the judging unit triggers the first determining unit 422.
The first determining unit 422 is used for determining, when the judgment result of the judging unit 421 is a match, whether the detected captions are rolling (dynamic) captions or static captions according to the matching velocity calculated from the relative position difference of the caption match.
Specifically, the first determining unit 422 divides captions into two types, static captions and rolling captions, according to the matching velocity calculated from the relative position difference of the caption match.
If the positions of the matched captions are unchanged in the two data frames on which caption detection is performed, they are judged to be static captions; otherwise they are judged to be rolling captions.
The second determining unit 423 is used for determining, when the first determining unit 422 determines that the captions are rolling captions, the start frame and end frame of the rolling captions according to their matching velocity and the position of the current frame within them; and, when the first determining unit 422 determines that the captions are static captions, extracting the direct-current (DC) lines of the static captions and performing a matching operation on the DC lines to determine the start frame and end frame of the static captions.
For rolling captions, the second determining unit 423 determines, according to the matching velocity and the position of the rolling caption in the current frame, the frame a certain number of frames before the current frame at which the caption just enters the picture and the frame a certain number of frames after the current frame at which it just moves out of the picture, as the appearance frame and the end frame respectively.
For static captions, the second determining unit 423 accesses the group of pictures (GOP: group of pictures) of the video stream containing the previous frame, performs a decoding operation on the luminance component image of every frame in it, obtains the DC image of its caption region at the same time, calculates the mean absolute difference (MAD) values of the caption region DC images within this GOP, and determines the appearance frame and the end frame of the static captions from the MAD values.
In the tracking of static caption strips in the above step, the mean absolute difference of the caption region DC images within a GOP is obtained by extracting and matching the DC lines in this region, specifically as follows:
First, the frames between the previous frame and the current frame are partially decoded to obtain their DC images.
Then, the corresponding coordinate positions in the DC images are derived from the caption border positions obtained for the current frame, and the DC lines at the central block of the caption region are extracted from each of these DC images.
Next, the DC line difference value between a given frame i and the current frame is calculated.
When the DC lines are extracted, the orientation of the captions is taken into account. For horizontal captions, the DC line difference value MADDC(i) between the i-th frame and the current frame is the mean absolute difference between their corresponding DC lines, where DC(y, x, i) denotes the DC image corresponding to the i-th frame and dcy denotes the vertical center of the caption region in the DC image.
The calculation for vertically arranged captions is similar to the above.
The appearance frame or the end frame is determined by searching for an abrupt-change point on the MADDC curve, using two constraint thresholds th1 and th2 for judging the abrupt-change point; the preferred constraint thresholds selected in the embodiment of the invention are th1 = 3.5 and th2 = 9.
If, with the current frame as the center, no abrupt-change point is found within a search radius of 2 GOP lengths, this caption strip is rejected as a false detection; otherwise the data frame closest to the current frame, before or after it, is taken as the appearance frame or the end frame.
The above describes the difference value calculated for horizontal captions; the calculation for vertically arranged captions is similar.
The extraction module 430 provided by the embodiment of the invention may specifically comprise, as shown in Figure 9, an extracting unit 431, a segmenting unit 432 and a recognizing unit 433, wherein:
The extracting unit 431 is used for extracting, from the captions, the caption frames used for segmentation according to the start frame, the end frame and the appearance position of the captions.
The segmenting unit 432 is used for determining the caption regions corresponding to the caption frames extracted by the extracting unit 431 and performing binarization segmentation on these caption regions to obtain binary images.
Specifically, according to the recorded caption information, including the start frame and end frame of the captions and information such as the appearance position, the segmenting unit 432 extracts the caption frames used for segmentation, then performs caption extraction merging multiple frames, and recognizes the segmentation result; this may specifically comprise:
From the recorded caption information, it is judged whether the captions are static or rolling.
For static captions, the caption region images at the same position in all I frames and P frames between the start frame and the end frame are extracted directly;
For rolling captions, the corresponding image regions of the captions in all I frames and P frames are extracted according to the rolling velocity.
On the basis of the determined regions, adaptive-threshold binarization segmentation is first applied to the caption region parts of all the I frames within the duration of the captions, yielding binary images whose pixel values are only 0 and 255; the segmented caption region images of all the I frames are then combined by an AND operation on the pixel values at the same positions, yielding an "I-frame AND image"; next, the caption region images of all the I frames and P frames within the duration of the captions are averaged pixel by pixel at the same positions, i.e. the average image of these images is computed, and binarization segmentation is applied to this average image, yielding an "I-P-frame average image"; finally, the obtained "I-frame AND image" and "I-P-frame average image" are combined by an AND operation, and the resulting image is taken as the final segmentation result.
The recognizing unit 433 is used for recognizing the binary images obtained by the segmenting unit 432 and extracting the captions.
Specifically, the recognizing unit 433 may use optical character recognition (OCR: Optical Character Recognition) software to recognize the segmented binary images and extract the captions in them.
From the above description it can be seen that the embodiment of the invention provides a caption information acquisition apparatus which performs wavelet-based caption detection on the luminance component images of data frames in the video stream and performs matching and tracking operations on the detected captions, thereby accurately determining the caption information of the data frames. Because the wavelet-based caption detection does not require the region where the captions are located to be restricted, the caption information acquisition apparatus provided by the embodiment of the invention can obtain the caption information in the video data without limiting the caption position region. Moreover, because only the luminance component images of some specified data frames are obtained, and the obtained captions undergo caption region authenticity verification as well as matching and tracking operations, the apparatus can obtain the caption information more quickly and accurately and effectively improve the performance of caption detection. In addition, the caption information acquisition apparatus provided by the embodiment of the invention can also perform a segmentation operation on the obtained captions, which makes it more convenient for users to use.
It should be noted that the formulas and values involved in the above embodiments of the invention do not limit the protection scope of the embodiments of the invention in any way; when other wavelet transforms or matching and tracking technical means are adopted, corresponding conversions can be made.
Through the above description of the embodiments, those skilled in the art can clearly understand that the present invention can be realized by software plus a necessary hardware platform, and of course can also be implemented entirely in hardware, but in many cases the former is the better implementation. Based on such understanding, the technical solution of the present invention, or the part of it that contributes over the background art, can be embodied in whole or in part in the form of a software product. The computer software product can be stored in a storage medium, such as a ROM/RAM, a magnetic disk or an optical disc, and includes a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the method described in each embodiment of the present invention or in some parts of an embodiment.
The above is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto; any change or replacement that can readily be conceived by a person skilled in the art within the technical scope disclosed by the present invention should be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention should be determined by the protection scope of the claims.