CN111860262A - Video subtitle extraction method and device - Google Patents

Video subtitle extraction method and device

Info

Publication number
CN111860262A
CN111860262A
Authority
CN
China
Prior art keywords
image
frame
sum
caption
subtitle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010665068.7A
Other languages
Chinese (zh)
Other versions
CN111860262B (en)
Inventor
田广军
郎梦园
张立国
金梅
张勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yanshan University
Original Assignee
Yanshan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yanshan University filed Critical Yanshan University
Priority to CN202010665068.7A priority Critical patent/CN111860262B/en
Publication of CN111860262A publication Critical patent/CN111860262A/en
Application granted granted Critical
Publication of CN111860262B publication Critical patent/CN111860262B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/635 Overlay text, e.g. embedded captions in a TV program
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Character Input (AREA)
  • Studio Circuits (AREA)

Abstract

The invention discloses a video subtitle extraction method and device in the field of digital image processing. The method mainly comprises: reading a video in which subtitles are to be detected and detecting subtitle frames; locating the subtitle region within each subtitle frame; and extracting the subtitles and performing OCR recognition. Subtitle frames are detected from the inter-frame difference in corner counts, and a partial pixel accumulation method is proposed to locate the subtitle region within a subtitle frame. The invention realizes detection and extraction of subtitles embedded in video and provides a basis for subsequent functions such as retrieval and translation.

Description

Video subtitle extraction method and device
Technical Field
The invention relates to the field of digital image processing, in particular to a video subtitle extraction method and device.
Background
At present, video retrieval relies mainly on matching title keywords and video tags. This retrieval mode is limited, and title keywords and tags reflect video content only incompletely, so retrieval based on video subtitle content can serve as a complementary retrieval mode. Moreover, with globalization, language barriers arise frequently, so automatic translation of video subtitles can improve the user experience.
Video subtitle extraction has two key steps: extracting subtitle frames and locating the subtitle position. Common subtitle frame extraction techniques include histogram-based, pixel-difference-based, and contour-based algorithms; common subtitle localization techniques include edge-based methods, connected-region-based methods, and machine-learning-based methods. These prior techniques suffer from problems such as poor detection results and a large computational burden.
Disclosure of Invention
Aiming at the problems of poor detection results and heavy computation in existing video subtitle detection and extraction methods, the invention provides a video subtitle extraction method and device to extract subtitles embedded in a video.
The invention provides the following technical scheme:
In one aspect, the invention provides a video subtitle extraction method, including:
reading a video in which subtitles are to be detected;
detecting subtitle frames in the video based on the number of corner points;
locating the subtitle region in each subtitle frame;
and extracting subtitles from the located subtitle regions and performing optical character recognition to obtain subtitle text.
Preferably, detecting subtitle frames in the video based on the number of corner points includes:
converting each frame image in the video into a grayscale image;
performing corner detection on each grayscale frame image and recording the number of corner points of each frame;
taking frames whose corner counts meet preset conditions as subtitle frames, the preset conditions being: the corner count of the frame is greater than that of the previous frame; the corner count of the frame is greater than 15; and the absolute difference in corner counts between the frame and the previous frame is greater than an average value, the average value being the mean of the absolute corner-count differences between adjacent frames within 3 seconds after the frame.
Preferably, locating the subtitle region in each subtitle frame includes:
for each subtitle frame, converting the subtitle frame into a grayscale image;
cropping the bottom quarter of the grayscale image;
performing edge detection on the cropped image using the Laplacian operator;
binarizing the edge-detected image using Otsu's method;
applying a morphological closing operation to the binarized image;
and locating the subtitle position in the closed image using a partial pixel accumulation method.
Preferably, extracting subtitles from the located subtitle regions and performing optical character recognition includes:
reading the coordinates of the subtitle position;
cropping the subtitle portion from the source image according to the coordinates and converting it into a grayscale image;
performing median filtering on the grayscale image;
performing edge detection on the median-filtered image using the Laplacian operator;
binarizing the image using Otsu's method;
and performing optical character recognition on the characters in the binarized image to obtain the subtitle text.
Preferably, locating the subtitle position in the closed image using the partial pixel accumulation method includes:
selecting, from the closed image along its horizontal direction, all pixel columns centered and continuous within the length range [L/2 - L/40, L/2 + L/40], and denoting the pixel values O_i;
starting from i = 1 and incrementing i by 1, when the pixel value O_i is 1, calculating the sum O_SUM_up of it and the 20 pixel values below it, where O_SUM_up = O_i + O_{i+1} + O_{i+2} + ... + O_{i+20};
when O_SUM_up > 10, recording the position coordinates (x_i, y_i) of O_i, and selecting the smallest y_i among all recorded coordinates as the top coordinate y_min of the subtitle;
starting from i = W and decrementing i by 1, when the pixel value O_i is 1, calculating the sum O_SUM_down of it and the 20 pixel values above it, where O_SUM_down = O_i + O_{i-1} + O_{i-2} + ... + O_{i-20}; W is the pixel width of the image and L is the pixel length of the image;
when O_SUM_down > 10, recording the position coordinates (x_i, y_i) of O_i, and selecting the largest y_i among all recorded coordinates as the bottom coordinate y_max of the subtitle;
selecting, from the closed image along its vertical direction, all pixel rows within the range [y_min, y_max], and denoting the pixel values O_j;
starting from j = 1 and incrementing j by 1, calculating the sum O_SUM_left of O_j and the 20 consecutive pixel values to its right, where O_SUM_left = O_j + O_{j+1} + O_{j+2} + ... + O_{j+20};
when O_SUM_left > 10, recording the position coordinates (x_j, y_j) of O_j, and selecting the smallest x_j among all recorded coordinates as the left coordinate x_min of the subtitle;
starting from j = L and decrementing j by 1, calculating the sum O_SUM_right of O_j and the 20 consecutive pixel values to its left, where O_SUM_right = O_j + O_{j-1} + O_{j-2} + ... + O_{j-20};
when O_SUM_right > 10, recording the position coordinates (x_j, y_j) of O_j, and selecting the largest x_j among all recorded coordinates as the right coordinate x_max of the subtitle;
and saving the position coordinates of the subtitle, the top-left corner being (x_min + 5, y_min + 5) and the bottom-right corner being (x_max + 5, y_max + 5).
In another aspect, the invention further provides a video subtitle extraction device, including:
a reading unit, configured to read a video in which subtitles are to be detected;
a detection unit, configured to detect subtitle frames, based on the number of corner points, in the video read by the reading unit;
a positioning unit, configured to locate the subtitle region in each subtitle frame detected by the detection unit;
and an extraction unit, configured to extract subtitles from the subtitle regions located by the positioning unit and perform optical character recognition to obtain subtitle text.
Preferably, the detection unit is specifically configured to:
convert each frame image in the video into a grayscale image; perform corner detection on each frame image and record the number of corner points of each frame; take frames whose corner counts meet preset conditions as subtitle frames, the preset conditions being: the corner count of the frame is greater than that of the previous frame; the corner count of the frame is greater than 15; and the absolute difference in corner counts between the frame and the previous frame is greater than an average value, the average value being the mean of the absolute corner-count differences between adjacent frames within 3 seconds after the frame.
Preferably, the positioning unit is specifically configured to:
for each subtitle frame, convert the subtitle frame into a grayscale image; crop the bottom quarter of the grayscale image; perform edge detection on the cropped image using the Laplacian operator; binarize the edge-detected image using Otsu's method; apply a morphological closing operation to the binarized image; and locate the subtitle position in the closed image using a partial pixel accumulation method.
Preferably, the extraction unit is specifically configured to:
read the coordinates of the subtitle position; crop the subtitle portion from the source image according to the coordinates and convert it into a grayscale image; perform median filtering on the grayscale image; perform edge detection on the median-filtered image using the Laplacian operator; binarize the image using Otsu's method; and perform optical character recognition on the characters in the binarized image to obtain the subtitle text.
Preferably, the positioning unit is configured to locate the subtitle position in the closed image using the partial pixel accumulation method, including:
selecting, from the closed image along its horizontal direction, all pixel columns centered and continuous within the length range [L/2 - L/40, L/2 + L/40], and denoting the pixel values O_i;
starting from i = 1 and incrementing i by 1, when the pixel value O_i is 1, calculating the sum O_SUM_up of it and the 20 pixel values below it, where O_SUM_up = O_i + O_{i+1} + O_{i+2} + ... + O_{i+20};
when O_SUM_up > 10, recording the position coordinates (x_i, y_i) of O_i, and selecting the smallest y_i among all recorded coordinates as the top coordinate y_min of the subtitle;
starting from i = W and decrementing i by 1, when the pixel value O_i is 1, calculating the sum O_SUM_down of it and the 20 pixel values above it, where O_SUM_down = O_i + O_{i-1} + O_{i-2} + ... + O_{i-20}; W is the pixel width of the image and L is the pixel length of the image;
when O_SUM_down > 10, recording the position coordinates (x_i, y_i) of O_i, and selecting the largest y_i among all recorded coordinates as the bottom coordinate y_max of the subtitle;
selecting, from the closed image along its vertical direction, all pixel rows within the range [y_min, y_max], and denoting the pixel values O_j;
starting from j = 1 and incrementing j by 1, calculating the sum O_SUM_left of O_j and the 20 consecutive pixel values to its right, where O_SUM_left = O_j + O_{j+1} + O_{j+2} + ... + O_{j+20};
when O_SUM_left > 10, recording the position coordinates (x_j, y_j) of O_j, and selecting the smallest x_j among all recorded coordinates as the left coordinate x_min of the subtitle;
starting from j = L and decrementing j by 1, calculating the sum O_SUM_right of O_j and the 20 consecutive pixel values to its left, where O_SUM_right = O_j + O_{j-1} + O_{j-2} + ... + O_{j-20};
when O_SUM_right > 10, recording the position coordinates (x_j, y_j) of O_j, and selecting the largest x_j among all recorded coordinates as the right coordinate x_max of the subtitle;
and saving the position coordinates of the subtitle, the top-left corner being (x_min + 5, y_min + 5) and the bottom-right corner being (x_max + 5, y_max + 5).
The video subtitle extracting method and device provided by the invention have the following beneficial effects:
(1) In the video subtitle extraction method provided by the invention, subtitle frames are identified from the corner count, and the judgment conditions use the difference in corner counts between adjacent frames. This detects subtitle frames accurately and effectively avoids the missed detections other methods may suffer when a subtitle contains too few characters; it also reduces the cases in which a transition frame in the video is mistaken for a subtitle frame. Because the corner count changes sharply when a subtitle appears, this approach further avoids obtaining many subtitle frames containing the same subtitle, as happens with pixel-difference computation, thereby reducing subsequent computation.
(2) In the video subtitle extraction method provided by the invention, only the bottom quarter of the image is used when detecting the subtitle region, which effectively reduces the amount of computation.
(3) In the video subtitle extraction method provided by the invention, the subtitle position is located by a partial pixel accumulation method. The method relies on conventional algorithms, requiring less computation than machine-learning-based methods while effectively improving detection results.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a flowchart of a video subtitle extracting method according to an embodiment of the present invention;
fig. 2 is a flow chart of subtitle frame detection according to an embodiment of the present invention;
fig. 3 is a flowchart of subtitle region positioning according to an embodiment of the present invention;
FIG. 4 is a flow chart of subtitle extraction and optical character recognition according to an embodiment of the present invention;
fig. 5 is a flowchart of a partial pixel value accumulation method used in subtitle region positioning according to an embodiment of the present invention;
fig. 6 is a schematic diagram of a partial pixel value accumulation area used in positioning a subtitle area according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The embodiment of the invention provides a video subtitle extraction method and device. The method can be applied to extract subtitles embedded in a video, supporting subsequent functions such as retrieval and translation.
Referring to fig. 1, a flowchart of a video subtitle detection and extraction method according to an embodiment of the present invention is shown. The method comprises the following steps:
S101, reading a video in which subtitles are to be detected, and detecting subtitle frames in the video based on the number of corner points.
Because the appearance and disappearance of subtitles cause a large change in the number of corner points in an image, a large change in the corner count directly reflects the appearance or disappearance of a subtitle. Meanwhile, once a subtitle appears the corner count rarely stays below the threshold of 15, and the corner count increases when a subtitle appears. A frame is judged to be a subtitle frame when all three conditions are met.
One possible implementation of detecting subtitle frames in the video based on the number of corner points is as follows:
referring to fig. 2, a flow chart of subtitle frame detection in an embodiment of the present invention is shown. The method comprises the following steps:
S201, reading the video and converting each frame image into a grayscale image;
S202, performing Harris corner detection on each frame image and recording the number of corner points of each frame;
S203, taking frames whose corner counts meet the preset conditions as subtitle frames, the preset conditions being: the corner count of the frame is greater than that of the previous frame; the corner count of the frame is greater than 15; and the absolute difference in corner counts between the frame and the previous frame is greater than the average value, the average value being the mean of the absolute corner-count differences between adjacent frames within 3 seconds after the frame;
S204, saving all subtitle frames.
Let N_i denote the number of corner points of the i-th frame. The absolute difference M_{i+1} between the corner counts of adjacent frames is calculated as shown in formula (1):
M_{i+1} = |N_{i+1} - N_i|    (1)
where i ranges over [1, n] and n is the total number of frames in the video.
Let t be the time node of the image frame corresponding to N_{i+1}. M values are calculated for that frame and all frames within the following 3 s, and their average M_AVG_{i+1} is taken, as shown in formula (2):
M_AVG_{i+1} = (M_{i+1} + M_{i+2} + ... + M_{i+z}) / z    (2)
where z is the number of frames within 3 s after the time node t (z = 3 × fps), and fps is the number of frames per second of the video.
A frame corresponding to N_{i+1} is judged to be a subtitle frame when it meets the judgment condition given in formula (3):
N_{i+1} > N_i,   N_{i+1} > 15,   M_{i+1} > M_AVG_{i+1}    (3)
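As an illustrative sketch only (not the patented implementation itself), the frame-detection rule of formulas (1) to (3) could be realized in Python with OpenCV as follows; the Harris parameters and the 1% response threshold used to count corners are assumptions of the sketch, since the embodiment specifies only Harris corner detection:

```python
import cv2
import numpy as np

def count_corners(gray):
    # Harris response map; count pixels whose response exceeds 1% of the
    # maximum (assumed threshold; the text only says "Harris corner detection").
    resp = cv2.cornerHarris(np.float32(gray), blockSize=2, ksize=3, k=0.04)
    return int((resp > 0.01 * resp.max()).sum())

def detect_caption_frames(video_path):
    cap = cv2.VideoCapture(video_path)
    fps = int(round(cap.get(cv2.CAP_PROP_FPS)))
    counts = []  # N_i for every frame
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        counts.append(count_corners(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)))
    cap.release()

    # M_{i+1} = |N_{i+1} - N_i|, formula (1)
    diffs = [abs(counts[i + 1] - counts[i]) for i in range(len(counts) - 1)]
    z = 3 * fps  # frames within 3 s after the candidate frame
    caption_frames = []
    for i in range(len(diffs)):
        window = diffs[i:i + z]
        avg = sum(window) / len(window)  # M_AVG_{i+1}, formula (2)
        # Judgment condition (3): the count rises, exceeds 15, and the jump
        # exceeds the local 3 s average of inter-frame jumps.
        if counts[i + 1] > counts[i] and counts[i + 1] > 15 and diffs[i] > avg:
            caption_frames.append(i + 1)  # 0-based index of the subtitle frame
    return caption_frames
```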
and S102, positioning the subtitle area in each subtitle frame.
S103, extracting subtitles from the positioned subtitle areas and carrying out optical character recognition to obtain subtitle characters.
And S104, storing the obtained caption characters.
In the embodiment of the invention, subtitle frames are identified from the corner count, and the judgment conditions use the difference in corner counts between adjacent frames. This detects subtitle frames accurately and effectively avoids the missed detections other methods may suffer when a subtitle contains too few characters; it also reduces the cases in which a transition frame in the video is mistaken for a subtitle frame. Because the corner count changes sharply when a subtitle appears, this approach further avoids obtaining many subtitle frames containing the same subtitle, as happens with pixel-difference computation, thereby reducing subsequent computation.
In the above embodiment, one possible implementation manner of step S102 is as follows:
referring to fig. 3, a flowchart of subtitle area positioning according to an embodiment of the present invention is shown. The method comprises the following steps:
S301, converting the read subtitle frame into a grayscale image;
S302, cropping the subtitle frame image, keeping only the bottom 1/4 of the image; since subtitles are generally located in the bottom 1/4 of the image, only this part of the image needs to be retained;
S303, performing edge detection on the image using the Laplacian operator.
Using the Laplacian operator for edge detection not only effectively extracts the subtitles in the image but also better suppresses unnecessary noise.
S304, binarizing the edge-detected image using Otsu's method.
S305, performing a morphological closing operation on the binarized image.
S306, locating the subtitle position using the partial pixel accumulation method.
In the embodiment of the invention, only the bottom quarter of the image is used when detecting the subtitle region, which effectively reduces the amount of computation.
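By way of illustration, steps S301 to S305 might be sketched with OpenCV as follows; the closing-kernel size (15, 3) is an assumed value, since the embodiment does not specify one:

```python
import cv2

def preprocess_caption_frame(frame_bgr):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)      # S301: grayscale
    h = gray.shape[0]
    bottom = gray[3 * h // 4:, :]                           # S302: keep bottom 1/4 only
    edges = cv2.Laplacian(bottom, cv2.CV_8U, ksize=3)       # S303: Laplacian edges
    _, binary = cv2.threshold(edges, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # S304: Otsu
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 3))     # assumed size
    closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)      # S305: closing
    return closed
```

The closed image is then passed to the partial pixel accumulation method of step S306.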
The specific implementation manner of step S306 is:
referring to fig. 5, a flowchart of a partial pixel value accumulation method used in subtitle region positioning according to an embodiment of the present invention is shown, including the following steps:
Step 1: select, from the closed image along its horizontal direction, all pixel columns centered and continuous within the length range [L/2 - L/40, L/2 + L/40], i.e., detect the dashed-line region 2 in fig. 6.
Let the pixel values be O_i. Starting from i = 1 and incrementing i by 1, when the pixel value O_i is 1, calculate the sum O_SUM_up of it and the 20 pixel values below it:
O_SUM_up = O_i + O_{i+1} + O_{i+2} + ... + O_{i+20}
where i ranges over [1, 2, 3, ..., W-20]; W and L are respectively the pixel width and pixel length of the subtitle frame image 1 in fig. 6.
When O_SUM_up > 10, record the position coordinates (x_i, y_i) of O_i, and select the smallest y_i among all recorded coordinates as the top coordinate y_min of the subtitle.
Step 2: again select, from the closed image along its horizontal direction, all pixel columns centered and continuous within the length range [L/2 - L/40, L/2 + L/40], i.e., detect the dashed-line region 2 in fig. 6.
Let the pixel values be O_i. Starting from i = W and decrementing i by 1, when the pixel value O_i is 1, calculate the sum O_SUM_down of it and the 20 pixel values above it:
O_SUM_down = O_i + O_{i-1} + O_{i-2} + ... + O_{i-20}
where i ranges over [W, W-1, W-2, ..., 21].
When O_SUM_down > 10, record the position coordinates (x_i, y_i) of O_i, and select the largest y_i among all recorded coordinates as the bottom coordinate y_max of the subtitle.
Step 3: select, from the closed image along its vertical direction, all pixel rows within the range [y_min, y_max], i.e., detect the dashed-line region 4 in fig. 6.
Let the pixel values be O_j. Starting from j = 1 and incrementing j by 1, calculate the sum O_SUM_left of O_j and the 20 consecutive pixel values to its right:
O_SUM_left = O_j + O_{j+1} + O_{j+2} + ... + O_{j+20}
where j ranges over [1, 2, 3, ..., L-20].
When O_SUM_left > 10, record the position coordinates (x_j, y_j) of O_j, and select the smallest x_j among all recorded coordinates as the left coordinate x_min of the subtitle.
Step 4: again select, from the closed image along its vertical direction, all pixel rows within the range [y_min, y_max], i.e., detect the dashed-line region 4 in fig. 6.
Let the pixel values be O_j. Starting from j = L and decrementing j by 1, calculate the sum O_SUM_right of O_j and the 20 consecutive pixel values to its left:
O_SUM_right = O_j + O_{j-1} + O_{j-2} + ... + O_{j-20}
where j ranges over [L, L-1, L-2, ..., 21].
When O_SUM_right > 10, record the position coordinates (x_j, y_j) of O_j, and select the largest x_j among all recorded coordinates as the right coordinate x_max of the subtitle.
Step 5: obtain and save the subtitle position coordinates from the calculation results, the top-left corner being (x_min + 5, y_min + 5) and the bottom-right corner being (x_max + 5, y_max + 5); 5 is added to each coordinate value to avoid losing the edges of character strokes during processing. The resulting subtitle range corresponds to subtitle region 3 in fig. 6.
In the embodiment of the invention, the subtitle position is located using a partial pixel accumulation method. The method relies on conventional algorithms, requiring less computation than machine-learning-based methods while effectively improving detection results.
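A minimal sketch of steps 1 to 5 follows, assuming the closed image is a 0/1 binary array and following the text's naming, with W rows scanned by i and L columns scanned by j:

```python
import numpy as np

def locate_caption(closed):
    img = (closed > 0).astype(np.uint8)   # normalize to 0/1 pixel values
    W, L = img.shape                      # W: pixel width (rows), L: pixel length (columns)
    band = range(L // 2 - L // 40, L // 2 + L // 40 + 1)  # central column band

    ys = []
    for x in band:
        col = img[:, x]
        for i in range(W - 20):           # Step 1: top-down scan, O_SUM_up
            if col[i] == 1 and col[i:i + 21].sum() > 10:
                ys.append(i)
                break
        for i in range(W - 1, 19, -1):    # Step 2: bottom-up scan, O_SUM_down
            if col[i] == 1 and col[i - 20:i + 1].sum() > 10:
                ys.append(i)
                break
    if not ys:
        return None                       # no subtitle-like run in the band
    y_min, y_max = min(ys), max(ys)

    xs = []
    for y in range(y_min, y_max + 1):
        row = img[y, :]
        for j in range(L - 20):           # Step 3: left-to-right scan, O_SUM_left
            if row[j:j + 21].sum() > 10:
                xs.append(j)
                break
        for j in range(L - 1, 19, -1):    # Step 4: right-to-left scan, O_SUM_right
            if row[j - 20:j + 1].sum() > 10:
                xs.append(j)
                break
    if not xs:
        return None
    x_min, x_max = min(xs), max(xs)
    # Step 5: offset by 5 pixels to avoid clipping character stroke edges.
    return (x_min + 5, y_min + 5), (x_max + 5, y_max + 5)
```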
In the above embodiment, one possible implementation manner of step S103 is as follows:
referring to fig. 4, a flowchart of subtitle extraction and optical character recognition according to an embodiment of the present invention is shown, including the following steps:
S401, reading the coordinates of the subtitle position;
S402, cropping the subtitle portion from the source image according to the coordinates and converting it into a grayscale image;
S403, performing median filtering on the grayscale image to remove possible interference;
S404, performing edge detection on the filtered image using the Laplacian operator;
S405, binarizing the image using Otsu's method;
S406, recognizing the characters in the image using OCR technology, and saving the text information after recognition.
Corresponding to the above video subtitle extraction method, the invention further provides a video subtitle extraction device, comprising:
a reading unit, configured to read a video in which subtitles are to be detected;
a detection unit, configured to detect subtitle frames, based on the number of corner points, in the video read by the reading unit;
a positioning unit, configured to locate the subtitle region in each subtitle frame detected by the detection unit;
and an extraction unit, configured to extract subtitles from the subtitle regions located by the positioning unit and perform optical character recognition to obtain subtitle text.
In a possible embodiment, the detection unit is specifically configured to:
convert each frame image in the video into a grayscale image; perform corner detection on each frame image and record the number of corner points of each frame; take frames whose corner counts meet preset conditions as subtitle frames, the preset conditions being: the corner count of the frame is greater than that of the previous frame; the corner count of the frame is greater than 15; and the absolute difference in corner counts between the frame and the previous frame is greater than an average value, the average value being the mean of the absolute corner-count differences between adjacent frames within 3 seconds after the frame.
In a possible implementation, the positioning unit is specifically configured to:
for each subtitle frame, convert the subtitle frame into a grayscale image; crop the bottom quarter of the grayscale image; perform edge detection on the cropped image using the Laplacian operator; binarize the edge-detected image using Otsu's method; apply a morphological closing operation to the binarized image; and locate the subtitle position in the closed image using a partial pixel accumulation method.
In a possible implementation, the extraction unit is specifically configured to:
read the coordinates of the subtitle position; crop the subtitle portion from the source image according to the coordinates and convert it into a grayscale image; perform median filtering on the grayscale image; perform edge detection on the median-filtered image using the Laplacian operator; binarize the image using Otsu's method; and perform optical character recognition on the characters in the binarized image to obtain the subtitle text.
In a possible implementation, the positioning unit is configured to locate the subtitle position in the closed image using the partial pixel accumulation method, including:
selecting, from the closed image along its horizontal direction, all pixel columns centered and continuous within the length range [L/2 - L/40, L/2 + L/40], and denoting the pixel values O_i;
starting from i = 1 and incrementing i by 1, when the pixel value O_i is 1, calculating the sum O_SUM_up of it and the 20 pixel values below it, where O_SUM_up = O_i + O_{i+1} + O_{i+2} + ... + O_{i+20};
when O_SUM_up > 10, recording the position coordinates (x_i, y_i) of O_i, and selecting the smallest y_i among all recorded coordinates as the top coordinate y_min of the subtitle;
starting from i = W and decrementing i by 1, when the pixel value O_i is 1, calculating the sum O_SUM_down of it and the 20 pixel values above it, where O_SUM_down = O_i + O_{i-1} + O_{i-2} + ... + O_{i-20}; W is the pixel width of the image and L is the pixel length of the image;
when O_SUM_down > 10, recording the position coordinates (x_i, y_i) of O_i, and selecting the largest y_i among all recorded coordinates as the bottom coordinate y_max of the subtitle;
selecting, from the closed image along its vertical direction, all pixel rows within the range [y_min, y_max], and denoting the pixel values O_j;
starting from j = 1 and incrementing j by 1, calculating the sum O_SUM_left of O_j and the 20 consecutive pixel values to its right, where O_SUM_left = O_j + O_{j+1} + O_{j+2} + ... + O_{j+20};
when O_SUM_left > 10, recording the position coordinates (x_j, y_j) of O_j, and selecting the smallest x_j among all recorded coordinates as the left coordinate x_min of the subtitle;
starting from j = L and decrementing j by 1, calculating the sum O_SUM_right of O_j and the 20 consecutive pixel values to its left, where O_SUM_right = O_j + O_{j-1} + O_{j-2} + ... + O_{j-20};
when O_SUM_right > 10, recording the position coordinates (x_j, y_j) of O_j, and selecting the largest x_j among all recorded coordinates as the right coordinate x_max of the subtitle;
and saving the position coordinates of the subtitle, the top-left corner being (x_min + 5, y_min + 5) and the bottom-right corner being (x_max + 5, y_max + 5).
In the embodiment of the invention, subtitle frames are identified from the corner count, and the judgment conditions use the difference in corner counts between adjacent frames. This detects subtitle frames accurately and effectively avoids the missed detections other methods may suffer when a subtitle contains too few characters; it also reduces the cases in which a transition frame in the video is mistaken for a subtitle frame. Because the corner count changes sharply when a subtitle appears, this approach further avoids obtaining many subtitle frames containing the same subtitle, as happens with pixel-difference computation, thereby reducing subsequent computation.
In the embodiment of the invention, only the bottom quarter of the image is used when detecting the subtitle region, which effectively reduces the amount of computation.
In the embodiment of the invention, the subtitle position is located using a partial pixel accumulation method. The method relies on conventional algorithms, requiring less computation than machine-learning-based methods while effectively improving detection results.
In the embodiments provided in the present invention, it should be understood that the disclosed technical contents can be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, a division of a unit may be a division of a logic function, and an actual implementation may have another division, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or may not be executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for extracting video subtitles, the method comprising:
reading a video in which subtitles are to be detected;
detecting subtitle frames in the video based on the number of corner points;
locating the subtitle region in each subtitle frame;
and extracting subtitles from the located subtitle regions and performing optical character recognition to obtain subtitle text.
2. The method of claim 1, wherein detecting subtitle frames in the video based on the number of corner points comprises:
converting each frame image in the video into a grayscale image;
performing corner detection on each grayscale frame image and recording the number of corner points of each frame;
taking frames whose corner counts meet preset conditions as subtitle frames, the preset conditions being: the corner count of the frame is greater than that of the previous frame; the corner count of the frame is greater than 15; and the absolute difference in corner counts between the frame and the previous frame is greater than an average value, the average value being the mean of the absolute corner-count differences between adjacent frames within 3 seconds after the frame.
3. The method of claim 1, wherein locating the subtitle region in each subtitle frame comprises:
for each subtitle frame, converting the subtitle frame into a grayscale image;
cropping the bottom quarter of the grayscale image;
performing edge detection on the cropped image using the Laplacian operator;
binarizing the edge-detected image using Otsu's method;
applying a morphological closing operation to the binarized image;
and locating the subtitle position in the closed image using a partial pixel accumulation method.
4. The method of claim 1, wherein extracting subtitles from the located subtitle regions and performing optical character recognition comprises:
reading the coordinates of the subtitle position;
cropping the subtitle portion from the source image according to the coordinates and converting it into a grayscale image;
performing median filtering on the grayscale image;
performing edge detection on the median-filtered image using the Laplacian operator;
binarizing the image using Otsu's method;
and performing optical character recognition on the characters in the binarized image to obtain the subtitle text.
5. The method of claim 3, wherein locating the subtitle position in the closed image using the partial pixel accumulation method comprises:
selecting, from the closed image along its horizontal direction, all pixel columns centered and continuous within the length range [L/2 - L/40, L/2 + L/40], and denoting the pixel values O_i;
starting from i = 1 and incrementing i by 1, when the pixel value O_i is 1, calculating the sum O_SUM_up of it and the 20 pixel values below it, where O_SUM_up = O_i + O_{i+1} + O_{i+2} + ... + O_{i+20};
when O_SUM_up > 10, recording the position coordinates (x_i, y_i) of O_i, and selecting the smallest y_i among all recorded coordinates as the top coordinate y_min of the subtitle;
starting from i = W and decrementing i by 1, when the pixel value O_i is 1, calculating the sum O_SUM_down of it and the 20 pixel values above it, where O_SUM_down = O_i + O_{i-1} + O_{i-2} + ... + O_{i-20}; W is the pixel width of the image and L is the pixel length of the image;
when O_SUM_down > 10, recording the position coordinates (x_i, y_i) of O_i, and selecting the largest y_i among all recorded coordinates as the bottom coordinate y_max of the subtitle;
selecting, from the closed image along its vertical direction, all pixel rows within the range [y_min, y_max], and denoting the pixel values O_j;
starting from j = 1 and incrementing j by 1, calculating the sum O_SUM_left of O_j and the 20 consecutive pixel values to its right, where O_SUM_left = O_j + O_{j+1} + O_{j+2} + ... + O_{j+20};
when O_SUM_left > 10, recording the position coordinates (x_j, y_j) of O_j, and selecting the smallest x_j among all recorded coordinates as the left coordinate x_min of the subtitle;
starting from j = L and decrementing j by 1, calculating the sum O_SUM_right of O_j and the 20 consecutive pixel values to its left, where O_SUM_right = O_j + O_{j-1} + O_{j-2} + ... + O_{j-20};
when O_SUM_right > 10, recording the position coordinates (x_j, y_j) of O_j, and selecting the largest x_j among all recorded coordinates as the right coordinate x_max of the subtitle;
and saving the position coordinates of the subtitle, the top-left corner being (x_min + 5, y_min + 5) and the bottom-right corner being (x_max + 5, y_max + 5).
6. A video subtitle extracting apparatus, the apparatus comprising:
a reading unit, configured to read a video in which subtitles are to be detected;
a detection unit, configured to detect subtitle frames, based on the number of corner points, in the video read by the reading unit;
a positioning unit, configured to locate the subtitle region in each subtitle frame detected by the detection unit;
and an extraction unit, configured to extract subtitles from the subtitle regions located by the positioning unit and perform optical character recognition to obtain subtitle text.
7. The apparatus according to claim 6, wherein the detection unit is specifically configured to:
convert each frame image in the video into a grayscale image; perform corner detection on each frame image and record the number of corner points of each frame; take frames whose corner counts meet preset conditions as subtitle frames, the preset conditions being: the corner count of the frame is greater than that of the previous frame; the corner count of the frame is greater than 15; and the absolute difference in corner counts between the frame and the previous frame is greater than an average value, the average value being the mean of the absolute corner-count differences between adjacent frames within 3 seconds after the frame.
8. The apparatus according to claim 6, wherein the positioning unit is specifically configured to:
for each subtitle frame, convert the subtitle frame into a grayscale image; crop the bottom quarter of the grayscale image; perform edge detection on the cropped image using the Laplacian operator; binarize the edge-detected image using Otsu's method; apply a morphological closing operation to the binarized image; and locate the subtitle position in the closed image using a partial pixel accumulation method.
9. The apparatus according to claim 6, wherein the extraction unit is specifically configured to:
read the coordinates of the subtitle position; crop the subtitle portion from the source image according to the coordinates and convert it into a grayscale image; perform median filtering on the grayscale image; perform edge detection on the median-filtered image using the Laplacian operator; binarize the image using Otsu's method; and perform optical character recognition on the characters in the binarized image to obtain the subtitle text.
10. The apparatus of claim 8, wherein the positioning unit is configured to locate the subtitle position in the closed image using the partial pixel accumulation method, comprising:
selecting, from the closed image along its horizontal direction, all pixel columns centered and continuous within the length range [L/2 - L/40, L/2 + L/40], and denoting the pixel values O_i;
starting from i = 1 and incrementing i by 1, when the pixel value O_i is 1, calculating the sum O_SUM_up of it and the 20 pixel values below it, where O_SUM_up = O_i + O_{i+1} + O_{i+2} + ... + O_{i+20};
when O_SUM_up > 10, recording the position coordinates (x_i, y_i) of O_i, and selecting the smallest y_i among all recorded coordinates as the top coordinate y_min of the subtitle;
starting from i = W and decrementing i by 1, when the pixel value O_i is 1, calculating the sum O_SUM_down of it and the 20 pixel values above it, where O_SUM_down = O_i + O_{i-1} + O_{i-2} + ... + O_{i-20}; W is the pixel width of the image and L is the pixel length of the image;
when O_SUM_down > 10, recording the position coordinates (x_i, y_i) of O_i, and selecting the largest y_i among all recorded coordinates as the bottom coordinate y_max of the subtitle;
selecting, from the closed image along its vertical direction, all pixel rows within the range [y_min, y_max], and denoting the pixel values O_j;
starting from j = 1 and incrementing j by 1, calculating the sum O_SUM_left of O_j and the 20 consecutive pixel values to its right, where O_SUM_left = O_j + O_{j+1} + O_{j+2} + ... + O_{j+20};
when O_SUM_left > 10, recording the position coordinates (x_j, y_j) of O_j, and selecting the smallest x_j among all recorded coordinates as the left coordinate x_min of the subtitle;
starting from j = L and decrementing j by 1, calculating the sum O_SUM_right of O_j and the 20 consecutive pixel values to its left, where O_SUM_right = O_j + O_{j-1} + O_{j-2} + ... + O_{j-20};
when O_SUM_right > 10, recording the position coordinates (x_j, y_j) of O_j, and selecting the largest x_j among all recorded coordinates as the right coordinate x_max of the subtitle;
and saving the position coordinates of the subtitle, the top-left corner being (x_min + 5, y_min + 5) and the bottom-right corner being (x_max + 5, y_max + 5).
CN202010665068.7A 2020-07-10 2020-07-10 Video subtitle extraction method and device Active CN111860262B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010665068.7A CN111860262B (en) 2020-07-10 2020-07-10 Video subtitle extraction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010665068.7A CN111860262B (en) 2020-07-10 2020-07-10 Video subtitle extraction method and device

Publications (2)

Publication Number Publication Date
CN111860262A true CN111860262A (en) 2020-10-30
CN111860262B CN111860262B (en) 2022-10-25

Family

ID=72984256

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010665068.7A Active CN111860262B (en) 2020-07-10 2020-07-10 Video subtitle extraction method and device

Country Status (1)

Country Link
CN (1) CN111860262B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112580446A (en) * 2020-12-04 2021-03-30 北京中科凡语科技有限公司 Video subtitle translation method, system, electronic device and readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101853381A (en) * 2009-03-31 2010-10-06 华为技术有限公司 Method and device for acquiring video subtitle information
CN107302718A (en) * 2017-08-17 2017-10-27 河南科技大学 A kind of video caption area positioning method based on Corner Detection
CN108769776A (en) * 2018-05-31 2018-11-06 北京奇艺世纪科技有限公司 Main title detection method, device and electronic equipment
CN109918987A (en) * 2018-12-29 2019-06-21 中国电子科技集团公司信息科学研究院 A kind of video caption keyword recognition method and device
CN110944237A (en) * 2019-12-12 2020-03-31 成都极米科技股份有限公司 Subtitle area positioning method and device and electronic equipment

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101853381A (en) * 2009-03-31 2010-10-06 华为技术有限公司 Method and device for acquiring video subtitle information
CN107302718A (en) * 2017-08-17 2017-10-27 河南科技大学 A kind of video caption area positioning method based on Corner Detection
CN108769776A (en) * 2018-05-31 2018-11-06 北京奇艺世纪科技有限公司 Main title detection method, device and electronic equipment
CN109918987A (en) * 2018-12-29 2019-06-21 中国电子科技集团公司信息科学研究院 A kind of video caption keyword recognition method and device
CN110944237A (en) * 2019-12-12 2020-03-31 成都极米科技股份有限公司 Subtitle area positioning method and device and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhou Changjian: "Research on Video Subtitle Extraction Algorithms Based on Multiple-Instance Learning", China Master's Theses Full-text Database, Information Science and Technology Series *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112580446A (en) * 2020-12-04 2021-03-30 北京中科凡语科技有限公司 Video subtitle translation method, system, electronic device and readable storage medium
CN112580446B (en) * 2020-12-04 2022-06-24 北京中科凡语科技有限公司 Video subtitle translation method, system, electronic device and readable storage medium

Also Published As

Publication number Publication date
CN111860262B (en) 2022-10-25

Similar Documents

Publication Publication Date Title
US5784500A (en) Image binarization apparatus and method of it
US8761582B2 (en) Video editing device and video editing system
EP2357614B1 (en) Method and terminal for detecting and tracking moving object using real-time camera motion estimation
JP4643829B2 (en) System and method for analyzing video content using detected text in a video frame
US6366699B1 (en) Scheme for extractions and recognitions of telop characters from video data
Utsumi et al. An object detection method for describing soccer games from video
US7738734B2 (en) Image processing method
US9311533B2 (en) Device and method for detecting the presence of a logo in a picture
US20080044102A1 (en) Method and Electronic Device for Detecting a Graphical Object
CN101853381B (en) Method and device for acquiring video subtitle information
US7440608B2 (en) Method and system for detecting image defects
CN106570510A (en) Supermarket commodity identification method
US7720281B2 (en) Visual characteristics-based news anchorperson segment detection method
CN103606220A (en) Check printed number recognition system and check printed number recognition method based on white light image and infrared image
CN111860262B (en) Video subtitle extraction method and device
CN108235115A (en) The method and terminal of voice zone location in a kind of song-video
Phan et al. Recognition of video text through temporal integration
CN110807457A (en) OSD character recognition method, device and storage device
CN106101485B (en) A kind of prospect track determination method and device based on feedback
Wang et al. Automatic TV logo detection, tracking and removal in broadcast video
CN108363981B (en) Title detection method and device
JP5624702B2 (en) Image feature amount calculation apparatus and image feature amount calculation program
CN114025089A (en) Video image acquisition jitter processing method and system
JP2000048118A (en) Information reading system
Yen et al. Precise news video text detection/localization based on multiple frames integration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant