CN108052941B - News subtitle tracking method and device - Google Patents

News subtitle tracking method and device

Info

Publication number
CN108052941B
CN108052941B (application CN201711371730.2A)
Authority
CN
China
Prior art keywords
tracking
area
image
video frame
news
Prior art date
Legal status
Active
Application number
CN201711371730.2A
Other languages
Chinese (zh)
Other versions
CN108052941A (en)
Inventor
刘楠
Current Assignee
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201711371730.2A priority Critical patent/CN108052941B/en
Publication of CN108052941A publication Critical patent/CN108052941A/en
Application granted granted Critical
Publication of CN108052941B publication Critical patent/CN108052941B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/60: Type of objects
    • G06V 20/62: Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V 20/635: Overlay text, e.g. embedded captions in a TV program
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/28: Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/47: End-user applications
    • H04N 21/488: Data services, e.g. news ticker
    • H04N 21/4884: Data services, e.g. news ticker for displaying subtitles

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a news subtitle tracking method and device. When area tracking is performed for the first time, a tracking area is set within the video frame, and the binarized image corresponding to the image in the tracking area of that first-tracked video frame is taken as a reference image; on this basis, the images in the tracking areas of the other video frames to be tracked are tracked in a binarized-image tracking manner against the reference image obtained during the first tracking. The invention thus provides a scheme for tracking news subtitles by means of binarized-image tracking, which effectively avoids the interference introduced by color-histogram features and by the original image, does not mistrack when text contents differ but color histograms are similar, and therefore makes the tracking more stable and more robust.

Description

News subtitle tracking method and device
Technical Field
The invention belongs to the technical field of multimedia information processing, and particularly relates to a news subtitle tracking method and device.
Background
News video contains a large amount of up-to-date information and is of great value to video websites and news applications. Video websites and news applications need to split each day's complete news broadcast into individual stories and put them online so that users can click and watch the split news items they are interested in. However, because there are many television stations, splitting the news, entering titles for the split items and uploading them to a publishing system often consume a great deal of manpower. In addition, news is time-sensitive, so the requirements on the processing speed of news video are very strict: the cutting, splitting and publishing of a complete news program must be finished within a short time after broadcast, and a backlog-style post-processing workflow cannot be used.
News headlines provide semantic clues of great significance for news splitting; for a long-video splitting algorithm, the appearance, termination and repetition of a news headline usually convey different information and indicate the structure of the news. The points in time at which headlines appear and their corresponding states (appearance, termination, repetition, etc.) are therefore critical for news splitting, and obtaining this information depends on headline positioning and tracking technology. For news content analysis, the text of the news headline is the most intuitive summary of the news; with OCR (Optical Character Recognition) technology, the text content in the image can be read directly, converting low-level features into semantic content and thereby enabling headline extraction, but the precondition is still that the position of the headline must be located.
Positioning and detection of news headlines is usually achieved by tracking the subtitle area of the news video. At present, news subtitles are generally tracked by means of color histograms; with this approach, however, mistracking easily occurs when the text contents of different video frames differ but their color histograms are similar. A subtitle tracking scheme that can track news headlines accurately is therefore urgently needed in this field, so as to provide a basis for the positioning and detection of news headlines and, further, for news splitting or optical character recognition of the headlines.
Disclosure of Invention
In view of the above, an object of the present invention is to provide a news subtitle tracking method and device for accurately tracking news subtitles, so as to provide a basis for the positioning and detection of news headlines and, further, for news splitting or optical character recognition of the headlines.
Therefore, the invention discloses the following technical scheme:
a news subtitle tracking method, comprising:
obtaining a plurality of video frames to be tracked, wherein the video frames respectively comprise news title candidate areas to be tracked;
judging whether a target area tracked in a current video frame is tracked for the first time in the tracking process of a plurality of video frames; the corresponding tracking process of a target area in one video frame is one-time tracking aiming at the target area, and the target area is an area in the news title candidate area;
if the judgment result shows that the current video frame is tracked for the first time, setting at least partial area in the news title candidate area as a tracking area, and obtaining a binary image corresponding to the image in the tracking area of the current video frame as a reference image;
if the judgment result shows that the tracking is not the first tracking, tracking the tracking area of the current video frame by adopting a binarization image tracking mode based on the reference image obtained by processing in the first tracking; and ending the tracking of the plurality of video frames until a preset tracking ending condition is met.
Preferably, the setting of at least a partial area of the candidate news headline areas as a tracking area includes:
calculating at least a partial area of the candidate areas of the news headlines using a predetermined tracking area calculation formula, and using the at least partial area as a tracking area, the predetermined tracking area calculation formula including:
track.x=rect.x+rect.w*Xratio1;
track.y=rect.y+rect.h*Yratio1;
track.w=rect.w*Xratio2;
track.h=rect.h*Yratio2;
the method comprises the following steps that 1, a receiver.x and a receiver.y respectively represent horizontal and vertical coordinates of the starting point position of a news title candidate area in a video frame, a receiver.w represents the width of the news title candidate area, and a receiver.h represents the height of the news title candidate area; track.x and track.y respectively represent the horizontal and vertical coordinates of the starting position of the tracking area in the video frame, track.w represents the width of the tracking area, and track.h represents the height of the tracking area; xratio1, Xratio2, Yratio1 and Yratio2 are preset parameters, and coordinate systems in which the coordinates are located respectively take the width direction and the height direction of the video frame as the horizontal axis direction and the vertical axis direction.
Preferably, the obtaining a binarized image corresponding to an image in a tracking area of the current video frame as a reference image includes:
selecting an image in a tracking area of the current video frame, and converting the selected image from a red, green and blue (RGB) image into a gray image or a brightness image;
and carrying out binarization processing on the gray level image or the brightness image by using a preset segmentation threshold value to obtain a binarization image corresponding to the image in the tracking area of the current video frame, and using the binarization image as a reference image.
Preferably, in the above method, the tracking of the tracking area of the current video frame in a binarized image tracking manner based on the reference image obtained by processing in the first tracking includes:
obtaining a binary image corresponding to the image in the tracking area of the current video frame;
carrying out point-to-point difference on the binarized image and a reference image obtained by processing in the first tracking to obtain a point-to-point difference value, and calculating the average value of the point-to-point difference values;
judging whether the average value of the point-by-point difference values reaches a preset difference threshold value or not, and if not, successfully tracking the current video frame; otherwise, the tracking of the current video frame fails.
Preferably, in the above method, ending the tracking of the plurality of video frames when the preset tracking end condition is met includes:
when the tracking failure times in the plurality of video frame tracking processes reach a preset threshold value or the tracking of all the video frames in the plurality of video frames is finished, ending the tracking of the plurality of video frames.
A news caption tracking apparatus, comprising:
an acquisition unit, configured to acquire a plurality of video frames to be tracked, wherein the video frames each contain a news title candidate area to be tracked;
the judging unit is used for judging whether a target area tracked in a current video frame is tracked for the first time in the tracking process of the plurality of video frames; the corresponding tracking process of a target area in one video frame is one-time tracking aiming at the target area, and the target area is an area in the news title candidate area;
a first tracking processing unit, configured to set at least a partial region in the candidate region of the news headline as a tracking region when the determination result indicates that the tracking is for the first time, and obtain a binarized image corresponding to an image in the tracking region of the current video frame as a reference image;
a second tracking processing unit, configured to track a tracking area of the current video frame in a binarization image tracking manner based on a reference image obtained by processing in the first tracking when the determination result indicates that the current video frame is not the first tracking; and ending the tracking of the plurality of video frames until a preset tracking ending condition is met.
In the apparatus, it is preferable that the first tracking processing unit sets at least a partial area of the candidate news headline areas as a tracking area, and further includes:
calculating at least a partial area of the candidate areas of the news headlines using a predetermined tracking area calculation formula, and using the at least partial area as a tracking area, the predetermined tracking area calculation formula including:
track.x=rect.x+rect.w*Xratio1;
track.y=rect.y+rect.h*Yratio1;
track.w=rect.w*Xratio2;
track.h=rect.h*Yratio2;
wherein rect.x and rect.y respectively represent the horizontal and vertical coordinates of the starting position of the news title candidate area in the video frame, rect.w represents the width of the news title candidate area, and rect.h represents the height of the news title candidate area; track.x and track.y respectively represent the horizontal and vertical coordinates of the starting position of the tracking area in the video frame, track.w represents the width of the tracking area, and track.h represents the height of the tracking area; Xratio1, Xratio2, Yratio1 and Yratio2 are preset parameters, and the coordinate system takes the width direction and the height direction of the video frame as the horizontal-axis direction and the vertical-axis direction, respectively.
In the foregoing apparatus, preferably, the first tracking processing unit obtains a binarized image corresponding to an image in a tracking area of the current video frame as a reference image, and further includes:
selecting an image in a tracking area of the current video frame, and converting the image from a red, green and blue (RGB) image into a gray image or a brightness image; and carrying out binarization processing on the gray level image or the brightness image by using the segmentation threshold value to obtain a binarization image corresponding to the image in the tracking area of the current video frame, and using the binarization image as a reference image.
In the foregoing apparatus, preferably, the second tracking processing unit tracks the tracking area of the current video frame by using a binarized image tracking method based on a reference image obtained by processing in the first tracking, and further includes:
obtaining a binary image corresponding to the image in the tracking area of the current video frame; carrying out point-to-point difference on the binarized image and a reference image obtained by processing in the first tracking to obtain a point-to-point difference value, and calculating the average value of the point-to-point difference values; judging whether the average value of the point-by-point difference values reaches a preset difference threshold value or not, and if not, successfully tracking the current video frame; otherwise, the tracking of the current video frame fails.
In the apparatus, it is preferable that the second tracking processing unit ends tracking the plurality of video frames until a preset tracking end condition is met, and the apparatus further includes: when the tracking failure times in the plurality of video frame tracking processes reach a preset threshold value or the tracking of all the video frames in the plurality of video frames is finished, ending the tracking of the plurality of video frames.
According to the above scheme, when area tracking is performed for the first time, the news subtitle tracking method and device provided by the invention set a tracking area within the video frame and take the binarized image corresponding to the image in the tracking area of that first-tracked video frame as the reference image; on this basis, the tracking areas of the other video frames to be tracked are tracked in a binarized-image tracking manner against that reference image. The invention thus provides a scheme for tracking news subtitles by means of binarized-image tracking, which effectively avoids the interference introduced by color-histogram features and by the original image, does not mistrack when text contents differ but color histograms are similar, and therefore makes the tracking more stable and more robust.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a schematic flow chart of a news subtitle tracking method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating a comparison of candidate regions and tracking regions selected from the candidate regions in an example video frame according to an embodiment of the present invention;
FIG. 3 is a schematic caption view of a news channel provided by an embodiment of the present invention;
FIG. 4 is an image effect of a binarized image (b) obtained after the image (a) is binarized according to an embodiment of the present invention;
FIG. 5 is a schematic diagram illustrating a tracking principle for tracking video frames according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a news subtitle tracking apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides a news subtitle tracking method which, by accurately tracking news headlines, aims to provide a basis for the positioning and detection of news headlines and, further, for news splitting or optical character recognition of the headlines.
Referring to a flowchart of a method for tracking news subtitles shown in fig. 1, in this embodiment, the method may include the following steps:
step 101, obtaining a plurality of video frames to be tracked, wherein the plurality of video frames respectively comprise news title candidate areas to be tracked.
The plurality of video frames to be tracked are generally a plurality of frames of video images with continuous time sequence or frame sequence, for example, a section of video stream in a news video or a plurality of frames of continuous video images correspondingly included in a whole video stream may be specifically used.
The candidate area of the news headline in the video frame may be an area in the video frame that is specified manually according to experience, and generally, the specified area is a position area where the news headline appears more frequently in the video image, for example, the specified area may be a bottom area of the video frame. The candidate area may also be a possible news headline area detected in the video frame using some algorithm.
The area positions of the news headline candidate areas in each video frame correspond to each other, and the correspondence may specifically mean that the news headline candidate areas in each video frame have the same area position in each video frame.
Step 102, judging whether a target area tracked in a current video frame is tracked for the first time in the tracking process of a plurality of video frames; the corresponding tracking process of the target area in one video frame is one-time tracking for the target area, and the target area is an area in the news title candidate area.
Upon obtaining a plurality of video frames to be tracked, a tracking process may be performed on the plurality of video frames. For a target area tracked in a current video frame, firstly, judging whether the area is tracked for the first time in the tracking process of the plurality of video frames. In this embodiment, for a certain area in the candidate areas of news titles, as the target area, a corresponding tracking process of the target area in one video frame is specifically regarded as one tracking for the target area.
In this application, tracking an area "for the first time" means tracking that area initially within the whole video, and tracking "not for the first time" refers to the subsequent tracking passes after the area has been tracked initially. The application treats the tracking process of a given area in each single video frame as one tracking of that area, rather than describing the whole video merely in terms of "initial" and "non-initial" tracking of the area, because the tracking end condition described below involves counting the number of trackings, and an "initial/non-initial" description could not express that count. Reference may be made to the description of the tracking end condition below.
And 103, if the judgment result shows that the tracking is for the first time, setting at least part of the news headline candidate area as a tracking area, and obtaining a binary image corresponding to the image in the tracking area of the current video frame as a reference image.
If it is determined that the area is being tracked for the first time, the following applies. The inventor considered that the news title candidate area of a video frame may contain additional background; therefore, to improve tracking accuracy, the invention selects a tracking range within the candidate area and takes the area within the selected range as the actual tracking area. The tracking area is at least a partial area of the candidate area and is generally smaller than the candidate area.
Next, the present invention exemplarily provides a way of setting a tracking area in a news headline candidate area.
Assuming that the positions of the news title candidate areas in the video frame are (rect.x, rect.y, rect.w, rect.h), where the rect.x and rect.y respectively represent the abscissa and ordinate of the starting position of the news title candidate area in the video frame, the rect.w represents the width of the news title candidate area, and the rect.h represents the height of the news title candidate area, the present embodiment sets the positions (track.x, track.y, track.w, track.h) of the tracking areas in the video frame as:
track.x=rect.x+rect.w*Xratio1;
track.y=rect.y+rect.h*Yratio1;
track.w=rect.w*Xratio2;
track.h=rect.h*Yratio2;
track.x and track.y respectively represent the horizontal and vertical coordinates of the starting position of the tracking area in the video frame, track.w represents the width of the tracking area, and track.h represents the height of the tracking area; xratio1, Xratio2, Yratio1 and Yratio2 are preset parameters, and coordinate systems in which the coordinates are located respectively take the width direction and the height direction of the video frame as the horizontal axis direction and the vertical axis direction.
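For illustration, the following is a minimal Python sketch of the tracking-area computation described above. The function name compute_tracking_area and the default ratio values 0.25 and 0.5 are assumptions made for this example only; the patent only requires Xratio1, Xratio2, Yratio1 and Yratio2 to be preset parameters.

```python
# Illustrative sketch of the tracking-area formulas above.
# The default ratio values are assumptions, not values taken from the patent.
def compute_tracking_area(rect, xratio1=0.25, yratio1=0.25, xratio2=0.5, yratio2=0.5):
    """rect = (x, y, w, h) of the news-title candidate area in the video frame.

    Returns the tracking area (track_x, track_y, track_w, track_h) computed as
    track.x = rect.x + rect.w * Xratio1, track.y = rect.y + rect.h * Yratio1,
    track.w = rect.w * Xratio2, track.h = rect.h * Yratio2.
    """
    x, y, w, h = rect
    return (int(x + w * xratio1), int(y + h * yratio1),
            int(w * xratio2), int(h * yratio2))

# Example: a candidate area at (100, 600) of size 800 x 80 pixels yields
# a centered tracking area of 400 x 40 pixels.
print(compute_tracking_area((100, 600, 800, 80)))  # (300, 620, 400, 40)
```

With ratios chosen this way the tracking area is the central portion of the candidate area, which matches the intent of excluding extra background from the candidate area.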
Referring to fig. 2, fig. 2 provides a schematic diagram of candidate regions in an example video frame and a comparison of tracking regions selected from the candidate regions.
After the actual tracking area has been set from the candidate area of the video frame, this embodiment obtains the binarized image of the image in the tracking area of the video frame corresponding to the first tracking and uses that binarized image as the reference image for subsequently tracking the other video frames. The inventor observed that news subtitles usually share the characteristic of being rendered as high-contrast text over a simple background (see the subtitle examples of various news channels in Fig. 3), which provides an important basis for binarization segmentation of the subtitle region.
Specifically, the binarized image of the image in the tracking area in the video frame corresponding to the first tracking can be obtained through the following processing procedures:
1) Select the image in the tracking area of the video frame and convert it from an RGB (Red-Green-Blue) image into a gray image or a luminance image.
Specifically, the image in the selected tracking area may be converted from the RGB color space to gray scale and/or to any color space with a separate luminance or lightness channel, such as YUV, HSV, HSL, LAB, etc.
The gray scale space conversion formula is as follows:
Gray=R*0.299+G*0.587+B*0.114;
For a luminance-separated color space, taking HSL as an example, the conversion formula for the lightness component L is:
L=(max(R,G,B)+min(R,G,B))/2
the R, G, B represent the components of the image in the tracking area in the red, green and blue color channels, respectively.
Therefore, the images in the selected tracking areas can be converted from the RGB images into grayscale images or luminance images respectively by using the above corresponding calculation formulas.
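As a concrete illustration, the two conversion formulas above can be implemented as follows. This is a minimal sketch assuming the tracking-area image is an H x W x 3 NumPy array in RGB channel order; the function names are chosen for the example.

```python
import numpy as np

def to_gray(rgb):
    """Gray = R*0.299 + G*0.587 + B*0.114, applied per pixel."""
    r = rgb[..., 0].astype(np.float32)
    g = rgb[..., 1].astype(np.float32)
    b = rgb[..., 2].astype(np.float32)
    return (r * 0.299 + g * 0.587 + b * 0.114).astype(np.uint8)

def to_lightness(rgb):
    """HSL lightness component L = (max(R,G,B) + min(R,G,B)) / 2, per pixel."""
    rgb_f = rgb.astype(np.float32)
    return ((rgb_f.max(axis=-1) + rgb_f.min(axis=-1)) / 2.0).astype(np.uint8)
```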
2) A segmentation threshold is calculated.
For the gray scale or luminance image corresponding to the tracking area image, a gray scale division threshold value is calculated by using OTSU (maximum inter-class variance method).
The OTSU procedure can be described as follows:
(1) Assume that the grayscale image I is quantized into N gray levels (N <= 256), and extract the N-bin grayscale histogram H of the image, normalized so that its bins sum to 1.
(2) For each bin t (0 <= t < N) of the histogram, compute the class weights and class means (the formulas below are the standard OTSU quantities):
w0(t) = H(0) + H(1) + ... + H(t);
w1(t) = 1 - w0(t);
u0(t) = (x(0)*H(0) + ... + x(t)*H(t)) / w0(t);
u1(t) = (x(t+1)*H(t+1) + ... + x(N-1)*H(N-1)) / w1(t);
x(i) = i*256/N;
(3) Find the t that maximizes the between-class variance
sigma(t) = w0(t) * w1(t) * (u0(t) - u1(t))^2;
the segmentation threshold is then Th = x(t) for that maximizing t.
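The threshold selection can be sketched as follows. This is a minimal NumPy implementation of the standard OTSU computation described above, with the histogram normalized and x(i) = i*256/N mapping bin indices back to gray levels; it is an illustration, not the patent's exact code.

```python
import numpy as np

def otsu_threshold(gray, n_bins=256):
    """Return the segmentation threshold Th that maximizes the between-class
    variance of an n_bins histogram of the gray (or lightness) image."""
    hist, _ = np.histogram(gray, bins=n_bins, range=(0, 256))
    hist = hist.astype(np.float64)
    hist /= hist.sum()                       # normalized histogram H(i)
    x = np.arange(n_bins) * 256.0 / n_bins   # x(i) = i * 256 / N

    best_t, best_var = 0, -1.0
    for t in range(n_bins):
        w0 = hist[: t + 1].sum()             # weight of the class below t
        w1 = 1.0 - w0                        # weight of the class above t
        if w0 == 0.0 or w1 == 0.0:
            continue
        mu0 = (x[: t + 1] * hist[: t + 1]).sum() / w0   # mean of lower class
        mu1 = (x[t + 1:] * hist[t + 1:]).sum() / w1     # mean of upper class
        var_between = w0 * w1 * (mu0 - mu1) ** 2        # between-class variance
        if var_between > best_var:
            best_var, best_t = var_between, t
    return x[best_t]                         # Th = x(t) at the maximizing t
```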
3) Binarize the gray or luminance image corresponding to the tracking-area image to obtain the binarized image B_ref, and use B_ref as the reference image for the binarized tracking of the tracking area in subsequent video frames.
Specifically, the gray or luminance image I is binarized with the segmentation threshold Th: if I(x, y) < Th, then B_ref(x, y) = 0; otherwise B_ref(x, y) = 255.
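A one-line sketch of this binarization rule (again Python/NumPy, with the function name chosen for the example):

```python
import numpy as np

def binarize(gray, th):
    """B(x, y) = 0 where I(x, y) < Th, otherwise 255."""
    return np.where(gray < th, 0, 255).astype(np.uint8)
```

The reference image would then be obtained as b_ref = binarize(gray, th), with th computed in the OTSU step above.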
Referring to fig. 4, fig. 4 shows the image effect of the binarized image (b) obtained after binarizing the image (a).
After the binarized image of the image in the tracking area in the video frame corresponding to the first tracking is obtained through the above processing, the binarized image is used as a reference image for subsequently tracking other video frames.
Step 104, if the judgment result shows that the tracking is not the first tracking, tracking the tracking area of the current video frame by adopting a binarization image tracking mode based on the reference image obtained by processing in the first tracking; and ending the tracking of the plurality of video frames until a preset tracking ending condition is met.
And if the tracking is not the first tracking, tracking the tracking area of the current video frame by adopting a binarization image tracking mode according to the reference image obtained by processing in the first tracking.
Referring to the schematic diagram of the tracking principle for tracking the video frame provided in fig. 5, the tracking process for tracking the video frame in the binarized image tracking manner can be specifically implemented in the following processing manners:
1) First, convert the tracking-area image of the current video frame to be tracked from an RGB image into a gray image or a luminance image.
Specifically, the image in the tracking area in the video frame to be tracked at present can be converted from RGB color space to gray scale and/or any luminance-color separation space, such as YUV, HSV, HSL, LAB, etc.
The gray scale space conversion formula is as follows:
Gray=R*0.299+G*0.587+B*0.114;
For a luminance-separated color space, taking HSL as an example, the conversion formula for the lightness component L is:
L=(max(R,G,B)+min(R,G,B))/2
where R, G and B represent the components of the tracking-area image in the red, green and blue color channels, respectively.
Therefore, the image in the tracking area in the current video frame to be tracked can be converted into a gray image or a brightness image from the RGB image by respectively adopting the corresponding calculation formulas.
2) Binarize the gray or luminance image corresponding to the image in the tracking area of the video frame currently being tracked. That is, for a pixel (x, y) of the image I, the corresponding pixel of the binarized image B_cur is:
B_cur(x, y) = 0 if I(x, y) < Th, and B_cur(x, y) = 255 otherwise, where Th is the segmentation threshold obtained by the processing during the first tracking.
3) Difference the binarized image B_cur of the current video frame point by point against the reference image B_ref, and compute the average value Diff of the point-by-point differences:
Diff = (1 / (W * H)) * sum over all pixels (x, y) of |B_cur(x, y) - B_ref(x, y)|;
where W and H represent the width and height of the tracking-area image, respectively.
4) Compare the average difference Diff with a preset difference threshold Th_tracking and judge whether Diff reaches that threshold.
If Diff does not reach the preset threshold Th_tracking, i.e. Diff < Th_tracking, the difference between the tracking-area image of the currently tracked video frame and the reference tracking-area image from the first tracking is within the allowed range; the tracking of the current video frame is then considered successful, and tracking continues with the next video frame in the tracking state. Otherwise, if Diff reaches the preset threshold Th_tracking, i.e. Diff >= Th_tracking, the tracking of the current video frame fails, and the process likewise returns to the tracking state to track the next video frame.
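Steps 3) and 4) together amount to a mean absolute difference test between the two binarized images. A minimal sketch, assuming the images are NumPy arrays of equal size and using an illustrative value of 32 for Th_tracking (the patent leaves this as a preset parameter):

```python
import numpy as np

def track_frame(b_cur, b_ref, th_tracking=32.0):
    """Compare the binarized tracking-area image of the current frame (b_cur)
    against the reference image from the first tracking (b_ref).

    Returns (success, diff), where diff is the average of the point-by-point
    absolute differences and success is True when diff < th_tracking."""
    diff = np.abs(b_cur.astype(np.float64) - b_ref.astype(np.float64)).mean()
    return diff < th_tracking, diff
```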
And ending the tracking of the plurality of video frames until a preset tracking ending condition is met.
Wherein the end condition may be, but is not limited to: and the tracking failure times in the tracking process of the plurality of video frames reach a preset threshold value, or the tracking of all the video frames in the plurality of video frames is completed.
Thus, the tracking of the plurality of video frames may be ended when the number of tracking failures in the tracking of the plurality of video frames reaches a predetermined threshold, or the tracking of all the video frames in the plurality of video frames is completed.
The tracking-failure count also serves the following purpose: interference on individual video signals can distort single frames and cause their matching to fail, and setting a failure-count threshold allows the algorithm to tolerate tracking failures on a small number of individual video frames without terminating.
After the tracking is finished, a corresponding tracking result can be output so as to provide a basis for subsequent related applications.
For example, the frame number of the current video frame may be returned, and/or the numbers of tracking successes and tracking failures during the tracking process, and/or the images in the tracking area. The returned success and failure counts serve as a basis for deciding whether a news headline exists in the tracking area and for further headline detection; the returned tracking-area images serve as a basis for optical character recognition of the news headline; and the returned frame numbers, together with the detected and located headlines, serve as a basis for news splitting, and so on.
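Putting the pieces together, the following sketch outlines the per-frame tracking loop, the end conditions and the returned results described above. It reuses the illustrative helpers sketched earlier (compute_tracking_area, to_gray, otsu_threshold, binarize, track_frame); the failure threshold of 5 frames is an assumed value, not one specified by the patent.

```python
def track_subtitles(frames, candidate_rect, max_failures=5):
    """frames: iterable of H x W x 3 RGB video frames that all contain the
    same news-title candidate area candidate_rect = (x, y, w, h).

    Returns (successes, failures, last_frame_index)."""
    b_ref = None                # reference image from the first tracking
    th = None                   # segmentation threshold from the first tracking
    x = y = w = h = 0
    successes = failures = 0
    last_index = -1

    for index, frame in enumerate(frames):
        last_index = index
        if b_ref is None:                           # first tracking of this area
            x, y, w, h = compute_tracking_area(candidate_rect)
            gray = to_gray(frame[y:y + h, x:x + w])
            th = otsu_threshold(gray)               # segmentation threshold Th
            b_ref = binarize(gray, th)              # reference image B_ref
            continue

        gray = to_gray(frame[y:y + h, x:x + w])     # not the first tracking
        b_cur = binarize(gray, th)                  # binarized image B_cur
        ok, _ = track_frame(b_cur, b_ref)
        if ok:
            successes += 1
        else:
            failures += 1
            if failures >= max_failures:            # tracking-end condition
                break                               # (exhausting frames also ends tracking)

    return successes, failures, last_index
```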
According to the news subtitle tracking method provided by this embodiment of the invention, when area tracking is performed for the first time, a tracking area is set within the video frame and the binarized image corresponding to the image in the tracking area of that first-tracked video frame is taken as the reference image; on this basis, the images in the tracking areas of the other video frames to be tracked are tracked in a binarized-image tracking manner against the reference image obtained during the first tracking. The invention thus provides a scheme for tracking news subtitles by means of binarized-image tracking, which effectively avoids the interference introduced by color-histogram features and by the original image, does not mistrack when text contents differ but color histograms are similar, and therefore makes the tracking more stable and more robust.
Another embodiment of the present invention discloses a news subtitle tracking device which, by accurately tracking news headlines, aims to provide a basis for the positioning and detection of news headlines and, further, for news splitting or optical character recognition of the headlines. Fig. 6 shows a schematic structural diagram of the news subtitle tracking device, which includes:
the device comprises an acquisition unit 1, a tracking unit and a tracking unit, wherein the acquisition unit is used for acquiring a plurality of video frames to be tracked, and the video frames respectively comprise news title candidate areas to be tracked; the judging unit 2 is configured to judge, for a target area tracked in a current video frame, whether the target area is tracked for the first time in a tracking process of the plurality of video frames; the corresponding tracking process of a target area in one video frame is one-time tracking aiming at the target area, and the target area is an area in the news title candidate area; a first tracking processing unit 3, configured to set at least a partial area in the candidate area of the news headline as a tracking area when the determination result indicates that tracking is performed for the first time, and obtain a binarized image corresponding to an image in the tracking area of the current video frame as a reference image; a second tracking processing unit 4, configured to track, when the determination result indicates that the current video frame is not the first tracking, a tracking area of the current video frame in a binarization image tracking manner based on a reference image obtained by processing in the first tracking; and ending the tracking of the plurality of video frames until a preset tracking ending condition is met.
In an implementation manner of the embodiment of the present invention, the first tracking processing unit, which sets at least a partial area of the candidate news headline areas as a tracking area, further includes:
calculating at least a partial area of the candidate areas of the news headlines using a predetermined tracking area calculation formula, and using the at least partial area as a tracking area, the predetermined tracking area calculation formula including:
track.x=rect.x+rect.w*Xratio1;
track.y=rect.y+rect.h*Yratio1;
track.w=rect.w*Xratio2;
track.h=rect.h*Yratio2;
the method comprises the following steps that 1, a receiver.x and a receiver.y respectively represent horizontal and vertical coordinates of the starting point position of a news title candidate area in a video frame, a receiver.w represents the width of the news title candidate area, and a receiver.h represents the height of the news title candidate area; track.x and track.y respectively represent the horizontal and vertical coordinates of the starting position of the tracking area in the video frame, track.w represents the width of the tracking area, and track.h represents the height of the tracking area; xratio1, Xratio2, Yratio1 and Yratio2 are preset parameters, and coordinate systems in which the coordinates are located respectively take the width direction and the height direction of the video frame as the horizontal axis direction and the vertical axis direction.
In an implementation manner of the embodiment of the present invention, the obtaining, by the first tracking processing unit, a binarized image corresponding to an image in a tracking area of the current video frame as a reference image further includes: selecting an image in a tracking area of the current video frame, and converting the image from a red, green and blue (RGB) image into a gray image or a brightness image; and carrying out binarization processing on the gray level image or the brightness image by using the segmentation threshold value to obtain a binarization image corresponding to the tracking area of the current video frame, and using the binarization image as a reference image.
In an implementation manner of the embodiment of the present invention, the second tracking processing unit tracks the tracking area of the current video frame by using a binarized image tracking manner based on a reference image obtained by processing in the first tracking, and further includes: obtaining a binary image corresponding to the image in the tracking area of the current video frame; carrying out point-to-point difference on the binarized image and a reference image obtained by processing in the first tracking to obtain a point-to-point difference value, and calculating the average value of the point-to-point difference values; judging whether the average value of the point-by-point difference values reaches a preset difference threshold value or not, and if not, successfully tracking the current video frame; otherwise, the tracking of the current video frame fails.
In an implementation manner of the embodiment of the present invention, the second tracking processing unit, when a preset tracking end condition is met, ends tracking the plurality of video frames, further includes: when the tracking failure times in the plurality of video frame tracking processes reach a preset threshold value or the tracking of all the video frames in the plurality of video frames is finished, ending the tracking of the plurality of video frames.
The news subtitle tracking device provided by the embodiment of the invention sets the tracking area in the video frame when the area tracking is carried out for the first time, takes the binary image corresponding to the image in the tracking area of the video frame when the area tracking is carried out for the first time as the reference image, and on the basis, tracks the image in the tracking area of other video frames to be tracked by adopting a binary image tracking mode according to the reference image obtained by processing in the tracking for the first time. Therefore, the invention provides a scheme for tracking news information by using a binary image tracking mode, which can effectively avoid the interference caused by the characteristics of the color histogram and the original image, and can not generate error tracking caused by different text contents but similar color histograms, so that the tracking performance is more stable and more robust.
Because the news subtitle tracking device disclosed in this embodiment of the invention corresponds to the news subtitle tracking method disclosed in the embodiment above, its description is relatively brief; for the relevant similarities, please refer to the description of the news subtitle tracking method in the embodiment above, and the details are not repeated here.
In summary, the scheme of the invention has the following advantages: it provides a scheme for tracking news subtitles based on binarized-image tracking, which can provide a basis for the positioning and detection of news headlines and for news splitting. Compared with tracking by color histogram in the prior art, the scheme tracks the subtitle area more accurately and avoids the mistracking that occurs when text contents differ but color histograms are similar; compared with differencing the original images directly, the scheme more effectively avoids the interference of video-compression noise on the tracking performance and is more robust.
It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other.
For convenience of description, the above system or apparatus is described as being divided into various modules or units by function, respectively. Of course, the functionality of the units may be implemented in one or more software and/or hardware when implementing the present application.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present application may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments of the present application.
Finally, it is further noted that, herein, relational terms such as first, second, third, fourth, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and decorations can be made without departing from the principle of the present invention, and these modifications and decorations should also be regarded as the protection scope of the present invention.

Claims (8)

1. A method for tracking news subtitles, comprising:
obtaining a plurality of video frames to be tracked, wherein the video frames respectively comprise news title candidate areas to be tracked;
judging whether a target area tracked in a current video frame is tracked for the first time in the tracking process of a plurality of video frames; the corresponding tracking process of a target area in one video frame is one-time tracking aiming at the target area, and the target area is an area in the news title candidate area;
if the judgment result shows that the tracking is for the first time, setting at least partial area in the news title candidate area as a tracking area, and obtaining a binary image corresponding to the whole image of the tracking area of the current video frame as a reference image;
if the judgment result shows that the tracking area is not the first tracking, tracking the tracking area of the current video frame by adopting a binarization image tracking mode according to a reference image corresponding to the whole image of the tracking area obtained by processing in the first tracking; ending the tracking of the plurality of video frames until a preset tracking ending condition is met;
the tracking method of the current video frame by using a reference image obtained by processing in the first tracking as a basis and adopting a binarization image tracking mode comprises the following steps:
obtaining a binary image corresponding to the whole image of the tracking area of the current video frame;
carrying out point-by-point difference on the binarized image and a reference image corresponding to the whole image of the tracking area obtained by processing in the first tracking to obtain a point-by-point difference value, and calculating the average value of the point-by-point difference values;
judging whether the average value of the point-by-point difference values reaches a preset difference threshold value or not, and if not, successfully tracking the current video frame; otherwise, the tracking of the current video frame fails.
2. The method of claim 1, wherein the setting at least a portion of the candidate areas for news headlines as tracking areas comprises:
calculating at least a partial area of the candidate areas of the news headlines by using a predetermined tracking area calculation formula, and using at least a partial area of the candidate areas of the news headlines as a tracking area, wherein the predetermined tracking area calculation formula comprises:
track.x=rect.x+rect.w*Xratio1;
track.y=rect.y+rect.h*Yratio1;
track.w=rect.w*Xratio2;
track.h=rect.h*Yratio2;
the method comprises the following steps that 1, a receiver.x and a receiver.y respectively represent horizontal and vertical coordinates of the starting point position of a news title candidate area in a video frame, a receiver.w represents the width of the news title candidate area, and a receiver.h represents the height of the news title candidate area; track.x and track.y respectively represent the horizontal and vertical coordinates of the starting position of the tracking area in the video frame, track.w represents the width of the tracking area, and track.h represents the height of the tracking area; xratio1, Xratio2, Yratio1 and Yratio2 are preset parameters, and coordinate systems in which the coordinates are located respectively take the width direction and the height direction of the video frame as the horizontal axis direction and the vertical axis direction.
3. The method according to claim 1, wherein the obtaining a binarized image corresponding to an image in a tracking area of the current video frame as a reference image comprises:
selecting an image in a tracking area of the current video frame, and converting the selected image from a red, green and blue (RGB) image into a gray image or a brightness image;
and carrying out binarization processing on the gray level image or the brightness image by using a preset segmentation threshold value to obtain a binarization image corresponding to the image in the tracking area of the current video frame, and using the binarization image as a reference image.
4. The method according to claim 1, wherein ending the tracking of the plurality of video frames until a preset tracking end condition is met comprises:
when the tracking failure times in the plurality of video frame tracking processes reach a preset threshold value or the tracking of all the video frames in the plurality of video frames is finished, ending the tracking of the plurality of video frames.
5. A news caption tracking apparatus, comprising:
an acquisition unit, configured to acquire a plurality of video frames to be tracked, wherein the video frames each contain a news title candidate area to be tracked;
the judging unit is used for judging whether a target area tracked in a current video frame is tracked for the first time in the tracking process of the plurality of video frames; the corresponding tracking process of a target area in one video frame is one-time tracking aiming at the target area, and the target area is an area in the news title candidate area;
a first tracking processing unit, configured to set at least a partial region in the candidate region of the news headline as a tracking region when the determination result indicates that the tracking is for the first time, and obtain a binarized image corresponding to the entire image of the tracking region of the current video frame as a reference image;
a second tracking processing unit, configured to track the tracking area of the current video frame by using a binarized image tracking method based on a reference image corresponding to the entire image of the tracking area obtained by processing in the first tracking if the determination result indicates that the tracking is not the first tracking; ending the tracking of the plurality of video frames until a preset tracking ending condition is met;
the second tracking processing unit tracks the tracking area of the current video frame by using a binarized image tracking method based on a reference image corresponding to the whole image of the tracking area obtained by processing in the first tracking, and further includes:
obtaining a binary image corresponding to the image in the tracking area of the current video frame; carrying out point-to-point difference on the binarized image and a reference image obtained by processing in the first tracking to obtain a point-to-point difference value, and calculating the average value of the point-to-point difference values; judging whether the average value of the point-by-point difference values reaches a preset difference threshold value or not, and if not, successfully tracking the current video frame; otherwise, the tracking of the current video frame fails.
6. The apparatus according to claim 5, wherein the first tracking processing unit sets at least a partial area of the candidate areas of news headlines as a tracking area, further comprising:
calculating at least a partial area of the candidate areas of the news headlines using a predetermined tracking area calculation formula, and using the at least partial area as a tracking area, the predetermined tracking area calculation formula including:
track.x=rect.x+rect.w*Xratio1;
track.y=rect.y+rect.h*Yratio1;
track.w=rect.w*Xratio2;
track.h=rect.h*Yratio2;
wherein rect.x and rect.y respectively represent the horizontal and vertical coordinates of the starting position of the news title candidate area in the video frame, rect.w represents the width of the news title candidate area, and rect.h represents the height of the news title candidate area; track.x and track.y respectively represent the horizontal and vertical coordinates of the starting position of the tracking area in the video frame, track.w represents the width of the tracking area, and track.h represents the height of the tracking area; Xratio1, Xratio2, Yratio1 and Yratio2 are preset parameters, and the coordinate system takes the width direction and the height direction of the video frame as the horizontal-axis direction and the vertical-axis direction, respectively.
7. The apparatus according to claim 5, wherein said first tracking processing unit obtains, as a reference image, a binarized image corresponding to an image in a tracking area of the current video frame, and further comprises:
selecting an image in a tracking area of the current video frame, and converting the image from a red, green and blue (RGB) image into a gray image or a brightness image; and carrying out binarization processing on the gray level image or the brightness image by using a preset segmentation threshold value to obtain a binarization image corresponding to the image in the tracking area of the current video frame, and using the binarization image as a reference image.
8. The apparatus according to claim 5, wherein the second tracking processing unit ends tracking the plurality of video frames until a preset tracking end condition is met, further comprising: when the tracking failure times in the plurality of video frame tracking processes reach a preset threshold value or the tracking of all the video frames in the plurality of video frames is finished, ending the tracking of the plurality of video frames.
CN201711371730.2A 2017-12-19 2017-12-19 News subtitle tracking method and device Active CN108052941B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711371730.2A CN108052941B (en) 2017-12-19 2017-12-19 News subtitle tracking method and device


Publications (2)

Publication Number Publication Date
CN108052941A CN108052941A (en) 2018-05-18
CN108052941B true CN108052941B (en) 2021-06-01

Family

ID=62133796

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711371730.2A Active CN108052941B (en) 2017-12-19 2017-12-19 News subtitle tracking method and device

Country Status (1)

Country Link
CN (1) CN108052941B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108769776B (en) * 2018-05-31 2021-03-19 北京奇艺世纪科技有限公司 Title subtitle detection method and device and electronic equipment
CN109800757B (en) * 2019-01-04 2022-04-19 西北工业大学 Video character tracking method based on layout constraint

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6473522B1 (en) * 2000-03-14 2002-10-29 Intel Corporation Estimating text color and segmentation of images
CN101276416A (en) * 2008-03-10 2008-10-01 北京航空航天大学 Text tracking and multi-frame reinforcing method in video
CN101448100A (en) * 2008-12-26 2009-06-03 西安交通大学 Method for extracting video captions quickly and accurately
CN102169540A (en) * 2011-03-28 2011-08-31 汉王科技股份有限公司 Camera-based point reading positioning method and device
CN106803937A (en) * 2017-02-28 2017-06-06 兰州理工大学 A kind of double-camera video frequency monitoring method and system with text log

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102682292B (en) * 2012-05-10 2014-01-29 清华大学 Method based on monocular vision for detecting and roughly positioning edge of road
CN103546667B (en) * 2013-10-24 2016-08-17 中国科学院自动化研究所 A kind of automatic news demolition method towards magnanimity broadcast television supervision
US9646191B2 (en) * 2015-09-23 2017-05-09 Intermec Technologies Corporation Evaluating images


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Caption detection and text recognition in news video;Zhe Yang;《2012 5th international congress on image and signal processing》;20130225;全文 *
一种快速新闻视频标题字幕探测与定位方法;刘海;《计算机应用研究》;20110831;第28卷(第8期);全文 *
基于灰度差分的新闻视频标题字幕探测;陈树越;《计算机与数字工程》;20110217;第38卷(第11期);全文 *
新闻视频中标题文本检测定位技术研究;陶永宽;《中国优秀硕士学位论文全文数据库 信息科技辑》;20090715;全文 *

Also Published As

Publication number Publication date
CN108052941A (en) 2018-05-18


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant