CN108052941B - News subtitle tracking method and device - Google Patents

News subtitle tracking method and device

Info

Publication number
CN108052941B
CN108052941B (application CN201711371730.2A)
Authority
CN
China
Prior art keywords
tracking
area
image
video frame
news
Prior art date
Legal status
Active
Application number
CN201711371730.2A
Other languages
Chinese (zh)
Other versions
CN108052941A (en)
Inventor
刘楠
Current Assignee
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201711371730.2A priority Critical patent/CN108052941B/en
Publication of CN108052941A publication Critical patent/CN108052941A/en
Application granted granted Critical
Publication of CN108052941B publication Critical patent/CN108052941B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/60: Type of objects
    • G06V 20/62: Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V 20/635: Overlay text, e.g. embedded captions in a TV program
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/28: Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/47: End-user applications
    • H04N 21/488: Data services, e.g. news ticker
    • H04N 21/4884: Data services, e.g. news ticker for displaying subtitles

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a news subtitle tracking method and device. When area tracking is performed for the first time, a tracking area is set within the video frame, and the binarized image corresponding to the image in the tracking area of that first-tracked video frame is taken as a reference image; on this basis, the images in the tracking areas of the other video frames to be tracked are tracked in a binarized-image tracking manner against the reference image obtained during the first tracking. The invention thus provides a scheme for tracking news subtitles by means of binarized-image tracking, which effectively avoids the interference introduced by color-histogram features and by the original image, does not mistrack when text contents differ but color histograms are similar, and therefore makes the tracking more stable and more robust.

Description

News subtitle tracking method and device
Technical Field
The invention belongs to the technical field of multimedia information processing, and particularly relates to a news subtitle tracking method and device.
Background
News video contains a large amount of up-to-date information and is of great value to video websites and news applications. Video websites and news applications need to split each day's complete news broadcast into individual stories and put them online so that users can click and watch the split news items they are interested in. However, because there are many television stations, splitting the news, entering titles for the split items and uploading them to a publishing system often consume a great deal of manpower. In addition, news is time-sensitive, so the requirements on the processing speed of news video are very strict: the cutting, splitting and publishing of a complete news program must be finished within a short time after broadcast, and a backlog-style post-processing workflow cannot be used.
News headlines provide semantic clues of great significance for news splitting; for a long-video splitting algorithm, the appearance, termination and repetition of a news headline usually convey different information and indicate the structure of the news. The points in time at which headlines appear and their corresponding states (appearance, termination, repetition, etc.) are therefore critical for news splitting, and obtaining this information depends on headline positioning and tracking technology. For news content analysis, the text of the news headline is the most intuitive summary of the news; with OCR (Optical Character Recognition) technology, the text content in the image can be read directly, converting low-level features into semantic content and thereby enabling headline extraction, but the precondition is still that the position of the headline must be located.
Positioning and detection of news headlines is usually achieved by tracking the subtitle area of the news video. At present, news subtitles are generally tracked by means of color histograms; with this approach, however, mistracking easily occurs when the text contents of different video frames differ but their color histograms are similar. A subtitle tracking scheme that can track news headlines accurately is therefore urgently needed in this field, so as to provide a basis for the positioning and detection of news headlines and, further, for news splitting or optical character recognition of the headlines.
Disclosure of Invention
In view of the above, an object of the present invention is to provide a news subtitle tracking method and device for accurately tracking news subtitles, so as to provide a basis for the positioning and detection of news headlines and, further, for news splitting or optical character recognition of the headlines.
Therefore, the invention discloses the following technical scheme:
a news subtitle tracking method, comprising:
obtaining a plurality of video frames to be tracked, wherein the video frames respectively comprise news title candidate areas to be tracked;
judging whether a target area tracked in a current video frame is tracked for the first time in the tracking process of a plurality of video frames; the corresponding tracking process of a target area in one video frame is one-time tracking aiming at the target area, and the target area is an area in the news title candidate area;
if the judgment result shows that the current video frame is tracked for the first time, setting at least partial area in the news title candidate area as a tracking area, and obtaining a binary image corresponding to the image in the tracking area of the current video frame as a reference image;
if the judgment result shows that the tracking is not the first tracking, tracking the tracking area of the current video frame by adopting a binarization image tracking mode based on the reference image obtained by processing in the first tracking; and ending the tracking of the plurality of video frames until a preset tracking ending condition is met.
Preferably, the setting of at least a partial area of the candidate news headline areas as a tracking area includes:
calculating at least a partial area of the candidate areas of the news headlines using a predetermined tracking area calculation formula, and using the at least partial area as a tracking area, the predetermined tracking area calculation formula including:
track.x=rect.x+rect.w*Xratio1;
track.y=rect.y+rect.h*Yratio1;
track.w=rect.w*Xratio2;
track.h=rect.h*Yratio2;
the method comprises the following steps that 1, a receiver.x and a receiver.y respectively represent horizontal and vertical coordinates of the starting point position of a news title candidate area in a video frame, a receiver.w represents the width of the news title candidate area, and a receiver.h represents the height of the news title candidate area; track.x and track.y respectively represent the horizontal and vertical coordinates of the starting position of the tracking area in the video frame, track.w represents the width of the tracking area, and track.h represents the height of the tracking area; xratio1, Xratio2, Yratio1 and Yratio2 are preset parameters, and coordinate systems in which the coordinates are located respectively take the width direction and the height direction of the video frame as the horizontal axis direction and the vertical axis direction.
Preferably, the obtaining a binarized image corresponding to an image in a tracking area of the current video frame as a reference image includes:
selecting an image in a tracking area of the current video frame, and converting the selected image from a red, green and blue (RGB) image into a gray image or a brightness image;
and carrying out binarization processing on the gray level image or the brightness image by using a preset segmentation threshold value to obtain a binarization image corresponding to the image in the tracking area of the current video frame, and using the binarization image as a reference image.
Preferably, in the above method, the tracking of the tracking area of the current video frame in a binarized image tracking manner based on the reference image obtained by processing in the first tracking includes:
obtaining a binary image corresponding to the image in the tracking area of the current video frame;
carrying out point-to-point difference on the binarized image and a reference image obtained by processing in the first tracking to obtain a point-to-point difference value, and calculating the average value of the point-to-point difference values;
judging whether the average value of the point-by-point difference values reaches a preset difference threshold value or not, and if not, successfully tracking the current video frame; otherwise, the tracking of the current video frame fails.
Preferably, in the above method, ending the tracking of the plurality of video frames when the preset tracking end condition is met includes:
when the tracking failure times in the plurality of video frame tracking processes reach a preset threshold value or the tracking of all the video frames in the plurality of video frames is finished, ending the tracking of the plurality of video frames.
A news caption tracking apparatus, comprising:
an acquisition unit, configured to acquire a plurality of video frames to be tracked, wherein the video frames each contain a news title candidate area to be tracked;
the judging unit is used for judging whether a target area tracked in a current video frame is tracked for the first time in the tracking process of the plurality of video frames; the corresponding tracking process of a target area in one video frame is one-time tracking aiming at the target area, and the target area is an area in the news title candidate area;
a first tracking processing unit, configured to set at least a partial region in the candidate region of the news headline as a tracking region when the determination result indicates that the tracking is for the first time, and obtain a binarized image corresponding to an image in the tracking region of the current video frame as a reference image;
a second tracking processing unit, configured to track a tracking area of the current video frame in a binarization image tracking manner based on a reference image obtained by processing in the first tracking when the determination result indicates that the current video frame is not the first tracking; and ending the tracking of the plurality of video frames until a preset tracking ending condition is met.
In the apparatus, it is preferable that the first tracking processing unit sets at least a partial area of the candidate news headline areas as a tracking area, and further includes:
calculating at least a partial area of the candidate areas of the news headlines using a predetermined tracking area calculation formula, and using the at least partial area as a tracking area, the predetermined tracking area calculation formula including:
track.x=rect.x+rect.w*Xratio1;
track.y=rect.y+rect.h*Yratio1;
track.w=rect.w*Xratio2;
track.h=rect.h*Yratio2;
wherein rect.x and rect.y respectively represent the horizontal and vertical coordinates of the starting position of the news title candidate area in the video frame, rect.w represents the width of the news title candidate area, and rect.h represents the height of the news title candidate area; track.x and track.y respectively represent the horizontal and vertical coordinates of the starting position of the tracking area in the video frame, track.w represents the width of the tracking area, and track.h represents the height of the tracking area; Xratio1, Xratio2, Yratio1 and Yratio2 are preset parameters, and the coordinate system takes the width direction and the height direction of the video frame as the horizontal-axis direction and the vertical-axis direction, respectively.
In the foregoing apparatus, preferably, the first tracking processing unit obtains a binarized image corresponding to an image in a tracking area of the current video frame as a reference image, and further includes:
selecting an image in a tracking area of the current video frame, and converting the image from a red, green and blue (RGB) image into a gray image or a brightness image; and carrying out binarization processing on the gray level image or the brightness image by using the segmentation threshold value to obtain a binarization image corresponding to the image in the tracking area of the current video frame, and using the binarization image as a reference image.
In the foregoing apparatus, preferably, the second tracking processing unit tracks the tracking area of the current video frame by using a binarized image tracking method based on a reference image obtained by processing in the first tracking, and further includes:
obtaining a binary image corresponding to the image in the tracking area of the current video frame; carrying out point-to-point difference on the binarized image and a reference image obtained by processing in the first tracking to obtain a point-to-point difference value, and calculating the average value of the point-to-point difference values; judging whether the average value of the point-by-point difference values reaches a preset difference threshold value or not, and if not, successfully tracking the current video frame; otherwise, the tracking of the current video frame fails.
In the apparatus, it is preferable that the second tracking processing unit ends tracking the plurality of video frames until a preset tracking end condition is met, and the apparatus further includes: when the tracking failure times in the plurality of video frame tracking processes reach a preset threshold value or the tracking of all the video frames in the plurality of video frames is finished, ending the tracking of the plurality of video frames.
According to the above scheme, when area tracking is performed for the first time, the news subtitle tracking method and device provided by the invention set a tracking area within the video frame and take the binarized image corresponding to the image in the tracking area of that first-tracked video frame as the reference image; on this basis, the tracking areas of the other video frames to be tracked are tracked in a binarized-image tracking manner against that reference image. The invention thus provides a scheme for tracking news subtitles by means of binarized-image tracking, which effectively avoids the interference introduced by color-histogram features and by the original image, does not mistrack when text contents differ but color histograms are similar, and therefore makes the tracking more stable and more robust.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a schematic flow chart of a news subtitle tracking method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating a comparison of candidate regions and tracking regions selected from the candidate regions in an example video frame according to an embodiment of the present invention;
FIG. 3 is a schematic caption view of a news channel provided by an embodiment of the present invention;
FIG. 4 is an image effect of a binarized image (b) obtained after the image (a) is binarized according to an embodiment of the present invention;
FIG. 5 is a schematic diagram illustrating a tracking principle for tracking video frames according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a news subtitle tracking apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides a news subtitle tracking method which, by accurately tracking news headlines, aims to provide a basis for the positioning and detection of news headlines and, further, for news splitting or optical character recognition of the headlines.
Referring to a flowchart of a method for tracking news subtitles shown in fig. 1, in this embodiment, the method may include the following steps:
step 101, obtaining a plurality of video frames to be tracked, wherein the plurality of video frames respectively comprise news title candidate areas to be tracked.
The plurality of video frames to be tracked are generally a plurality of frames of video images with continuous time sequence or frame sequence, for example, a section of video stream in a news video or a plurality of frames of continuous video images correspondingly included in a whole video stream may be specifically used.
The candidate area of the news headline in the video frame may be an area in the video frame that is specified manually according to experience, and generally, the specified area is a position area where the news headline appears more frequently in the video image, for example, the specified area may be a bottom area of the video frame. The candidate area may also be a possible news headline area detected in the video frame using some algorithm.
The area positions of the news headline candidate areas in each video frame correspond to each other, and the correspondence may specifically mean that the news headline candidate areas in each video frame have the same area position in each video frame.
Step 102, judging whether a target area tracked in a current video frame is tracked for the first time in the tracking process of a plurality of video frames; the corresponding tracking process of the target area in one video frame is one-time tracking for the target area, and the target area is an area in the news title candidate area.
Upon obtaining a plurality of video frames to be tracked, a tracking process may be performed on the plurality of video frames. For a target area tracked in a current video frame, firstly, judging whether the area is tracked for the first time in the tracking process of the plurality of video frames. In this embodiment, for a certain area in the candidate areas of news titles, as the target area, a corresponding tracking process of the target area in one video frame is specifically regarded as one tracking for the target area.
In this application, tracking an area "for the first time" means tracking that area initially within the whole video, and tracking "not for the first time" refers to the subsequent tracking passes after the area has been tracked initially. The application treats the tracking process of a given area in each single video frame as one tracking of that area, rather than describing the whole video merely in terms of "initial" and "non-initial" tracking of the area, because the tracking end condition described below involves counting the number of trackings, and an "initial/non-initial" description could not express that count. Reference may be made to the description of the tracking end condition below.
And 103, if the judgment result shows that the tracking is for the first time, setting at least part of the news headline candidate area as a tracking area, and obtaining a binary image corresponding to the image in the tracking area of the current video frame as a reference image.
If it is determined that the area is being tracked for the first time, the following applies. The inventor considered that the news title candidate area of a video frame may contain additional background; therefore, to improve tracking accuracy, the invention selects a tracking range within the candidate area and takes the area within the selected range as the actual tracking area. The tracking area is at least a partial area of the candidate area and is generally smaller than the candidate area.
Next, the present invention exemplarily provides a way of setting a tracking area in a news headline candidate area.
Assuming that the positions of the news title candidate areas in the video frame are (rect.x, rect.y, rect.w, rect.h), where the rect.x and rect.y respectively represent the abscissa and ordinate of the starting position of the news title candidate area in the video frame, the rect.w represents the width of the news title candidate area, and the rect.h represents the height of the news title candidate area, the present embodiment sets the positions (track.x, track.y, track.w, track.h) of the tracking areas in the video frame as:
track.x=rect.x+rect.w*Xratio1;
track.y=rect.y+rect.h*Yratio1;
track.w=rect.w*Xratio2;
track.h=rect.h*Yratio2;
track.x and track.y respectively represent the horizontal and vertical coordinates of the starting position of the tracking area in the video frame, track.w represents the width of the tracking area, and track.h represents the height of the tracking area; xratio1, Xratio2, Yratio1 and Yratio2 are preset parameters, and coordinate systems in which the coordinates are located respectively take the width direction and the height direction of the video frame as the horizontal axis direction and the vertical axis direction.
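For illustration, the following is a minimal Python sketch of the tracking-area computation described above. The function name compute_tracking_area and the default ratio values 0.25 and 0.5 are assumptions made for this example only; the patent only requires Xratio1, Xratio2, Yratio1 and Yratio2 to be preset parameters.

```python
# Illustrative sketch of the tracking-area formulas above.
# The default ratio values are assumptions, not values taken from the patent.
def compute_tracking_area(rect, xratio1=0.25, yratio1=0.25, xratio2=0.5, yratio2=0.5):
    """rect = (x, y, w, h) of the news-title candidate area in the video frame.

    Returns the tracking area (track_x, track_y, track_w, track_h) computed as
    track.x = rect.x + rect.w * Xratio1, track.y = rect.y + rect.h * Yratio1,
    track.w = rect.w * Xratio2, track.h = rect.h * Yratio2.
    """
    x, y, w, h = rect
    return (int(x + w * xratio1), int(y + h * yratio1),
            int(w * xratio2), int(h * yratio2))

# Example: a candidate area at (100, 600) of size 800 x 80 pixels yields
# a centered tracking area of 400 x 40 pixels.
print(compute_tracking_area((100, 600, 800, 80)))  # (300, 620, 400, 40)
```

With ratios chosen this way the tracking area is the central portion of the candidate area, which matches the intent of excluding extra background from the candidate area.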
Referring to fig. 2, fig. 2 provides a schematic diagram of candidate regions in an example video frame and a comparison of tracking regions selected from the candidate regions.
After the actual tracking area has been set from the candidate area of the video frame, this embodiment obtains the binarized image of the image in the tracking area of the video frame corresponding to the first tracking and uses that binarized image as the reference image for subsequently tracking the other video frames. The inventor observed that news subtitles usually share the characteristic of being rendered as high-contrast text over a simple background (see the subtitle examples of various news channels in Fig. 3), which provides an important basis for binarization segmentation of the subtitle region.
Specifically, the binarized image of the image in the tracking area in the video frame corresponding to the first tracking can be obtained through the following processing procedures:
1) Select the image in the tracking area of the video frame and convert it from an RGB (Red-Green-Blue) image into a gray image or a luminance image.
Specifically, the image in the selected tracking area may be converted from the RGB color space to gray scale and/or to any color space with a separate luminance or lightness channel, such as YUV, HSV, HSL, LAB, etc.
The gray scale space conversion formula is as follows:
Gray=R*0.299+G*0.587+B*0.114;
For a luminance-separated color space, taking HSL as an example, the conversion formula for the lightness component L is:
L=(max(R,G,B)+min(R,G,B))/2
the R, G, B represent the components of the image in the tracking area in the red, green and blue color channels, respectively.
Therefore, the images in the selected tracking areas can be converted from the RGB images into grayscale images or luminance images respectively by using the above corresponding calculation formulas.
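As a concrete illustration, the two conversion formulas above can be implemented as follows. This is a minimal sketch assuming the tracking-area image is an H x W x 3 NumPy array in RGB channel order; the function names are chosen for the example.

```python
import numpy as np

def to_gray(rgb):
    """Gray = R*0.299 + G*0.587 + B*0.114, applied per pixel."""
    r = rgb[..., 0].astype(np.float32)
    g = rgb[..., 1].astype(np.float32)
    b = rgb[..., 2].astype(np.float32)
    return (r * 0.299 + g * 0.587 + b * 0.114).astype(np.uint8)

def to_lightness(rgb):
    """HSL lightness component L = (max(R,G,B) + min(R,G,B)) / 2, per pixel."""
    rgb_f = rgb.astype(np.float32)
    return ((rgb_f.max(axis=-1) + rgb_f.min(axis=-1)) / 2.0).astype(np.uint8)
```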
2) A segmentation threshold is calculated.
For the gray scale or luminance image corresponding to the tracking area image, a gray scale division threshold value is calculated by using OTSU (maximum inter-class variance method).
The OTSU procedure can be described as follows:
(1) Assume that the grayscale image I is quantized into N gray levels (N <= 256), and extract the N-bin grayscale histogram H of the image, normalized so that its bins sum to 1.
(2) For each bin t (0 <= t < N) of the histogram, compute the class weights and class means (the formulas below are the standard OTSU quantities):
w0(t) = H(0) + H(1) + ... + H(t);
w1(t) = 1 - w0(t);
u0(t) = (x(0)*H(0) + ... + x(t)*H(t)) / w0(t);
u1(t) = (x(t+1)*H(t+1) + ... + x(N-1)*H(N-1)) / w1(t);
x(i) = i*256/N;
(3) Find the t that maximizes the between-class variance
sigma(t) = w0(t) * w1(t) * (u0(t) - u1(t))^2;
the segmentation threshold is then Th = x(t) for that maximizing t.
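The threshold selection can be sketched as follows. This is a minimal NumPy implementation of the standard OTSU computation described above, with the histogram normalized and x(i) = i*256/N mapping bin indices back to gray levels; it is an illustration, not the patent's exact code.

```python
import numpy as np

def otsu_threshold(gray, n_bins=256):
    """Return the segmentation threshold Th that maximizes the between-class
    variance of an n_bins histogram of the gray (or lightness) image."""
    hist, _ = np.histogram(gray, bins=n_bins, range=(0, 256))
    hist = hist.astype(np.float64)
    hist /= hist.sum()                       # normalized histogram H(i)
    x = np.arange(n_bins) * 256.0 / n_bins   # x(i) = i * 256 / N

    best_t, best_var = 0, -1.0
    for t in range(n_bins):
        w0 = hist[: t + 1].sum()             # weight of the class below t
        w1 = 1.0 - w0                        # weight of the class above t
        if w0 == 0.0 or w1 == 0.0:
            continue
        mu0 = (x[: t + 1] * hist[: t + 1]).sum() / w0   # mean of lower class
        mu1 = (x[t + 1:] * hist[t + 1:]).sum() / w1     # mean of upper class
        var_between = w0 * w1 * (mu0 - mu1) ** 2        # between-class variance
        if var_between > best_var:
            best_var, best_t = var_between, t
    return x[best_t]                         # Th = x(t) at the maximizing t
```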
3) Binarize the gray or luminance image corresponding to the tracking-area image to obtain the binarized image B_ref, and use B_ref as the reference image for the binarized tracking of the tracking area in subsequent video frames.
Specifically, the gray or luminance image I is binarized with the segmentation threshold Th: if I(x, y) < Th, then B_ref(x, y) = 0; otherwise B_ref(x, y) = 255.
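A one-line sketch of this binarization rule (again Python/NumPy, with the function name chosen for the example):

```python
import numpy as np

def binarize(gray, th):
    """B(x, y) = 0 where I(x, y) < Th, otherwise 255."""
    return np.where(gray < th, 0, 255).astype(np.uint8)
```

The reference image would then be obtained as b_ref = binarize(gray, th), with th computed in the OTSU step above.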
Referring to fig. 4, fig. 4 shows the image effect of the binarized image (b) obtained after binarizing the image (a).
After the binarized image of the image in the tracking area in the video frame corresponding to the first tracking is obtained through the above processing, the binarized image is used as a reference image for subsequently tracking other video frames.
Step 104, if the judgment result shows that the tracking is not the first tracking, tracking the tracking area of the current video frame by adopting a binarization image tracking mode based on the reference image obtained by processing in the first tracking; and ending the tracking of the plurality of video frames until a preset tracking ending condition is met.
And if the tracking is not the first tracking, tracking the tracking area of the current video frame by adopting a binarization image tracking mode according to the reference image obtained by processing in the first tracking.
Referring to the schematic diagram of the tracking principle for tracking the video frame provided in fig. 5, the tracking process for tracking the video frame in the binarized image tracking manner can be specifically implemented in the following processing manners:
1) First, convert the tracking-area image of the current video frame to be tracked from an RGB image into a gray image or a luminance image.
Specifically, the image in the tracking area in the video frame to be tracked at present can be converted from RGB color space to gray scale and/or any luminance-color separation space, such as YUV, HSV, HSL, LAB, etc.
The gray scale space conversion formula is as follows:
Gray=R*0.299+G*0.587+B*0.114;
For a luminance-separated color space, taking HSL as an example, the conversion formula for the lightness component L is:
L=(max(R,G,B)+min(R,G,B))/2
where R, G and B represent the components of the tracking-area image in the red, green and blue color channels, respectively.
Therefore, the image in the tracking area in the current video frame to be tracked can be converted into a gray image or a brightness image from the RGB image by respectively adopting the corresponding calculation formulas.
2) Binarize the gray or luminance image corresponding to the image in the tracking area of the video frame currently being tracked. That is, for a pixel (x, y) of the image I, the corresponding pixel of the binarized image B_cur is:
B_cur(x, y) = 0 if I(x, y) < Th, and B_cur(x, y) = 255 otherwise, where Th is the segmentation threshold obtained by the processing during the first tracking.
3) Difference the binarized image B_cur of the current video frame point by point against the reference image B_ref, and compute the average value Diff of the point-by-point differences:
Diff = (1 / (W * H)) * sum over all pixels (x, y) of |B_cur(x, y) - B_ref(x, y)|;
where W and H represent the width and height of the tracking-area image, respectively.
4) Compare the average difference Diff with a preset difference threshold Th_tracking and judge whether Diff reaches that threshold.
If Diff does not reach the preset threshold Th_tracking, i.e. Diff < Th_tracking, the difference between the tracking-area image of the currently tracked video frame and the reference tracking-area image from the first tracking is within the allowed range; the tracking of the current video frame is then considered successful, and tracking continues with the next video frame in the tracking state. Otherwise, if Diff reaches the preset threshold Th_tracking, i.e. Diff >= Th_tracking, the tracking of the current video frame fails, and the process likewise returns to the tracking state to track the next video frame.
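Steps 3) and 4) together amount to a mean absolute difference test between the two binarized images. A minimal sketch, assuming the images are NumPy arrays of equal size and using an illustrative value of 32 for Th_tracking (the patent leaves this as a preset parameter):

```python
import numpy as np

def track_frame(b_cur, b_ref, th_tracking=32.0):
    """Compare the binarized tracking-area image of the current frame (b_cur)
    against the reference image from the first tracking (b_ref).

    Returns (success, diff), where diff is the average of the point-by-point
    absolute differences and success is True when diff < th_tracking."""
    diff = np.abs(b_cur.astype(np.float64) - b_ref.astype(np.float64)).mean()
    return diff < th_tracking, diff
```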
And ending the tracking of the plurality of video frames until a preset tracking ending condition is met.
Wherein the end condition may be, but is not limited to: and the tracking failure times in the tracking process of the plurality of video frames reach a preset threshold value, or the tracking of all the video frames in the plurality of video frames is completed.
Thus, the tracking of the plurality of video frames may be ended when the number of tracking failures in the tracking of the plurality of video frames reaches a predetermined threshold, or the tracking of all the video frames in the plurality of video frames is completed.
The tracking-failure count also serves the following purpose: interference on individual video signals can distort single frames and cause their matching to fail, and setting a failure-count threshold allows the algorithm to tolerate tracking failures on a small number of individual video frames without terminating.
After the tracking is finished, a corresponding tracking result can be output so as to provide a basis for subsequent related applications.
For example, the frame number of the current video frame may be returned, and/or the numbers of tracking successes and tracking failures during the tracking process, and/or the images in the tracking area. The returned success and failure counts serve as a basis for deciding whether a news headline exists in the tracking area and for further headline detection; the returned tracking-area images serve as a basis for optical character recognition of the news headline; and the returned frame numbers, together with the detected and located headlines, serve as a basis for news splitting, and so on.
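Putting the pieces together, the following sketch outlines the per-frame tracking loop, the end conditions and the returned results described above. It reuses the illustrative helpers sketched earlier (compute_tracking_area, to_gray, otsu_threshold, binarize, track_frame); the failure threshold of 5 frames is an assumed value, not one specified by the patent.

```python
def track_subtitles(frames, candidate_rect, max_failures=5):
    """frames: iterable of H x W x 3 RGB video frames that all contain the
    same news-title candidate area candidate_rect = (x, y, w, h).

    Returns (successes, failures, last_frame_index)."""
    b_ref = None                # reference image from the first tracking
    th = None                   # segmentation threshold from the first tracking
    x = y = w = h = 0
    successes = failures = 0
    last_index = -1

    for index, frame in enumerate(frames):
        last_index = index
        if b_ref is None:                           # first tracking of this area
            x, y, w, h = compute_tracking_area(candidate_rect)
            gray = to_gray(frame[y:y + h, x:x + w])
            th = otsu_threshold(gray)               # segmentation threshold Th
            b_ref = binarize(gray, th)              # reference image B_ref
            continue

        gray = to_gray(frame[y:y + h, x:x + w])     # not the first tracking
        b_cur = binarize(gray, th)                  # binarized image B_cur
        ok, _ = track_frame(b_cur, b_ref)
        if ok:
            successes += 1
        else:
            failures += 1
            if failures >= max_failures:            # tracking-end condition
                break                               # (exhausting frames also ends tracking)

    return successes, failures, last_index
```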
According to the news subtitle tracking method provided by this embodiment of the invention, when area tracking is performed for the first time, a tracking area is set within the video frame and the binarized image corresponding to the image in the tracking area of that first-tracked video frame is taken as the reference image; on this basis, the images in the tracking areas of the other video frames to be tracked are tracked in a binarized-image tracking manner against the reference image obtained during the first tracking. The invention thus provides a scheme for tracking news subtitles by means of binarized-image tracking, which effectively avoids the interference introduced by color-histogram features and by the original image, does not mistrack when text contents differ but color histograms are similar, and therefore makes the tracking more stable and more robust.
Another embodiment of the present invention discloses a news subtitle tracking device which, by accurately tracking news headlines, aims to provide a basis for the positioning and detection of news headlines and, further, for news splitting or optical character recognition of the headlines. Fig. 6 shows a schematic structural diagram of the news subtitle tracking device, which includes:
the device comprises an acquisition unit 1, a tracking unit and a tracking unit, wherein the acquisition unit is used for acquiring a plurality of video frames to be tracked, and the video frames respectively comprise news title candidate areas to be tracked; the judging unit 2 is configured to judge, for a target area tracked in a current video frame, whether the target area is tracked for the first time in a tracking process of the plurality of video frames; the corresponding tracking process of a target area in one video frame is one-time tracking aiming at the target area, and the target area is an area in the news title candidate area; a first tracking processing unit 3, configured to set at least a partial area in the candidate area of the news headline as a tracking area when the determination result indicates that tracking is performed for the first time, and obtain a binarized image corresponding to an image in the tracking area of the current video frame as a reference image; a second tracking processing unit 4, configured to track, when the determination result indicates that the current video frame is not the first tracking, a tracking area of the current video frame in a binarization image tracking manner based on a reference image obtained by processing in the first tracking; and ending the tracking of the plurality of video frames until a preset tracking ending condition is met.
In an implementation manner of the embodiment of the present invention, the first tracking processing unit, which sets at least a partial area of the candidate news headline areas as a tracking area, further includes:
calculating at least a partial area of the candidate areas of the news headlines using a predetermined tracking area calculation formula, and using the at least partial area as a tracking area, the predetermined tracking area calculation formula including:
track.x=rect.x+rect.w*Xratio1;
track.y=rect.y+rect.h*Yratio1;
track.w=rect.w*Xratio2;
track.h=rect.h*Yratio2;
the method comprises the following steps that 1, a receiver.x and a receiver.y respectively represent horizontal and vertical coordinates of the starting point position of a news title candidate area in a video frame, a receiver.w represents the width of the news title candidate area, and a receiver.h represents the height of the news title candidate area; track.x and track.y respectively represent the horizontal and vertical coordinates of the starting position of the tracking area in the video frame, track.w represents the width of the tracking area, and track.h represents the height of the tracking area; xratio1, Xratio2, Yratio1 and Yratio2 are preset parameters, and coordinate systems in which the coordinates are located respectively take the width direction and the height direction of the video frame as the horizontal axis direction and the vertical axis direction.
In an implementation manner of the embodiment of the present invention, the obtaining, by the first tracking processing unit, a binarized image corresponding to an image in a tracking area of the current video frame as a reference image further includes: selecting an image in a tracking area of the current video frame, and converting the image from a red, green and blue (RGB) image into a gray image or a brightness image; and carrying out binarization processing on the gray level image or the brightness image by using the segmentation threshold value to obtain a binarization image corresponding to the tracking area of the current video frame, and using the binarization image as a reference image.
In an implementation manner of the embodiment of the present invention, the second tracking processing unit tracks the tracking area of the current video frame by using a binarized image tracking manner based on a reference image obtained by processing in the first tracking, and further includes: obtaining a binary image corresponding to the image in the tracking area of the current video frame; carrying out point-to-point difference on the binarized image and a reference image obtained by processing in the first tracking to obtain a point-to-point difference value, and calculating the average value of the point-to-point difference values; judging whether the average value of the point-by-point difference values reaches a preset difference threshold value or not, and if not, successfully tracking the current video frame; otherwise, the tracking of the current video frame fails.
In an implementation manner of the embodiment of the present invention, the second tracking processing unit, when a preset tracking end condition is met, ends tracking the plurality of video frames, further includes: when the tracking failure times in the plurality of video frame tracking processes reach a preset threshold value or the tracking of all the video frames in the plurality of video frames is finished, ending the tracking of the plurality of video frames.
The news subtitle tracking device provided by the embodiment of the invention sets the tracking area in the video frame when the area tracking is carried out for the first time, takes the binary image corresponding to the image in the tracking area of the video frame when the area tracking is carried out for the first time as the reference image, and on the basis, tracks the image in the tracking area of other video frames to be tracked by adopting a binary image tracking mode according to the reference image obtained by processing in the tracking for the first time. Therefore, the invention provides a scheme for tracking news information by using a binary image tracking mode, which can effectively avoid the interference caused by the characteristics of the color histogram and the original image, and can not generate error tracking caused by different text contents but similar color histograms, so that the tracking performance is more stable and more robust.
Because the news subtitle tracking device disclosed in this embodiment of the invention corresponds to the news subtitle tracking method disclosed in the embodiment above, its description is relatively brief; for the relevant similarities, please refer to the description of the news subtitle tracking method in the embodiment above, and the details are not repeated here.
In summary, the scheme of the invention has the following advantages: it provides a scheme for tracking news subtitles based on binarized-image tracking, which can provide a basis for the positioning and detection of news headlines and for news splitting. Compared with tracking by color histogram in the prior art, the scheme tracks the subtitle area more accurately and avoids the mistracking that occurs when text contents differ but color histograms are similar; compared with differencing the original images directly, the scheme more effectively avoids the interference of video-compression noise on the tracking performance and is more robust.
It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other.
For convenience of description, the above system or apparatus is described as being divided into various modules or units by function, respectively. Of course, the functionality of the units may be implemented in one or more software and/or hardware when implementing the present application.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present application may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments of the present application.
Finally, it is further noted that, herein, relational terms such as first, second, third, fourth, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and decorations can be made without departing from the principle of the present invention, and these modifications and decorations should also be regarded as the protection scope of the present invention.

Claims (8)

1. A method for tracking news subtitles, comprising:
obtaining a plurality of video frames to be tracked, wherein the video frames respectively comprise news title candidate areas to be tracked;
judging whether a target area tracked in a current video frame is tracked for the first time in the tracking process of a plurality of video frames; the corresponding tracking process of a target area in one video frame is one-time tracking aiming at the target area, and the target area is an area in the news title candidate area;
if the judgment result shows that the tracking is for the first time, setting at least partial area in the news title candidate area as a tracking area, and obtaining a binary image corresponding to the whole image of the tracking area of the current video frame as a reference image;
if the judgment result shows that the tracking area is not the first tracking, tracking the tracking area of the current video frame by adopting a binarization image tracking mode according to a reference image corresponding to the whole image of the tracking area obtained by processing in the first tracking; ending the tracking of the plurality of video frames until a preset tracking ending condition is met;
the tracking method of the current video frame by using a reference image obtained by processing in the first tracking as a basis and adopting a binarization image tracking mode comprises the following steps:
obtaining a binary image corresponding to the whole image of the tracking area of the current video frame;
carrying out point-by-point difference on the binarized image and a reference image corresponding to the whole image of the tracking area obtained by processing in the first tracking to obtain a point-by-point difference value, and calculating the average value of the point-by-point difference values;
judging whether the average value of the point-by-point difference values reaches a preset difference threshold value or not, and if not, successfully tracking the current video frame; otherwise, the tracking of the current video frame fails.
2. The method of claim 1, wherein the setting at least a portion of the candidate areas for news headlines as tracking areas comprises:
calculating at least a partial area of the candidate areas of the news headlines by using a predetermined tracking area calculation formula, and using at least a partial area of the candidate areas of the news headlines as a tracking area, wherein the predetermined tracking area calculation formula comprises:
track.x=rect.x+rect.w*Xratio1;
track.y=rect.y+rect.h*Yratio1;
track.w=rect.w*Xratio2;
track.h=rect.h*Yratio2;
the method comprises the following steps that 1, a receiver.x and a receiver.y respectively represent horizontal and vertical coordinates of the starting point position of a news title candidate area in a video frame, a receiver.w represents the width of the news title candidate area, and a receiver.h represents the height of the news title candidate area; track.x and track.y respectively represent the horizontal and vertical coordinates of the starting position of the tracking area in the video frame, track.w represents the width of the tracking area, and track.h represents the height of the tracking area; xratio1, Xratio2, Yratio1 and Yratio2 are preset parameters, and coordinate systems in which the coordinates are located respectively take the width direction and the height direction of the video frame as the horizontal axis direction and the vertical axis direction.
3. The method according to claim 1, wherein the obtaining a binarized image corresponding to an image in a tracking area of the current video frame as a reference image comprises:
selecting an image in a tracking area of the current video frame, and converting the selected image from a red, green and blue (RGB) image into a gray image or a brightness image;
and carrying out binarization processing on the gray level image or the brightness image by using a preset segmentation threshold value to obtain a binarization image corresponding to the image in the tracking area of the current video frame, and using the binarization image as a reference image.
4. The method according to claim 1, wherein ending the tracking of the plurality of video frames until a preset tracking end condition is met comprises:
when the tracking failure times in the plurality of video frame tracking processes reach a preset threshold value or the tracking of all the video frames in the plurality of video frames is finished, ending the tracking of the plurality of video frames.
5. A news caption tracking apparatus, comprising:
an acquisition unit, configured to acquire a plurality of video frames to be tracked, wherein the video frames each contain a news title candidate area to be tracked;
the judging unit is used for judging whether a target area tracked in a current video frame is tracked for the first time in the tracking process of the plurality of video frames; the corresponding tracking process of a target area in one video frame is one-time tracking aiming at the target area, and the target area is an area in the news title candidate area;
a first tracking processing unit, configured to set at least a partial region in the candidate region of the news headline as a tracking region when the determination result indicates that the tracking is for the first time, and obtain a binarized image corresponding to the entire image of the tracking region of the current video frame as a reference image;
a second tracking processing unit, configured to track the tracking area of the current video frame by using a binarized image tracking method based on a reference image corresponding to the entire image of the tracking area obtained by processing in the first tracking if the determination result indicates that the tracking is not the first tracking; ending the tracking of the plurality of video frames until a preset tracking ending condition is met;
the second tracking processing unit tracks the tracking area of the current video frame by using a binarized image tracking method based on a reference image corresponding to the whole image of the tracking area obtained by processing in the first tracking, and further includes:
obtaining a binary image corresponding to the image in the tracking area of the current video frame; carrying out point-to-point difference on the binarized image and a reference image obtained by processing in the first tracking to obtain a point-to-point difference value, and calculating the average value of the point-to-point difference values; judging whether the average value of the point-by-point difference values reaches a preset difference threshold value or not, and if not, successfully tracking the current video frame; otherwise, the tracking of the current video frame fails.
6. The apparatus according to claim 5, wherein the first tracking processing unit sets at least a partial area of the candidate areas of news headlines as a tracking area, further comprising:
calculating at least a partial area of the candidate areas of the news headlines using a predetermined tracking area calculation formula, and using the at least partial area as a tracking area, the predetermined tracking area calculation formula including:
track.x=rect.x+rect.w*Xratio1;
track.y=rect.y+rect.h*Yratio1;
track.w=rect.w*Xratio2;
track.h=rect.h*Yratio2;
wherein rect.x and rect.y respectively represent the horizontal and vertical coordinates of the starting position of the news title candidate area in the video frame, rect.w represents the width of the news title candidate area, and rect.h represents the height of the news title candidate area; track.x and track.y respectively represent the horizontal and vertical coordinates of the starting position of the tracking area in the video frame, track.w represents the width of the tracking area, and track.h represents the height of the tracking area; Xratio1, Xratio2, Yratio1 and Yratio2 are preset parameters, and the coordinate system takes the width direction and the height direction of the video frame as the horizontal-axis direction and the vertical-axis direction, respectively.
7. The apparatus according to claim 5, wherein said first tracking processing unit obtains, as a reference image, a binarized image corresponding to an image in a tracking area of the current video frame, and further comprises:
selecting an image in a tracking area of the current video frame, and converting the image from a red, green and blue (RGB) image into a gray image or a brightness image; and carrying out binarization processing on the gray level image or the brightness image by using a preset segmentation threshold value to obtain a binarization image corresponding to the image in the tracking area of the current video frame, and using the binarization image as a reference image.
8. The apparatus according to claim 5, wherein the second tracking processing unit ends tracking the plurality of video frames until a preset tracking end condition is met, further comprising: when the tracking failure times in the plurality of video frame tracking processes reach a preset threshold value or the tracking of all the video frames in the plurality of video frames is finished, ending the tracking of the plurality of video frames.
CN201711371730.2A 2017-12-19 2017-12-19 News subtitle tracking method and device Active CN108052941B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711371730.2A CN108052941B (en) 2017-12-19 2017-12-19 News subtitle tracking method and device


Publications (2)

Publication Number Publication Date
CN108052941A CN108052941A (en) 2018-05-18
CN108052941B true CN108052941B (en) 2021-06-01

Family

ID=62133796

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711371730.2A Active CN108052941B (en) 2017-12-19 2017-12-19 News subtitle tracking method and device

Country Status (1)

Country Link
CN (1) CN108052941B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108769776B (en) * 2018-05-31 2021-03-19 北京奇艺世纪科技有限公司 Title subtitle detection method and device and electronic equipment
CN109800757B (en) * 2019-01-04 2022-04-19 西北工业大学 Video character tracking method based on layout constraint

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6473522B1 (en) * 2000-03-14 2002-10-29 Intel Corporation Estimating text color and segmentation of images
CN101276416A (en) * 2008-03-10 2008-10-01 北京航空航天大学 Text tracking and multi-frame reinforcing method in video
CN101448100A (en) * 2008-12-26 2009-06-03 西安交通大学 Method for extracting video captions quickly and accurately
CN102169540A (en) * 2011-03-28 2011-08-31 汉王科技股份有限公司 Camera-based point reading positioning method and device
CN106803937A (en) * 2017-02-28 2017-06-06 兰州理工大学 A kind of double-camera video frequency monitoring method and system with text log

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102682292B (en) * 2012-05-10 2014-01-29 清华大学 Method based on monocular vision for detecting and roughly positioning edge of road
CN103546667B (en) * 2013-10-24 2016-08-17 中国科学院自动化研究所 A kind of automatic news demolition method towards magnanimity broadcast television supervision
US9646191B2 (en) * 2015-09-23 2017-05-09 Intermec Technologies Corporation Evaluating images


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Caption detection and text recognition in news video;Zhe Yang;《2012 5th international congress on image and signal processing》;20130225;全文 *
一种快速新闻视频标题字幕探测与定位方法;刘海;《计算机应用研究》;20110831;第28卷(第8期);全文 *
基于灰度差分的新闻视频标题字幕探测;陈树越;《计算机与数字工程》;20110217;第38卷(第11期);全文 *
新闻视频中标题文本检测定位技术研究;陶永宽;《中国优秀硕士学位论文全文数据库 信息科技辑》;20090715;全文 *

Also Published As

Publication number Publication date
CN108052941A (en) 2018-05-18


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant