CN109800757B - Video character tracking method based on layout constraint - Google Patents

Video character tracking method based on layout constraint Download PDF

Info

Publication number
CN109800757B
CN109800757B CN201910006843.5A CN201910006843A CN109800757B CN 109800757 B CN109800757 B CN 109800757B CN 201910006843 A CN201910006843 A CN 201910006843A CN 109800757 B CN109800757 B CN 109800757B
Authority
CN
China
Prior art keywords
character
track
frame
area
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910006843.5A
Other languages
Chinese (zh)
Other versions
CN109800757A (en
Inventor
冯晓毅
宋真东
王西汉
蒋晓悦
夏召强
彭进业
谢红梅
李会方
何贵青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN201910006843.5A priority Critical patent/CN109800757B/en
Publication of CN109800757A publication Critical patent/CN109800757A/en
Application granted granted Critical
Publication of CN109800757B publication Critical patent/CN109800757B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Landscapes

  • Image Analysis (AREA)

Abstract

In order to solve the problem of multi-character tracking under the condition of large-amplitude camera movement, the invention provides a video character tracking method based on layout constraint. The input of the method is the character detection result of the video and the video frame, and the output is the track information after character tracking. Firstly, the character track is initialized according to the detection result of the initial video frame, and then the character track of the previous frame and the detection result of the current frame are sent to the tracking method of the invention to update the character track. The core of the text track updating is to correspond the text area detected by the current frame to the existing text track, and the process can be regarded as a data matching problem. Aiming at the problem, the invention designs a new data matching cost function and obtains the optimal matching result by solving the cost function. And repeating the track updating process until the video processing is finished, and finally obtaining a character tracking result. According to the method, the layout constraint is introduced into the data matching cost function, and the characters are tracked through the integral appearance structure among the character areas, so that the error tracking result caused by large-amplitude movement of the camera can be effectively avoided, and a better tracking effect is achieved.

Description

Video character tracking method based on layout constraint
Technical Field
The invention relates to the field of video processing, in particular to a character tracking method in a natural scene shooting video.
Background
The text in the video contains high-level semantic information and is usually closely related to the video content. Therefore, the extraction of video text plays an important role in many media analysis-based applications, such as blind assistance systems, driving assistance systems, autonomous mobile robots, and the like. The extraction of video characters generally comprises character detection and character tracking, wherein the character detection is completed to position character targets in video frame images, and the character tracking is completed to correspond the same character areas in a continuous image sequence. The text in the video usually has a temporal redundancy characteristic, i.e. the text will disappear after a certain time in the video. By utilizing the characteristic, the stability and the precision of video character detection can be improved through a character tracking technology. In addition, text tracking can also provide other relevant information for video analysis, such as: the time points of appearance and disappearance of characters in the video time sequence, the motion track of the characters in a certain period of time and the like. Some real-time processing systems can also take advantage of the temporal redundancy characteristics of text in video to increase system processing speed. It follows that text tracking techniques play an important role in video-based analytics applications.
The existing video character tracking method cannot well solve the problem of tracking multiple characters when a camera moves greatly. Because in natural scenes, text usually does not appear singly, but in a dense form. These characters often have the same size, aspect ratio and color characteristics, and most of the features extracted by the tracking algorithm cannot distinguish the characters well, which may cause a wrong match and may not track the characters correctly. This situation is exacerbated with large movements of the camera.
Based on the above problems, the invention provides a video character tracking method based on layout constraint, so as to solve multi-character tracking under large-amplitude camera movement.
Disclosure of Invention
In order to solve the problem of multi-character tracking under the condition of large-amplitude camera movement, the invention provides a video character tracking method based on layout constraint. The process flow of the invention is shown in figure 1. The input of the method is the character detection result of the video and the single-frame image of the video, and the output is the track of each character area in the video, namely the space information (position coordinate and width and height) in each frame. Firstly, initializing a character track through a character area detection result of an initial video frame, then sending the character track of a previous frame and the character area detection result of a current frame into the tracking method of the invention to update the character track, repeating the process until the video processing is finished, and finally obtaining a character tracking result. The core of the text track updating is to correspond the text area detected by the current frame to the existing text track, and the process can be regarded as a data matching problem. Aiming at the problem, the invention designs a new data matching cost function and obtains the optimal matching result by solving the cost function. According to the method, the layout constraint is introduced into the data matching cost function, and the characters are tracked through the integral appearance structure among the character areas, so that the error tracking result caused by large-amplitude movement of the camera can be effectively avoided, and a better tracking effect is achieved. The specific details of the present invention are as follows.
1. Designing data matching cost function
Firstly, a text area contained in a text track in a current frame is defined. Let the status of the ith character area in the t-th frame character track of the video be
Figure GDA0003527719500000021
Wherein
Figure GDA0003527719500000022
Is the abscissa and ordinate of the central point of the character area,
Figure GDA0003527719500000023
is the areaThe speed of movement in the image in the lateral and longitudinal directions,
Figure GDA0003527719500000024
the width and height of the text area,
Figure GDA0003527719500000025
extracting RGB color histograms for the color features of the character region, wherein each channel has 16 feature regions, and the number of the three channels is 48; the status of the character area in all the character tracks in the t-th frame is set as
Figure GDA0003527719500000026
Where i ∈ Nt,NtIndicating the number of text regions in the t-th frame.
For every two character areas, the correlation between the position and the speed needs to be established, and the correlation is regarded as a structural constraint which is shown by formula (1):
Figure GDA0003527719500000027
wherein
Figure GDA0003527719500000028
Structural constraints representing the text regions i and j, all constraints for i being represented as
Figure GDA0003527719500000029
Then all text regions in the t-th frame are constrained to
Figure GDA00035277195000000210
The tracking task is to match the character region detection result to the existing character track, and set the p character region detection result information in the t frame as
Figure GDA00035277195000000211
Figure GDA00035277195000000212
Is the central coordinate of the detection result of the character area,
Figure GDA00035277195000000213
the width and height of the text region detection result are obtained. The set of the detection results of all the character areas in the t-th frame is
Figure GDA00035277195000000214
Wherein p ∈ Mt,MtThe number of the character areas is detected.
Using binary symbols a in the inventioni,pShowing the matching condition of the character track and the character area detection result, when the character track i is matched with the character area detection result p, ai,p1, otherwise ai,p0. For the detection result of the text track in the t-1 th frame and the text area in the t-1 th frame, the data matching condition is described by formula (2):
A=argminC(St-1,Rt-1,Dt) (2)
wherein A ═ { a ═ ai,p|i∈Nt-1,p∈MtIn the method, one character track is matched with at most one character area detection result, C (S)t-1,Rt-1,Dt) And representing all possible pairing sets of the text tracks and the text area detection results. And the best match result is the set minimum argminC (S)t-1,Rt-1,Dt)。
In successive frames, the mutual distance between the words with the same background does not change much. When the camera is in motion, the text should maintain a similar appearance to other text. In the character tracking, the method simultaneously considers the similarity of characters in adjacent frames and the similarity of appearance of other characters related to the characters, and the similarity of appearance of layout around the characters in the adjacent frames is layout constraint. Cost function C (S) based on layout constraintt-1,Rt-1,Dt) As shown in equation (3):
Figure GDA0003527719500000031
wherein
Figure GDA0003527719500000032
And
Figure GDA0003527719500000033
the difference cost value of the text region detected by the t-1 frame text track and the t-1 frame text track is shown, and the cost is calculated by using the region size ratio and the overlapping rate, as shown in formulas (4) and (5):
Figure GDA0003527719500000034
Figure GDA0003527719500000035
in the formula
Figure GDA0003527719500000036
And
Figure GDA0003527719500000037
respectively showing the width and height of the ith character area of the t-1 th frame character track and the detection result p of the t-th frame character area,
Figure GDA0003527719500000038
the area of the overlap between the ith character area of the t-1 th frame character track and the minimum external surrounding frame of the p area of the detection result of the t frame character area is shown,
Figure GDA0003527719500000039
indicating the combined area.
In the formula (3)
Figure GDA00035277195000000310
The structural constraint of the t-1 frame representing the detection result p of the t-th frame character area
Figure GDA00035277195000000311
Prediction region
Figure GDA00035277195000000312
The similarity between the appearance feature of (2) and the appearance feature of the jth character area in the corresponding t-1 frame character track, and the calculation formulas are shown as (6) and (7):
Figure GDA00035277195000000313
Figure GDA00035277195000000314
in the formula Hb(s) represents the RGB color space normalized histogram feature, FbIs the total number of features, b is the index,
Figure GDA00035277195000000315
including the center point coordinates of the predicted region location and the width and height of the predicted region.
2. Cost function optimization and solution
In order to simplify the calculation, the invention uses the formula (8) and the formula (9) to restrict the matching of the track and the detection result, and when the condition is not satisfied, the matching is regarded as ai,p0. Equations (8) and (9) are as follows:
Figure GDA00035277195000000316
Figure GDA00035277195000000317
in the formula D(s)a,sb) Denotes saAnd sbDistance between saAnd sbIndicates the state between two character regions (w)a,ha) Is the width and height of the text region a, (Δ u)a,b,Δva,b) For the text areas a and b transversely sum in the imageThe relative moving speed in the longitudinal direction is considered to be not matched when the inter-area distance and the relative speed are too large. In the present invention, τ is 10.
Finally, all the paired character region cost values can be calculated according to the formula (2), and an N is obtainedt-1×MtThe similarity matrix of (2). The method for the alignment scheme [ J ] was used in reference 1 "Kuhn H W]Naval Research logics, 1955, (1-2):83-97. "the proposed method can calculate the best match result. And obtaining a 2 multiplied by Q matrix by comparing the similarity, wherein the matrix is a matching matrix of the index number of the character track and the index number of the character area detection result, and Q is the matching number. By using the matching matrix, new space information (position coordinates and width and height) of the existing character track in the current frame can be updated, namely character area tracking of the current frame is completed. For example: the t-1 th frame has 3 character tracks being tracked, the t-1 th frame has 3 detected character areas, and a matching matrix obtained by calculation through the algorithm of the invention is shown as (10):
Figure GDA0003527719500000041
the first column of the matrix represents the index number of the character track, and the second column represents the index number of the character area detection result. It indicates that the 1 st text track corresponds to the 2 nd detected text area, the 2 nd text track corresponds to the 1 st detected text area, and the 3 rd text track corresponds to the 3 rd detected text area. And replacing the corresponding coordinates and width and height information of the 3 detected character areas of the t-th frame with the space information in the corresponding character track according to the matching matrix to finish the updating of the character track of the t-th frame.
3. Advantageous effects
The method can accurately track the character track in the video when the large-scale camera moves. The invention uses the known database Minetto in the field of text tracking for testing. The Minetto database comprises 5 segments of scene text videos, and the resolution of the video frames is 640 multiplied by 480. In the testing stage, the text detection results of the video and each frame of video image are input into the tracking algorithm, and the algorithm outputs the track of each text area in the video, namely the space information (position coordinates and width and height) of the text area in each frame. The effectiveness of the algorithm is measured by calculating three known evaluation indexes of multi-target tracking accuracy (MOTP), multi-target tracking accuracy (MOTA) and track index change times (IDS). Compared With the method in the document 2 "Pei W Y, Yang C, Meng L Y, et al. scene Video Text Tracking With Graph Matching [ J ]. IEEE Access,2018,6: 19419-: the MOTP index is improved by 6 percent, the MOTA is improved by 19 percent, and the IDS is doubled.
Drawings
FIG. 1 is a flow chart of a method for video text tracking based on layout constraints.
Detailed Description
Referring to fig. 1, the specific steps of the video text tracking method based on layout constraint provided by the present invention are as follows:
step 1: input video and text detection results
The invention is based on the video character detection result. Text detection can be divided into on-line detection and off-line detection. For on-line detection, firstly inputting a video, then detecting characters frame by frame or frame skipping, inputting a detection result into the invention for character tracking, then detecting characters of the next frame, and repeating the process until the video processing is finished. For offline detection, firstly, a video is input, then character detection is carried out until the video processing is finished, and finally, the video and the detection structure of each frame are input into the invention for character tracking. The tracking method provided by the invention can be simultaneously applied to on-line detection and off-line detection.
Step 2: text track initialization
Track initialization is carried out on the detection result of the first frame of the video, each detected character area is regarded as a new character track and index numbering is carried out, and then the states S of all character areas are calculatedtSpeed (u) in its statet,vt) First stageThe starting point is (0,0), and t is 1. And calculating the structural constraint R between every two character areas according to the formula (1)tAnd t is 1. Meanwhile, removing the structure constraint which does not meet the requirement according to constraint formulas (8) and (9), and recording the residual structure constraint R1And the character track state S1
And step 3: text track update
And in the text track updating stage, matching the text area detection result of the t-th frame with the existing text track in the t-1 th frame, and replacing the corresponding spatial information (position coordinates and width and height) of the matched text area detection result with the spatial information of the text area in the corresponding text track. The input of the stage is the character track state S of the t-1 th framet-1Structural constraint Rt-1And the t-th frame character region detection result DtAnd outputting the updated character track space information.
Step 3.1: data matching
Matching the character track of the t-1 th frame with the character area detection result of the t-1 th frame to form Nt-1×MtAnd (4) combining the pairs. Then, the cost values of all pairs are calculated by formula (3) to obtain Nt-1×MtThe similarity matrix of (2). Before using the formula (3), the constraint judgment is first performed by the formula (8), and when the condition is not satisfied, the calculation of the formula (3) is skipped, and the pairing cost value is set to 999. The method for the alignment scheme [ J ] was used in reference 1 "Kuhn H W]Naval Research logics, 1955, (1-2):83-97. "the proposed method can calculate the best match result. The result is a 2 × Q matrix, which is a matching matrix of the index number of the text track and the index number of the text area detection result, wherein Q is the matching number.
Step 3.1: updating matches to tracks
If the text trajectory matches the current text region detection result, then use document 3 "Links I K F. an Introduction to the Kalman Filter [ J ]]1995. "kalman filtering algorithm uses text region detection results
Figure GDA0003527719500000051
For in-track state
Figure GDA0003527719500000052
Updating and updating the normalized color histogram of the text region
Figure GDA0003527719500000053
Obtain a new state St
Step 3.2: updating unmatched tracks
The existing character detection algorithm often has a missing detection phenomenon, so that a character track cannot be matched with a character area detection result. At this time, the updated character track state S is utilizedtAnd structural constraint R of t-1 framet-1The unmatched text tracks are predicted using equation (11), which is shown below:
Figure GDA0003527719500000061
wherein N isrFor the number of matching to the text tracks, (x, y) is the text region center coordinate, and (Δ x, Δ y) is the center coordinate distance difference in the structural constraint. And replacing the old coordinates which are not matched with the character track with the predicted area center coordinates, recording the replacement times, and when the replacement times are more than 3, determining that the character track disappears, and deleting the track information from the character track.
Step 3.3: initializing new tracks
If the t frame character area detection result p cannot be matched with any character track, a new character track is considered to appear, and a new track state is established
Figure GDA0003527719500000062
And adds the new track status to the existing track.
Step 3.4: updating character track structure constraints
And calculating the structural constraint between every two character tracks according to the formula (1). Meanwhile, removing the structure constraint which does not meet the requirement according to constraint formulas (8) and (9), and recording the residual structureConstraint Rt
Step 3.5: outputting an update track
And outputting the updated character track space information (position coordinates and width and height) of the t-th frame, and recording and updating the survival times of all tracks.
And 4, step 4: outputting character track information
And repeating the step 3 until the video processing is completed, wherein the text usually disappears after a period of time exists in the video according to the time redundancy characteristic of the text in the video. The invention filters the non-character area by using the time redundancy characteristic, judges the non-character area of the track when the survival frequency of the track is less than or equal to 15 frames, and deletes the character track information. And finally outputting the residual character track information after filtering.

Claims (1)

1. The video character tracking method based on layout constraint is characterized in that: the method comprises the steps of data matching cost function and cost function optimization and solving, and specifically comprises the following steps:
(1) data matching cost function:
firstly, defining a character area contained in a character track in a current frame; let the status of the ith character area in the t-th frame character track of the video be
Figure FDA0003527719490000011
Wherein
Figure FDA0003527719490000012
Is the abscissa and ordinate of the central point of the character area,
Figure FDA0003527719490000013
the speed of movement of the region in the image in the lateral and longitudinal directions,
Figure FDA0003527719490000014
the width and height of the text area,
Figure FDA0003527719490000015
extracting RGB color histograms for the color features of the character region, wherein each channel has 16 feature regions, and the number of the three channels is 48; the status of the character area in all the character tracks in the t-th frame is set as
Figure FDA0003527719490000016
Where i ∈ Nt,NtRepresenting the number of character areas in the t-th frame;
for every two character areas, the correlation between the position and the speed needs to be established, and the correlation is regarded as a structural constraint which is shown by formula (1):
Figure FDA0003527719490000017
wherein
Figure FDA0003527719490000018
Structural constraints representing the text regions i and j, all constraints for i being represented as
Figure FDA0003527719490000019
Then all text regions in the t-th frame are constrained to
Figure FDA00035277194900000110
The tracking task is to match the character region detection result to the existing character track, and set the p character region detection result information in the t frame as
Figure FDA00035277194900000111
Figure FDA00035277194900000112
Is the central coordinate of the detection result of the character area,
Figure FDA00035277194900000113
the width and height of the text region detection result are obtained, and the set of all text region detection results in the t-th frame is
Figure FDA00035277194900000114
Wherein p ∈ Mt,MtDetecting the number of the character areas;
binary symbol ai,pShowing the matching condition of the character track and the character area detection result, when the ith character area in the character track is matched with the character area detection result p, ai,p1, otherwise ai,p0; for the detection result of the text track in the t-1 th frame and the text area in the t-1 th frame, the data matching condition is described by formula (2):
A=argminC(St-1,Rt-1,Dt) (2)
wherein A ═ { a ═ ai,p|i∈Nt-1,p∈MtAt most, a text track matches a text area detection result, C (S)t-1,Rt-1,Dt) Representing all possible pairing sets of the character track and the character area detection result; and the best match result is the set minimum argminC (S)t-1,Rt-1,Dt);
In the continuous frames, the mutual distance between the characters with the same background does not change too much; when the camera is in motion, the text should maintain a similar appearance to other text; in the character tracking, the similarity of characters in adjacent frames and the similarity of other character appearances related to the characters are simultaneously considered, and the similarity of the layout appearances around the characters in the adjacent frames is the layout constraint; cost function C (S) based on layout constraintt-1,Rt-1,Dt) As shown in equation (3):
Figure FDA0003527719490000021
wherein
Figure FDA0003527719490000022
And
Figure FDA0003527719490000023
representing the difference cost value of the text track of the t-1 th frame and the text area detected by the t-1 th frame, and calculating the cost by using the area size ratio and the overlapping rate, as shown in formulas (4) and (5):
Figure FDA0003527719490000024
Figure FDA0003527719490000025
in the formula
Figure FDA0003527719490000026
And
Figure FDA0003527719490000027
respectively showing the width and height of the ith character area of the t-1 th frame character track and the detection result p of the t-th frame character area,
Figure FDA0003527719490000028
the area of the minimum external surrounding frame overlapping of the ith character area of the t-1 th frame character track and the p area of the detection result of the t frame character area is shown,
Figure FDA0003527719490000029
represents the combined area;
in the formula (3)
Figure FDA00035277194900000210
The structural constraint of the t-1 frame representing the detection result p of the t-th frame character area
Figure FDA00035277194900000211
Prediction region
Figure FDA00035277194900000212
The similarity between the appearance feature of (2) and the appearance feature of the jth character area in the corresponding t-1 frame character track, and the calculation formulas are shown as (6) and (7):
Figure FDA00035277194900000213
Figure FDA00035277194900000214
in the formula Hb(s) represents the RGB color space normalized histogram feature, FbIs the total number of features, b is the index,
Figure FDA00035277194900000215
the center point coordinate containing the position of the prediction area and the width and the height of the prediction area are included;
(2) optimizing and solving a cost function:
in order to simplify the calculation, matching of the trajectory and the detection result is constrained using formula (8) and formula (9), and when the condition is not satisfied, it is regarded as ai,p0; equations (8) and (9) are as follows:
Figure FDA00035277194900000216
Figure FDA00035277194900000217
in the formula saAnd sbThe state between two character areas is shown, when the distance between the areas and the relative speed are too large, the two areas are considered not to be matched, and the value of tau is 10;
finally, calculating the cost values of all the matched character areas according to the formula (2) to obtainTo a Nt-1×MtThe similarity matrix of (2 x Q) is obtained by comparing the similarity, the matrix is a matching matrix of the index number of the character track and the index number of the character area detection result, wherein Q is the matching number, and the matching matrix can be used for updating the new position coordinate and width and height of the existing character track in the current frame, namely completing character area tracking of the current frame.
CN201910006843.5A 2019-01-04 2019-01-04 Video character tracking method based on layout constraint Active CN109800757B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910006843.5A CN109800757B (en) 2019-01-04 2019-01-04 Video character tracking method based on layout constraint

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910006843.5A CN109800757B (en) 2019-01-04 2019-01-04 Video character tracking method based on layout constraint

Publications (2)

Publication Number Publication Date
CN109800757A CN109800757A (en) 2019-05-24
CN109800757B true CN109800757B (en) 2022-04-19

Family

ID=66558550

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910006843.5A Active CN109800757B (en) 2019-01-04 2019-01-04 Video character tracking method based on layout constraint

Country Status (1)

Country Link
CN (1) CN109800757B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113297875B (en) * 2020-02-21 2023-09-29 华为技术有限公司 Video text tracking method and electronic equipment
CN114463376B (en) * 2021-12-24 2023-04-25 北京达佳互联信息技术有限公司 Video text tracking method and device, electronic equipment and storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101276416A (en) * 2008-03-10 2008-10-01 北京航空航天大学 Text tracking and multi-frame reinforcing method in video
TW201039149A (en) * 2009-04-17 2010-11-01 Yu-Chieh Wu Robust algorithms for video text information extraction and question-answer retrieval
CN104244073A (en) * 2014-09-26 2014-12-24 北京大学 Automatic detecting and recognizing method of scroll captions in videos
WO2015165524A1 (en) * 2014-04-30 2015-11-05 Longsand Limited Extracting text from video
CN107545210A (en) * 2016-06-27 2018-01-05 北京新岸线网络技术有限公司 A kind of method of video text extraction
CN108052941A (en) * 2017-12-19 2018-05-18 北京奇艺世纪科技有限公司 A kind of news caption tracking and device
CN108229476A (en) * 2018-01-08 2018-06-29 北京奇艺世纪科技有限公司 Title area detection method and system
CN108256493A (en) * 2018-01-26 2018-07-06 中国电子科技集团公司第三十八研究所 A kind of traffic scene character identification system and recognition methods based on Vehicular video
CN108363981A (en) * 2018-02-28 2018-08-03 北京奇艺世纪科技有限公司 A kind of title detection method and device
CN108694393A (en) * 2018-05-30 2018-10-23 深圳市思迪信息技术股份有限公司 A kind of certificate image text area extraction method based on depth convolution

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101276416A (en) * 2008-03-10 2008-10-01 北京航空航天大学 Text tracking and multi-frame reinforcing method in video
TW201039149A (en) * 2009-04-17 2010-11-01 Yu-Chieh Wu Robust algorithms for video text information extraction and question-answer retrieval
WO2015165524A1 (en) * 2014-04-30 2015-11-05 Longsand Limited Extracting text from video
CN104244073A (en) * 2014-09-26 2014-12-24 北京大学 Automatic detecting and recognizing method of scroll captions in videos
CN107545210A (en) * 2016-06-27 2018-01-05 北京新岸线网络技术有限公司 A kind of method of video text extraction
CN108052941A (en) * 2017-12-19 2018-05-18 北京奇艺世纪科技有限公司 A kind of news caption tracking and device
CN108229476A (en) * 2018-01-08 2018-06-29 北京奇艺世纪科技有限公司 Title area detection method and system
CN108256493A (en) * 2018-01-26 2018-07-06 中国电子科技集团公司第三十八研究所 A kind of traffic scene character identification system and recognition methods based on Vehicular video
CN108363981A (en) * 2018-02-28 2018-08-03 北京奇艺世纪科技有限公司 A kind of title detection method and device
CN108694393A (en) * 2018-05-30 2018-10-23 深圳市思迪信息技术股份有限公司 A kind of certificate image text area extraction method based on depth convolution

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
A New Technique for Multi-Oriented Scene Text Line Detection and Tracking in Video;Liang Wu等;《IEEE Transactions on Multimedia》;20150831;第17卷(第8期);1137-1152 *
Text Detection, Tracking and Recognition in Video: A Comprehensive Survey;Xu-Cheng Yin等;《IEEE Transactions on Image Processing》;20160630;第25卷(第6期);2752-2773 *
基于多帧图像的视频文字跟踪和分割算法;密聪杰等;《计算机研究与发展》;20061231;第43卷(第9期);1523-1529 *
网络视频字幕提取识别系统的设计与实现;刁月华;《中国优秀硕士学位论文全文数据库 信息科技辑》;20150915;第2015年卷(第9期);I138-1370 *

Also Published As

Publication number Publication date
CN109800757A (en) 2019-05-24

Similar Documents

Publication Publication Date Title
CN110097568B (en) Video object detection and segmentation method based on space-time dual-branch network
Tokmakov et al. Learning motion patterns in videos
KR102235745B1 (en) Method for training a convolutional recurrent neural network and for semantic segmentation of inputted video using the trained convolutional recurrent neural network
US10068137B2 (en) Method and device for automatic detection and tracking of one or multiple objects of interest in a video
CN109064484B (en) Crowd movement behavior identification method based on fusion of subgroup component division and momentum characteristics
CN106778712B (en) Multi-target detection and tracking method
CN102741884A (en) Mobile body detection device and mobile body detection method
CN109598735A (en) Method using the target object in Markov D-chain trace and segmented image and the equipment using this method
CN109800757B (en) Video character tracking method based on layout constraint
CN111161309B (en) Searching and positioning method for vehicle-mounted video dynamic target
CN116883458B (en) Transformer-based multi-target tracking system fusing motion characteristics with observation as center
CN114240997A (en) Intelligent building online cross-camera multi-target tracking method
Lee et al. SSPNet: Learning spatiotemporal saliency prediction networks for visual tracking
Zhou et al. Uhp-sot: An unsupervised high-performance single object tracker
Song et al. Cross-view contextual relation transferred network for unsupervised vehicle tracking in drone videos
Crivelli et al. Robust optical flow integration
Wang et al. Real-time UAV tracking based on PSR stability
Berger et al. Subspace tracking under dynamic dimensionality for online background subtraction
Hu et al. Entry-flipped transformer for inference and prediction of participant behavior
Zhou et al. Uhp-sot++: An unsupervised lightweight single object tracker
CN110942463B (en) Video target segmentation method based on generation countermeasure network
Long et al. Two-stream based multi-stage hybrid decoder for self-supervised multi-frame monocular depth
CN116630850A (en) Twin target tracking method based on multi-attention task fusion and bounding box coding
Delibaşoğlu PESMOD: Small moving object detection benchmark dataset for moving cameras
Zhang et al. AIPT: Adaptive information perception for online multi-object tracking

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information
CB03 Change of inventor or designer information

Inventor after: Feng Xiaoyi

Inventor after: Song Zhendong

Inventor after: Wang Xihan

Inventor after: Jiang Xiaoyue

Inventor after: Xia Zhaoqiang

Inventor after: Peng Jinye

Inventor after: Xie Hongmei

Inventor after: Li Huifang

Inventor after: He Guiqing

Inventor before: Feng Xiaoyi

Inventor before: Wang Xihan

Inventor before: Jiang Xiaoyue

Inventor before: Xia Zhaoqiang

Inventor before: Peng Jinye

Inventor before: Xie Hongmei

Inventor before: Li Huifang

Inventor before: He Guiqing

Inventor before: Song Zhendong

GR01 Patent grant
GR01 Patent grant