CN107545210A - A kind of method of video text extraction - Google Patents
Publication number: CN107545210A (application CN201610479702.1A)
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Abstract
The invention discloses a method of video text extraction, comprising: segmenting the video signal into individual shots; detecting and locating candidate text regions in the frame sequence of each shot; on the basis of text localization, tracking the text within a shot to obtain the sequence of regions occupied by one text object over consecutive frames; using the tracked text sequence to enhance the text and suppress the background, then binarizing to obtain a binarized text image; and performing text recognition on the binarized image to obtain the character string of the text. With the technical scheme provided by the invention, text in video can be better detected and separated from complex and varied backgrounds, system efficiency is improved, and text quality is improved, which in turn helps raise the text recognition rate.
Description
Technical field
The present invention relates to the field of image and information technology, and in particular to a method of video text extraction.
Background art
To extract text from video frames, the frames containing text must first be found, and the position of the text then determined. In a typical video program, not every frame contains text; often most of the frames in a program contain none. Existing techniques generally detect text in every frame and then match detections across frames to remove duplicate results. Since a video contains a great many frames, such processing consumes a great deal of time.
Summary of the invention
In view of this, an object of the present invention is to provide a method of video text extraction that segments the video into shots and then detects text within each shot. This simplifies the problem, and once the video is divided into shots the shots can be processed in parallel, improving system efficiency. A brief summary is given below to provide a basic understanding of some aspects of the disclosed embodiments. This summary is not an extensive overview, and is not intended to identify key or critical components or to delimit the scope of protection of these embodiments. Its sole purpose is to present some concepts in a simplified form as a prelude to the detailed description that follows.
An object of the present invention is to provide a video text extraction method, comprising:
segmenting the video signal into individual shots;
detecting and locating the position of candidate text in the frame sequence of a single shot;
on the basis of text localization, tracking the text within the shot to obtain the sequence of regions occupied by one text object over consecutive frames;
using the tracked text sequence to enhance the text and suppress the background, then binarizing to obtain a binarized text image;
performing text recognition on the binarized text image to obtain the character string of the text.
In some optional embodiments, the localization specifically includes the following steps: preprocessing, coarse localization, projection cutting, and screening.
In some optional embodiments, the tracking is performed as follows: position judgment, temporal judgment, and maintenance of a tracking array.
In some optional embodiments, pre-segmentation is also performed before the segmentation, specifically including:
when the video image is a color image, converting the video image into a grayscale image;
binarizing the text block image to separate the characters from the background and determine character boundaries;
performing connected-component analysis on the resulting binary image to obtain the position and size of the character strokes.
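The three pre-segmentation steps above (grayscale conversion, binarization, connected-component analysis) can be sketched as follows. This is a minimal illustration, not the patented implementation: the BT.601 luminance weights, the fixed threshold, and the BFS labeling are assumptions chosen for brevity.

```python
import numpy as np
from collections import deque

def to_gray(rgb):
    # BT.601 luminance weights -- one common choice for this conversion
    return np.rint(0.299 * rgb[..., 0] + 0.587 * rgb[..., 1]
                   + 0.114 * rgb[..., 2]).astype(np.uint8)

def binarize(gray, thresh=128):
    # Fixed global threshold; the patent selects among several algorithms
    return (gray > thresh).astype(np.uint8)

def connected_components(binary):
    """4-connected labeling by BFS; returns bounding boxes (x, y, w, h),
    i.e. the position and size information of each component."""
    h, w = binary.shape
    labels = np.zeros((h, w), dtype=int)
    boxes, cur = [], 0
    for y in range(h):
        for x in range(w):
            if binary[y, x] and not labels[y, x]:
                cur += 1
                labels[y, x] = cur
                q = deque([(y, x)])
                ys, xs = [y], [x]
                while q:
                    cy, cx = q.popleft()
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w \
                                and binary[ny, nx] and not labels[ny, nx]:
                            labels[ny, nx] = cur
                            ys.append(ny); xs.append(nx)
                            q.append((ny, nx))
                boxes.append((min(xs), min(ys),
                              max(xs) - min(xs) + 1, max(ys) - min(ys) + 1))
    return boxes
```

A production system would use an optimized labeling routine (e.g. two-pass union-find) rather than per-pixel BFS.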
In some optional embodiments, the method further includes:
analyzing the layout of the video image to obtain the text features in the video image, and organizing and classifying the resulting text information.
The present invention also aims to provide a video text extraction system, characterized by comprising:
a shot segmentation module, for segmenting the video signal into individual shots;
a text localization module, for detecting and locating the position of candidate text in the frame sequence of a single shot;
a text tracking module, for tracking the text within a shot on the basis of text localization, obtaining the sequence of regions occupied by one text object over consecutive frames;
an enhancement and binarization module, for using the tracked text sequence to enhance the text and suppress the background, then binarizing to obtain a binarized text image;
a text recognition module, for performing text recognition on the binarized text image to obtain the character string of the text.
In some optional embodiments, the text localization module specifically includes: a preprocessing submodule, a coarse localization submodule, a projection cutting submodule, and a screening submodule.
In some optional embodiments, the text tracking module comprises: a position judgment submodule, a temporal judgment submodule, and a tracking-array maintenance submodule.
In some optional embodiments, the system further includes a pre-segmentation module, specifically including:
a conversion submodule, for converting the video image into a grayscale image when it is a color image;
a separation submodule, for binarizing the text block image to separate the characters from the background and determine character boundaries;
an analysis submodule, for performing connected-component analysis on the resulting binary image to obtain the position and size of the character strokes.
In some optional embodiments, the system further includes:
a layout analysis module, for analyzing the layout of the video image, obtaining the text features in the video image, and organizing and classifying the resulting text information.
The method of the present invention has the following effects:
text in video can be better detected and separated from complex and varied backgrounds, system efficiency is improved, and text quality is improved, which helps raise the text recognition rate.
To the accomplishment of the foregoing and related ends, the one or more embodiments comprise the features hereinafter fully described and particularly pointed out in the claims. The following description and the annexed drawings set forth in detail certain illustrative aspects, which are indicative of but a few of the various ways in which the principles of the embodiments may be employed. Other benefits and novel features will become apparent from the following detailed description considered in conjunction with the drawings, and the disclosed embodiments are intended to include all such aspects and their equivalents.
Fig. 1 is a flowchart of a method of video text extraction provided by the invention;
Fig. 2 is a schematic diagram of the composition of a system of video text extraction provided by the invention.
Embodiment
The following description and drawings sufficiently illustrate specific embodiments of the invention to enable those skilled in the art to practice them. Other embodiments may incorporate structural, logical, electrical, process, and other changes. The embodiments represent only possible variations. Unless explicitly required, individual components and functions are optional, and the order of operations may vary. Portions and features of some embodiments may be included in, or substituted for, those of other embodiments. The scope of the embodiments of the invention encompasses the full range of the claims and all available equivalents thereof. Herein, these embodiments of the invention may be referred to, individually or collectively, by the term "invention" merely for convenience; if more than one invention is in fact disclosed, this is not meant to automatically limit the scope of the application to any single invention or inventive concept.
Embodiment one
This embodiment provides a method of video text extraction, comprising the following steps:
S101: segment the video signal into individual shots, so that the text tracking of step S103 can be performed within a shot rather than over the whole video sequence. This simplifies the problem and allows text localization and tracking to run in parallel, greatly improving system efficiency.
A shot is the sequence of video images recorded by one camera operation. A shot boundary is the result of switching between two shots; it is where the content of the video changes, that is, a shot boundary reflects a discontinuity in video content.
S102: detect and locate the position of candidate text in individual video frames, reducing the number of false text regions as far as possible.
S103: on the basis of text localization, track the text within the shot to obtain the sequence of regions occupied by one text object over consecutive frames, in preparation for text enhancement. Another effect of text tracking is that false text regions that do not persist over consecutive frames can be discovered and excluded.
S104: use the tracked text sequence to enhance the text and suppress the background, then binarize, obtaining a relatively clean and clear binarized text image in preparation for the final text recognition.
S105: perform text recognition on the binarized text image to obtain the character string of the text.
Embodiment two
The aim is to extract structured text information from video programs. The key links are: segmentation, localization, tracking, recognition, and layout analysis.
The system processes video with a "semi-automatic" strategy: along with the video to be processed, a corresponding configuration file must also be supplied.
Video programs are of many kinds, and the production styles of different programs — whether character stroke width is uniform, the color contrast between characters and background, the arrangement of characters, and so on — differ greatly; no universal text features and processing methods suitable for all video text can be found.
The algorithms used in text processing, and their parameters, are highly application-specific. An algorithm may achieve very high performance in one specific environment but degrade when the environment changes, requiring other suitable algorithms; likewise, the same algorithm needs different parameter configurations to handle different types of video.
A given program has a fixed production style over a long period. By observation and experiment, once the most suitable configuration for a program has been determined, all videos of that program can be processed correctly, with accuracy and processing speed that meet practical demands. If the production style of the program changes, the system can be adapted quickly by modifying the configuration.
For a new video program, by adding a configuration file for that program the system can process all of its videos; extension is convenient and fast.
This embodiment provides a method of video text extraction, comprising the following steps:
S201: preprocessing
Preprocessing includes grayscale conversion, binarization, and connected-component analysis. The candidate text region images obtained in the localization link are color images, while binarization and character recognition use grayscale images, so conversion is needed. Conversion methods include:
(1) Extracting the intensity component [DIP].
(2) Extracting the color channel (R, G, or B) of the color image on which the intensity contrast between characters and background is most obvious.
(3) Converting the color space and changing the distance metric between colors [Color], to obtain a grayscale image with obvious intensity contrast between characters and background.
(4) Color enhancement. One or more representative colors are specified for the characters and for the background; the pixels of the color image are clustered with the K-means method [KMeans] while the luminance component is extracted as a grayscale image, on which character pixels are enhanced and background pixels suppressed, increasing the intensity contrast between characters and background.
In practice, an appropriate conversion method should be configured according to the characteristics of the video image, especially the color contrast between characters and background, to improve the effect of the subsequent binarization.
Binarization separates the characters from the background in the image, laying the foundation for determining character boundaries. Binarization is an important and widely studied direction in the OCR field, and many algorithms have been proposed [Bin]. The algorithms used herein are:
(1) Global binarization: Otsu [Ostu], Kittler [Kittler].
(2) Local binarization: Niblack [Niblack], Sauvola [Sauvola][Fast].
Each binarization method is only suited to certain cases; in application, the algorithm should be chosen according to the quality of the video images to be processed.
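As a sketch of the two binarization families named above, the following implements Otsu's global threshold and a naive Niblack local threshold in plain NumPy. The window size and `k` are illustrative defaults; a production version would compute the local statistics with integral images for speed.

```python
import numpy as np

def otsu_threshold(gray):
    """Global threshold maximizing between-class variance (Otsu)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = hist.sum()
    mean_all = np.dot(np.arange(256), hist) / total
    best_t, best_var, cum, cum_mean = 0, -1.0, 0.0, 0.0
    for t in range(256):
        cum += hist[t]
        cum_mean += t * hist[t]
        if cum == 0 or cum == total:
            continue
        w0 = cum / total                       # class-0 weight
        m0 = cum_mean / cum                    # class-0 mean
        m1 = (mean_all * total - cum_mean) / (total - cum)  # class-1 mean
        var = w0 * (1.0 - w0) * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def niblack(gray, win=15, k=-0.2):
    """Naive Niblack local binarization: t = local mean + k * local std."""
    g = gray.astype(float)
    pad = win // 2
    p = np.pad(g, pad, mode='edge')
    out = np.zeros(g.shape, dtype=np.uint8)
    h, w = g.shape
    for y in range(h):
        for x in range(w):
            patch = p[y:y + win, x:x + win]
            out[y, x] = 1 if g[y, x] > patch.mean() + k * patch.std() else 0
    return out
```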
Connected-component analysis is performed on the resulting binary image to obtain the position and size of the character strokes. It includes three parts: labeling, screening, and merging. Connected-component labeling reflects the connectivity between pixels in the binary image; the algorithm of [Label] is used herein. After labeling, the position, size, pixel count, and other information of each connected region in the binary image are obtained. In connected-component screening, rules are designed to remove components that are unreasonable in position, size, shape, duty cycle, or other features, reducing interference and laying the foundation for subsequent processing. Since a Chinese character is usually composed of several scattered strokes, failing to merge them reasonably will affect the choice of segmentation points. The merging algorithm refers to the relevant content in [Liu].
S202: segment the video signal into individual shots, so that the text tracking of step S204 can be performed within a shot rather than over the whole video sequence. This simplifies the problem, allows parallel text localization and tracking, and greatly improves system efficiency.
S203: detect and locate the position of candidate text in individual video frames, reducing the number of false text regions as far as possible.
The goal of localization is to determine the position of text regions in the video image. The whole flow is divided into four parts: preprocessing, coarse localization, projection cutting, and screening. Preprocessing includes computing the stroke response and color clustering: the former highlights characters by the uniformity of character strokes, the latter by the color feature of the characters; one of the two flows is selected by a configuration item. Coarse localization detects text regions by the density of character arrangement and obtains their approximate positions. Projection cutting splits detected multi-line text into single-line text and obtains more accurate region borders, facilitating subsequent segmentation. The verification link extracts features of the text regions and screens out false alarms.
1. Preprocessing
Chinese characters, English letters, and digits are all man-made symbols whose common feature is uniform stroke width, a feature that natural objects rarely possess; text can therefore be enhanced, and background suppressed, by computing a stroke response. The steps are:
(1) Determine the spacing of the stroke response from the configuration file. The spacing used when computing the stroke response is related to the stroke width of the characters to be detected. The stroke widths of video characters are mostly similar, but videos with wider strokes also exist; setting a suitable spacing in advance, based on observation, yields better results.
(2) Compute the stroke response. A polarity setting in the configuration file determines whether the bright or the dark stroke response is computed; the bright (dark) stroke response is computed separately in the horizontal and vertical directions, and the maximum of the two is taken as the final response.
(3) Binarize. The stroke response of text regions is large and that of background regions is small; a stroke response threshold is set by observation and the stroke response map is binarized [DIP], regions above the threshold being foreground points and the rest background points.
(4) Apply a dilation operation [DIP] to the resulting binary image to connect broken strokes.
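One plausible realization of steps (1)–(3) is a bright-bar filter: a pixel responds when it is brighter than both pixels `spacing` away, horizontally or vertically, and the maximum over the two directions is the response. The filter formula itself is an assumption — the patent does not give the exact computation — and `np.roll` wraps at the borders, where a real implementation would pad instead.

```python
import numpy as np

def stroke_response(gray, spacing, bright=True):
    """Bright (or dark) stroke response at a given spacing: a pixel responds
    when it is brighter (darker) than both pixels `spacing` away, in either
    the horizontal or the vertical direction."""
    g = gray.astype(float)
    if not bright:
        g = 255.0 - g                      # dark strokes become bright ones
    # horizontal response: compare with pixels spacing away left and right
    h = np.minimum(g - np.roll(g, spacing, axis=1),
                   g - np.roll(g, -spacing, axis=1))
    # vertical response: compare with pixels spacing away up and down
    v = np.minimum(g - np.roll(g, spacing, axis=0),
                   g - np.roll(g, -spacing, axis=0))
    return np.maximum(h, v).clip(min=0.0)  # max of the two directions

def binarize_response(resp, thresh):
    """Step (3): threshold the response map into a foreground mask."""
    return (resp > thresh).astype(np.uint8)
```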
In some cases the stroke width of the characters fluctuates considerably, for example with ornamental typefaces, and the stroke feature is not obvious; the color feature of the characters can then be used for localization, as follows:
(1) Perform color clustering on the image according to the configuration file, which specifies the character color and background color; clustering uses the K-means algorithm [KMeans].
(2) In the clustering result, the class or classes belonging to the character color are set as foreground points and the other classes as background points, generating a binary image.
(3) Apply a dilation operation [DIP] to the resulting binary image to connect broken strokes.
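A minimal sketch of this color-based alternative, seeding two clusters from the configured character and background colors; the seeding strategy and iteration count are assumptions for illustration.

```python
import numpy as np

def color_cluster_mask(rgb, char_color, bg_color, iters=5):
    """Two-class K-means on pixel colors, seeded from the configured
    character and background colors; returns the character-color mask."""
    h, w, _ = rgb.shape
    px = rgb.reshape(-1, 3).astype(float)
    centers = np.array([char_color, bg_color], dtype=float)
    for _ in range(iters):
        # assign each pixel to the nearest center in RGB space
        d = np.linalg.norm(px[:, None, :] - centers[None], axis=2)
        lab = d.argmin(1)
        # move each center to the mean of its pixels
        for j in range(2):
            if (lab == j).any():
                centers[j] = px[lab == j].mean(0)
    # cluster 0 (seeded from the character color) becomes the foreground
    return (lab == 0).reshape(h, w).astype(np.uint8)
```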
2. Coarse localization
On the binary image, the approximate positions of text regions are first obtained by coarse localization, and precise localization is then performed inside each region. The coarse localization steps are:
(1) Label connected components. After connected-component labeling [Label], the position, size, pixel count, and other features of each connected region in the binary image are obtained.
(2) Determine text regions. According to the geometric constraints of real text blocks, such as size and arrangement, potential text regions are determined [Geometry], including determining single text regions and merging multiple text regions in the horizontal or vertical direction.
3. Projection cutting
Multi-line text often appears in video images, and in coarse detection it is often detected as a single text block, for two reasons:
(1) The background area between adjacent lines of text has uniform width and can produce large values on the stroke response map; after binarization this background is retained, causing the lines to stick together.
(2) The dilation step also causes adhesion.
The subsequent segmentation link requires each text region to be a single line of text, so potential multi-line text must be cut here into multiple single-line texts. Working in units of connected components, the projection cutting method [Slice1] effectively resolves the adhesion between lines of text, and in some cases between text and its surrounding background, ensuring that each candidate region after cutting is a single line of text.
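The line splitting can be sketched as a horizontal projection profile: rows whose foreground count drops to zero separate the lines. A real implementation would use a small noise threshold rather than exact zero; this is a minimal illustration.

```python
import numpy as np

def project_rows(binary):
    """Horizontal projection: count of foreground pixels in each row."""
    return binary.sum(axis=1)

def split_lines(binary, min_rows=1):
    """Cut a (possibly multi-line) binary text block into (top, bottom)
    row ranges wherever the projection drops to zero."""
    rows = project_rows(binary)
    lines, start = [], None
    for y, v in enumerate(rows):
        if v > 0 and start is None:
            start = y                      # a text line begins
        elif v == 0 and start is not None:
            if y - start >= min_rows:      # ignore degenerate slivers
                lines.append((start, y))
            start = None
    if start is not None and len(rows) - start >= min_rows:
        lines.append((start, len(rows)))   # line runs to the bottom edge
    return lines
```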
4. Screening
False alarms exist among the candidate text blocks obtained above and must be verified, as follows:
(1) Verify by the geometric features of the text region. After projection cutting, all candidate text blocks are single-line text, so some false alarms can be screened out by features such as the height, aspect ratio, and area of the text block.
(2) Verify by stroke response. Strokes are dense in real text regions; the average stroke response of a candidate region is computed, and regions with small averages are screened out.
(3) Verify by gradient characteristics. Character strokes run in rich directions with poor directional consistency; the gradient values [DIP] of the grayscale image of the text region are computed in eight directions and histogrammed, and some false alarms can be screened out by the directional consistency of the gradient.
The verification link screens out most of the false alarms in the localization results; further false alarms can be screened out using the information obtained in the tracking and segmentation links.
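Checks (1) and (2) can be sketched as a single predicate over a candidate box and the stroke-response map. All thresholds here are illustrative placeholders that would, per the text, come from the per-program configuration.

```python
import numpy as np

def is_text_region(box, resp, min_h=8, max_h=64, min_aspect=1.2, min_resp=20.0):
    """Geometric + stroke-response verification of one candidate box.
    box = (x, y, w, h); resp = stroke response map of the frame.
    All thresholds are illustrative placeholders."""
    x, y, w, h = box
    if not (min_h <= h <= max_h):          # height outside plausible range
        return False
    if w / h < min_aspect:                 # single-line text is wider than tall
        return False
    # real text regions have a large average stroke response
    return float(resp[y:y + h, x:x + w].mean()) >= min_resp
```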
S204: on the basis of text localization, track the text within the shot to obtain the sequence of regions occupied by one text object over consecutive frames, in preparation for text enhancement. Another effect of text tracking is that false text regions that do not persist over consecutive frames can be discovered and excluded.
In video, a text block usually persists for some time, so one text block may be located on several, or even hundreds, of consecutive frames. Segmenting and recognizing every localization result would waste a great deal of processing time. With tracking, one text block is segmented and recognized only once in the period from its appearance to its disappearance, avoiding repeated processing. Moreover, the start and end times and the disappearance mode of a text block are important evidence for the layout analysis link.
The tracking link comprises three parts: position judgment, temporal judgment, and array maintenance. Position judgment and temporal judgment analyze the localization results in terms of whether positions overlap and whether content continues, respectively; the tracking-array maintenance link outputs independent text blocks according to the processing logic.
1. Position judgment
The position at which one text block appears is fixed across successive frames, so the text block positions obtained during localization overlap one another, whereas different text blocks appear at different positions on successive frames and do not overlap. Position overlap is therefore a necessary condition for judging that two text blocks located on successive frames are the same text block. There are four positional relationships — independent, slightly overlapping, overlapping, and containing — judged by the proportion of the overlap area within the text blocks. If the relationship is independent or slightly overlapping, the blocks are unrelated in position and no further judgment is needed; if overlapping or containing, they may come from the same text block and further judgment is required.
Another problem of position judgment is determining the boundary of the tracked text block from its positions on successive frames. Because video text is superimposed on the image and affected by background objects, the located rectangle of one text block may be too large on a few frames, containing a great deal of background, or too small on a few frames, not containing the text block completely; these erroneous results will affect the final localization result. Meanwhile, even for the majority of correct localization results, the position errors of the text block boundary accumulate gradually, making the final tracking result inaccurate. The tracking process should therefore be able to self-adjust the text block boundary. The steps are:
(1) if the two text blocks are in a containment relationship, take the boundary of the larger text block as the new boundary;
(2) combine the top-left and bottom-right vertices of the two text blocks to obtain four candidate rectangles;
(3) compute the average stroke response of the four rectangles and find the maximum r_max;
(4) sort the four rectangles by area in descending order;
(5) starting from the rectangle of largest area, if its average stroke response r_x ≥ α × r_max, take that rectangle's boundary as the boundary of the text block; otherwise consider the next rectangle.
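Steps (2)–(5) can be sketched as follows, with rectangles as (x0, y0, x1, y1); step (1), the containment case, would be handled by the caller before this routine, and α = 0.9 is an illustrative value.

```python
import numpy as np

def mean_resp(rect, resp):
    """Average stroke response inside a rectangle (0 for degenerate rects)."""
    x0, y0, x1, y1 = rect
    if x1 <= x0 or y1 <= y0:
        return 0.0
    return float(resp[y0:y1, x0:x1].mean())

def refine_boundary(rect_a, rect_b, resp, alpha=0.9):
    """Combine the top-left / bottom-right corners of the two located
    rectangles into four candidates (step 2), then take the largest-area
    candidate whose average stroke response reaches alpha * r_max (steps 3-5)."""
    tls = [(rect_a[0], rect_a[1]), (rect_b[0], rect_b[1])]
    brs = [(rect_a[2], rect_a[3]), (rect_b[2], rect_b[3])]
    cands = [(tl[0], tl[1], br[0], br[1]) for tl in tls for br in brs]
    r_max = max(mean_resp(r, resp) for r in cands)
    for r in sorted(cands, key=lambda r: (r[2] - r[0]) * (r[3] - r[1]),
                    reverse=True):
        if mean_resp(r, resp) >= alpha * r_max:
            return r
    return cands[0]
```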
2. Temporal judgment
Temporal judgment determines, from image content, whether two text blocks located on adjacent frames come from the same text. There are four temporal relationships:
(1) Keep. The text in the two frames is unchanged.
(2) Replace. The text in the previous frame is replaced by new text in the next frame; the content differs.
(3) Disappear. The text in the previous frame disappears.
(4) False alarm. The region located in the previous frame is noise.
When the text position is fixed, the sum of squared differences [SSD] of the grayscale images of the two frames is an effective criterion for whether the text content has changed. If the pixels inside the text region are not distinguished and the SSD is computed over the whole region, the judgment is easily affected by background changes; here only the pixels with large stroke response are compared — these points all lie on character strokes — making the algorithm more stable. The temporal judgment is made from the grayscale difference and the stroke response difference between the two text blocks.
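The stroke-masked SSD criterion can be sketched as follows; both thresholds are illustrative, not values from the patent.

```python
import numpy as np

def same_text(gray_prev, gray_cur, resp, resp_thresh=50.0, ssd_thresh=100.0):
    """'Keep' test between two aligned text-block images: compare only the
    pixels whose stroke response is large, i.e. pixels on character strokes,
    so that background changes do not disturb the judgment."""
    mask = resp > resp_thresh
    if not mask.any():
        return False                        # nothing stroke-like to compare
    diff = gray_prev.astype(float) - gray_cur.astype(float)
    ssd = float((diff[mask] ** 2).mean())   # mean squared diff on strokes
    return ssd < ssd_thresh
```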
3. Tracking array maintenance
To track the text blocks appearing in the video, a tracking array must be maintained. The maintenance tasks are:
(1) for a text block newly appearing in the current frame, add its localization result to the array;
(2) for a persisting text block, keep its element in the array;
(3) for a disappearing text block, determine its start and end times and disappearance mode, find the highest-quality image within its lifetime, submit it to the segmentation link, and then delete the element from the array.
Another task of tracking-array maintenance is to pick out the best-quality frame from the multiple frames in which a text block persists and submit it to the segmentation link; this helps reduce the difficulty of segmentation and improves the final recognition accuracy. It is a form of multi-frame video enhancement [Survey]. Experiments show that the gradient distribution of the text block image approximately reflects image quality: in images with a large gradient mean, the contrast between characters and background is obvious; in images with large gradient kurtosis [Statistic], the gradient changes are concentrated and the image is clear. The selection process is:
(1) the gradient values of each text block were already computed during text block verification in the localization link; compute the mean and kurtosis of the gradient distribution of the block;
(2) keep the 5 frames with the largest mean in a buffer;
(3) after the text block disappears or is replaced, output the image with the largest kurtosis among the 5 buffered frames.
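The mean/kurtosis selection of steps (1)–(3) can be sketched as follows. The kurtosis formula is the standard fourth standardized moment, and `np.gradient` stands in for whatever gradient operator the localization link actually uses — both are assumptions.

```python
import numpy as np

def gradient_stats(gray):
    """Mean and kurtosis of the gradient-magnitude distribution of a frame."""
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy).ravel()
    mean, std = mag.mean(), mag.std()
    # fourth standardized moment; 0 for a flat (degenerate) distribution
    kurt = float(((mag - mean) ** 4).mean() / std ** 4) if std > 1e-9 else 0.0
    return float(mean), kurt

def select_best_frame(frames, keep=5):
    """Keep the `keep` frames with the largest gradient mean (best contrast),
    then return the index of the one with the largest kurtosis (sharpest)."""
    stats = [(i, *gradient_stats(f)) for i, f in enumerate(frames)]
    top = sorted(stats, key=lambda s: s[1], reverse=True)[:keep]
    return max(top, key=lambda s: s[2])[0]
```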
S205: use the tracked text sequence to enhance the text and suppress the background, then binarize, obtaining a relatively clean and clear binarized text image in preparation for the final text recognition.
S206: perform text recognition on the binarized text image to obtain the character string of the text.
S207: layout analysis
The texts contained in video are of many kinds, and different kinds carry different meanings; text regions include types such as title, subtitle, station logo, annotation, and scroll bar. In video search and automatic video cataloguing, structured text information must be extracted from the video, and the text type is a feature as important as the text content.
The goal of layout analysis is to organize and classify the text carefully and accurately according to its features and to output structured text information meeting the needs of different applications; it comprises three parts: feature collection, text organization, and text classification. Layout analysis uses the temporal features of text blocks, which are determined only after a whole program has been processed, so offline processing is used: layout analysis is performed only after a program has been fully processed.
1st, feature is collected
The text feature used in printed page analysis is divided into 7 classes:
(1) polarity.Polarity reflects the shade relativity of text filed middle character and background, if polarity is 0 expression
Light background dark color character, polarity are 1 expression dark-background light color character.Segmentation link can utilize algorithm automatic decision text
Polarity [Segment];Polarity can also be provided in configuration file, and instructs to split with this.
(2) color.Color includes character color and background color.In some cases, it is not of the same race to be not enough to differentiation for polarity
The text of class, as under red background white and yellow character polarity be all 1, at this moment just need consider colouring information.Splitting
Link, binaryzation can determine text filed foreground point and background dot afterwards, the average color of character and background counted with this.
(3) character size.Character size includes the mean breadth and height of single character in line of text.In segmentation link
In, the width and height of single character can be obtained after carrying out pre-segmentation, the average width of single character in line of text is counted with this
Degree and height.
(4) text block position.Text block position includes the upper and lower of text block, right boundary.It can be provided in tracking link
The approximate bounds of text block, but inaccurately, adjacent different types of text block easily overlaps, in segmentation link according to most
Whole recognition result and pre-segmentation are as a result, it is possible to determine the accurate location of text block.
(5) recognition result.The character string that text block image obtains after over-segmentation, identification, provided in segmentation link.
(6) beginning and ending time of text block.At the time of text block appearing and subsiding, provided in tracking link.
(7) sequential relationship of text block.In tracking link, carry out providing four kinds of relations during sequential judgement:Keep, disappear,
Replacement and false-alarm, belong to text block has two kinds:Disappear and replace.
These seven categories of features form the basis of layout analysis. In subsequent processing, the features should be combined and the rules designed flexibly according to the characteristics of the video being processed; there is no unified processing flow.
2 Text organization
Text organization comprises two parts: (1) merging multiple lines of text on the same frame; (2) merging the same text block across consecutive frames.
After projection segmentation, every text block under processing is a single line of text. Such single-line texts may need to be combined before they express a complete meaning, for example a multi-line news headline. On the same frame, spatially dispersed single-line texts are combined into complete logical units according to information such as text block position, character size and color, together with the characteristics of the video being processed.
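One plausible merging rule (the thresholds and fields below are assumptions): two single-line blocks on the same frame belong to one logical unit when they overlap horizontally, the vertical gap between them is small relative to the character height, and their character heights are similar.

```python
def same_unit(a, b, gap_factor=1.5, size_tol=0.25):
    """a, b: dicts with 'bbox' = (top, bottom, left, right) and
    'char_h' (mean character height). True if they should merge."""
    at, ab, al, ar = a["bbox"]
    bt, bb, bl, br = b["bbox"]
    h_overlap = min(ar, br) - max(al, bl) > 0          # share columns
    v_gap = max(at, bt) - min(ab, bb)                  # vertical distance
    close = v_gap < gap_factor * min(a["char_h"], b["char_h"])
    similar = abs(a["char_h"] - b["char_h"]) \
        <= size_tol * max(a["char_h"], b["char_h"])
    return h_overlap and close and similar
```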
In some cases, texts that appear in succession must be combined before they express a complete meaning, or the same text appears repeatedly and intermittently, such as a news headline. Temporally dispersed texts are then combined into complete logical units according to information such as the recognition result, character size and color of the text blocks.
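A sketch of the temporal merge (grouping key and data layout are assumptions): occurrences of the same recognized string, e.g. an intermittent headline, are folded into one logical unit spanning from the first appearance to the last disappearance.

```python
def group_by_recognition(blocks):
    """blocks: list of (recognized_string, start_frame, end_frame).
    Returns {string: (first_start, last_end)} over all occurrences."""
    groups = {}
    for text, start, end in blocks:
        if text in groups:
            s, e = groups[text]
            groups[text] = (min(s, start), max(e, end))
        else:
            groups[text] = (start, end)
    return groups
```

A real system would likely also require matching character size and color, since different texts can occasionally share the same recognized string.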
3 Text classification
The form in which text appears differs across video programs. For one class of programs, observation may yield some text-classification rules, but in another class those rules may no longer hold. Text classification therefore has no specific unified processing flow; the steps for solving the problem are:
(1) Observe the characteristics of the various types of text in the video being processed, including their common points and differences;
(2) Classify the texts of interest, assigning each type a template that records the characteristics of that type, such as position, character size, polarity, color and timing;
(3) Combine the text features with the templates to perform classification.
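The three steps above can be sketched as a simple template match. The tolerances and template fields here are assumptions for illustration only:

```python
def classify(block, templates, pos_tol=10, size_tol=0.2):
    """block: {'top', 'char_h', 'polarity'} measured from the video;
    templates: {type_name: dict with the same keys} built by observing
    the program. Returns the first matching type name."""
    for name, t in templates.items():
        if (block["polarity"] == t["polarity"]
                and abs(block["top"] - t["top"]) <= pos_tol
                and abs(block["char_h"] - t["char_h"])
                    <= size_tol * t["char_h"]):
            return name
    return "unknown"
```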
Embodiment Three
Embodiment three of the present invention provides a video text extraction system, comprising:
a video shot segmentation module 11, configured to process a video signal into individual shots;
a text localization module 12, configured to detect and locate the specific positions of candidate text in a single video frame sequence;
a text tracking module 13, configured to track text within a video shot on the basis of text localization, obtaining a sequence of text regions of the same text object over consecutive frames;
an enhancement and binarization module 14, configured to use the text sequence obtained by text tracking to enhance the text and suppress the background, and then perform binarization to obtain a binarized text image;
a text recognition module 15, configured to perform text recognition on the binarized text image to obtain the character sequence information of the text.
Preferably, the text localization module 12 specifically comprises:
a preprocessing submodule 121, a coarse localization submodule 122, a projection segmentation submodule 123 and a screening submodule 124.
In some optional embodiments, the text tracking module 13 comprises:
a position judgment submodule 131, a temporal judgment submodule 132 and a tracking-array maintenance submodule 133.
In some optional embodiments, the system further comprises a pre-segmentation module 16, which specifically comprises:
a conversion submodule 161, configured to convert the video image into a grayscale image when the video image is a color image;
a separation submodule 162, configured to binarize the text block region image, separating the characters from the background in the image to determine character boundaries;
an analysis submodule 163, configured to perform connected component analysis on the generated binary image to obtain the position and size information of the character strokes.
In some optional embodiments, the system further comprises:
a layout analysis module 17, configured to analyze the layout of the video image, obtain the text features in the video image, and organize and classify the obtained text information.
It should be understood that the specific order or hierarchy of steps in the disclosed processes is an example of exemplary approaches. Based on design preferences, it should be appreciated that the specific order or hierarchy of steps in the processes may be rearranged without departing from the protection scope of the present disclosure. The appended method claims present elements of the various steps in a sample order and are not meant to be limited to the specific order or hierarchy presented.
In the foregoing detailed description, various features are grouped together in a single embodiment to simplify the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments of the subject matter require more features than are expressly recited in each claim. Rather, as the appended claims reflect, the invention lies in fewer than all features of a single disclosed embodiment. Thus the appended claims are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate preferred embodiment of the present invention.
For a software implementation, the techniques described herein may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein. The software codes may be stored in memory units and executed by processors. A memory unit may be implemented within the processor or external to the processor, in which latter case it can be communicatively coupled to the processor via various means, as is known in the art.
What has been described above includes examples of one or more embodiments. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the aforementioned embodiments, but one of ordinary skill in the art will recognize that many further combinations and permutations of the various embodiments are possible. Accordingly, the embodiments described herein are intended to embrace all such alterations, modifications and variations that fall within the protection scope of the appended claims. Furthermore, to the extent that the term "includes" is used in either the specification or the claims, it is intended to be inclusive in a manner similar to the term "comprising" as "comprising" is interpreted when employed as a transitional word in a claim. In addition, any use of the term "or" in the specification or claims is intended to mean a "non-exclusive or".
Claims (10)
- 1. A method of video text extraction, characterized by comprising: processing a video signal into individual shots; detecting and locating the specific positions of candidate text in a single video frame sequence; on the basis of text localization, tracking the text within a video shot to obtain a sequence of text regions of the same text object over consecutive frames; using the text sequence obtained by text tracking to enhance the text and suppress the background, then performing binarization to obtain a binarized text image; and performing text recognition on the binarized text image to obtain the character sequence information of the text.
- 2. The method according to claim 1, characterized in that the localization specifically comprises the following steps: preprocessing, coarse localization, projection segmentation and screening.
- 3. The method according to claim 1, characterized in that the tracking adopts: position judgment, temporal judgment and tracking-array maintenance.
- 4. The method according to claim 1, characterized by further comprising pre-segmentation before the segmentation, specifically comprising: converting the video image into a grayscale image when the video image is a color image; binarizing the text block region image and separating the characters from the background in the image to determine character boundaries; and performing connected component analysis on the generated binary image to obtain the position and size information of the character strokes.
- 5. The method according to claim 1, characterized by further comprising: analyzing the layout of the video image, obtaining the text features in the video image, and organizing and classifying the obtained text information.
- 6. A video text extraction system, characterized by comprising: a video shot segmentation module, configured to process a video signal into individual shots; a text localization module, configured to detect and locate the specific positions of candidate text in a single video frame sequence; a text tracking module, configured to track text within a video shot on the basis of text localization, obtaining a sequence of text regions of the same text object over consecutive frames; an enhancement and binarization module, configured to use the text sequence obtained by text tracking to enhance the text and suppress the background, and then perform binarization to obtain a binarized text image; and a text recognition module, configured to perform text recognition on the binarized text image to obtain the character sequence information of the text.
- 7. The system according to claim 6, characterized in that the text localization module specifically comprises: a preprocessing submodule, a coarse localization submodule, a projection segmentation submodule and a screening submodule.
- 8. The system according to claim 6, characterized in that the text tracking module comprises: a position judgment submodule, a temporal judgment submodule and a tracking-array maintenance submodule.
- 9. The system according to claim 6, characterized by further comprising a pre-segmentation module, specifically comprising: a conversion submodule, configured to convert the video image into a grayscale image when the video image is a color image; a separation submodule, configured to binarize the text block region image, separating the characters from the background in the image to determine character boundaries; and an analysis submodule, configured to perform connected component analysis on the generated binary image to obtain the position and size information of the character strokes.
- 10. The system according to claim 6, characterized by further comprising: a layout analysis module, configured to analyze the layout of the video image, obtain the text features in the video image, and organize and classify the obtained text information.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610479702.1A CN107545210A (en) | 2016-06-27 | 2016-06-27 | A kind of method of video text extraction |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107545210A true CN107545210A (en) | 2018-01-05 |
Family
ID=60961884
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610479702.1A Pending CN107545210A (en) | 2016-06-27 | 2016-06-27 | A kind of method of video text extraction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107545210A (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108388872A (en) * | 2018-02-28 | 2018-08-10 | 北京奇艺世纪科技有限公司 | A kind of headline recognition methods and device based on font color |
CN108664626A (en) * | 2018-05-14 | 2018-10-16 | 北京奇艺世纪科技有限公司 | A kind of title consistency detecting method, device and electronic equipment |
CN108960210A (en) * | 2018-08-10 | 2018-12-07 | 武汉优品楚鼎科技有限公司 | It is a kind of to grind the method, system and device for reporting board-like identification and segmentation |
CN109800757A (en) * | 2019-01-04 | 2019-05-24 | 西北工业大学 | A kind of video text method for tracing based on layout constraint |
CN110147724A (en) * | 2019-04-11 | 2019-08-20 | 北京百度网讯科技有限公司 | For detecting text filed method, apparatus, equipment and medium in video |
CN110489747A (en) * | 2019-07-31 | 2019-11-22 | 北京大米科技有限公司 | A kind of image processing method, device, storage medium and electronic equipment |
CN111639599A (en) * | 2020-05-29 | 2020-09-08 | 北京百度网讯科技有限公司 | Object image mining method, device, equipment and storage medium |
CN112749599A (en) * | 2019-10-31 | 2021-05-04 | 北京金山云网络技术有限公司 | Image enhancement method and device and server |
US11012730B2 (en) | 2019-03-29 | 2021-05-18 | Wipro Limited | Method and system for automatically updating video content |
CN113723401A (en) * | 2021-08-23 | 2021-11-30 | 上海千映智能科技有限公司 | Song menu extraction method based on morphological method |
WO2023115838A1 (en) * | 2021-12-24 | 2023-06-29 | 北京达佳互联信息技术有限公司 | Video text tracking method and electronic device |
Legal Events

Date | Code | Title | Description
---|---|---|---|
| PB01 | Publication | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20180105 |