CN107545210A - A method of video text extraction - Google Patents

A method of video text extraction

Info

Publication number: CN107545210A
Application number: CN201610479702.1A
Authority: CN (China)
Prior art keywords: text, video, image, character, submodule
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 张师群, 罗旻
Original and Current Assignee: BEIJING NUFRONT SOFTWARE TECHNOLOGY Co Ltd
Application filed by BEIJING NUFRONT SOFTWARE TECHNOLOGY Co Ltd
Priority date: 2016-06-27
Publication date: 2018-01-05

Abstract

The invention discloses a method of video text extraction, comprising: segmenting a video signal into individual shots; detecting and locating the specific positions of candidate text within a single sequence of video frames; on the basis of text localization, tracking the text within a video shot to obtain the sequence of text regions of the same text object over consecutive frames; using the text sequence obtained by text tracking to enhance the text and suppress the background, and then performing binarization to obtain a binarized text image; and performing text recognition on the binarized text image to obtain the character string of the text. With the technical scheme provided by the invention, text in video can be better detected and separated from complex and varying backgrounds, system efficiency is improved, and the quality of the text is improved, which helps to raise the text recognition rate.

Description

A method of video text extraction
Technical field
The present invention relates to the field of image and information technology, and in particular to a method of video text extraction.
Background technology
In order to extract text from video frames, the frames containing text must first be found, and then the position of the text determined. In a typical video program, not every frame contains text; in some programs most frames may contain no text at all. Existing techniques generally detect text on every video frame and then match the detections across frames to remove duplicate results. Since a video contains a great many frames, this kind of processing consumes a large amount of time.
Summary of the invention
In view of this, it is an object of the present invention to provide a method of video text extraction that segments the video into shots and then detects text within each shot. This simplifies the problem, and once the video is divided into shots they can be processed in parallel, improving the efficiency of the system. The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosed embodiments. This summary is not an extensive overview, and it is not intended to identify key or critical elements or to delineate the scope of protection of these embodiments. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that follows.
It is an object of the present invention to provide a video text extraction method, including:
segmenting a video signal into individual shots;
detecting and locating the specific positions of candidate text within a single sequence of video frames;
on the basis of text localization, tracking the text within a video shot to obtain the sequence of text regions of the same text object over consecutive frames;
using the text sequence obtained by text tracking to enhance the text and suppress the background, and then performing binarization to obtain a binarized text image;
performing text recognition on the binarized text image to obtain the character string of the text.
In some optional embodiments, the localization specifically includes the following steps:
preprocessing, coarse localization, projection cutting, and screening.
In some optional embodiments, the tracking uses the following:
position judgment, temporal judgment, and maintenance of a tracking array.
In some optional embodiments, pre-segmentation is also performed before the segmentation, specifically including:
when the video image is a color image, converting the video image into a grayscale image;
performing binarization on the text block image to separate the characters from the background in the image and determine the character boundaries;
performing connected-component analysis on the resulting binary image to obtain the position and size information of the character strokes.
In some optional embodiments, the method further includes:
analyzing the layout of the video image to obtain the text features in the video image, and organizing and classifying the obtained text information.
The present invention also aims to provide a video text extraction system, characterized by including:
a video shot segmentation module, for segmenting a video signal into individual shots;
a text localization module, for detecting and locating the specific positions of candidate text within a single sequence of video frames;
a text tracking module, for tracking the text within a video shot on the basis of text localization, to obtain the sequence of text regions of the same text object over consecutive frames;
an enhancement and binarization module, for using the text sequence obtained by text tracking to enhance the text and suppress the background, and then performing binarization to obtain a binarized text image;
a text recognition module, for performing text recognition on the binarized text image to obtain the character string of the text.
In some optional embodiments, the text localization module specifically includes:
a preprocessing submodule, a coarse localization submodule, a projection cutting submodule, and a screening submodule.
In some optional embodiments, the text tracking module uses the following:
a position judgment submodule, a temporal judgment submodule, and a tracking array maintenance submodule.
In some optional embodiments, the system further includes a pre-segmentation module, specifically including:
a conversion submodule, for converting the video image into a grayscale image when the video image is a color image;
a separation submodule, for performing binarization on the text block image to separate the characters from the background in the image and determine the character boundaries;
an analysis submodule, for performing connected-component analysis on the resulting binary image to obtain the position and size information of the character strokes.
In some optional embodiments, the system further includes:
a layout analysis module, for analyzing the layout of the video image, obtaining the text features in the video image, and organizing and classifying the obtained text information.
Using the method of the present invention, the following effects are obtained:
text in video can be better detected and separated from complex and varying backgrounds, system efficiency is improved, and the quality of the text is improved, which helps to raise the text recognition rate.
To the accomplishment of the foregoing and related ends, the one or more embodiments comprise the features described in detail below and particularly pointed out in the claims. The following description and the accompanying drawings set forth certain illustrative aspects in detail, and these indicate only some of the various ways in which the principles of the embodiments may be employed. Other benefits and novel features will become apparent from the following detailed description considered in conjunction with the drawings, and the disclosed embodiments are intended to include all such aspects and their equivalents.
Fig. 1 is a flow chart of a method of video text extraction provided by the invention;
Fig. 2 is a schematic diagram of the composition of a video text extraction system provided by the invention.
Embodiments
The following description and drawings sufficiently illustrate specific embodiments of the invention to enable those skilled in the art to practice them. Other embodiments may incorporate structural, logical, electrical, process, and other changes. The embodiments represent only possible variations. Unless explicitly required, individual components and functions are optional, and the order of operations may vary. Portions and features of some embodiments may be included in or substituted for those of other embodiments. The scope of the embodiments of the invention includes the full scope of the claims and all available equivalents of the claims. Herein, these embodiments of the invention may be referred to, individually or collectively, by the term "invention" merely for convenience, without intending to automatically limit the scope of this application to any single invention or inventive concept if more than one is in fact disclosed.
Embodiment one
This embodiment provides a method of video text extraction, comprising the following steps:
S101: segment the video signal into individual shots, so that the text tracking of step S103 can be carried out within a shot rather than over the whole video sequence. This simplifies the problem and allows text localization and tracking to run in parallel, greatly improving system efficiency.
A shot refers to the sequence of video images recorded by one continuous camera operation. A shot boundary is the result of switching between two shots, where the content of the video changes; in other words, a shot boundary reflects a discontinuity in the video content.
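The disclosure does not prescribe a particular shot segmentation algorithm. Purely as an illustration, shot boundaries can be detected from a drop in color-histogram similarity between consecutive frames; a minimal sketch follows, with the threshold chosen only for illustration.

```python
import cv2

def detect_shot_boundaries(video_path, threshold=0.6):
    """Return indices of frames where the color-histogram correlation with the
    previous frame falls below `threshold` (treated as shot cuts).
    Illustrative sketch; the threshold is not taken from this disclosure."""
    cap = cv2.VideoCapture(video_path)
    boundaries, prev_hist, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [50, 60], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None and \
                cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL) < threshold:
            boundaries.append(idx)       # abrupt content change: shot boundary
        prev_hist, idx = hist, idx + 1
    cap.release()
    return boundaries
```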
S102: detect and locate the specific positions of candidate text in a single video frame, reducing the number of false text regions as far as possible.
S103: on the basis of text localization, track the text within the video shot to obtain the sequence of text regions of the same text object over consecutive frames, in preparation for the text enhancement that follows. Another effect of text tracking is that false text regions which do not appear over consecutive frames can be found and excluded.
S104: use the text sequence obtained by text tracking to enhance the text and suppress the background, then perform binarization to obtain a relatively clean and clear binarized text image, in preparation for the final text recognition.
S105: perform text recognition on the binarized text image to obtain the character string of the text.
Embodiment two
The aim is to extract structured text information from video programs. The key stages are: segmentation, localization, tracking, recognition, and layout analysis.
The system processes video with a "semi-automatic" strategy: besides the video to be processed, a corresponding configuration file is also provided to the system.
Video programs are of many kinds, and the production styles of different programs differ greatly, for example in whether the character stroke width is uniform, in the color contrast between characters and background, and in the arrangement of the characters; no general text features and processing methods suitable for all video text can be found.
The algorithms used in text processing, and their parameters, are highly specific. An algorithm can reach very high performance in one particular application environment but degrades once the environment changes, and other suitable algorithms must then be used; likewise, the same algorithm needs different parameter configurations to handle different types of video.
A program series keeps a fixed production style over a long period of time. By observation and experiment, once the most suitable configuration items for a program series have been determined, all videos of that series can be processed correctly, with accuracy and processing speed meeting practical requirements. If the production style of the program changes, the system can be adapted quickly by modifying the configuration items.
For a new video program, the system can handle all videos of that program simply by adding a configuration file for it, so extension is convenient and fast.
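The format of the configuration file is not given here. Purely as an illustration, a per-program configuration could carry the items that appear later in this description (polarity, representative colors, stroke-response spacing, and the choice of grayscale conversion and binarization algorithm); every field name and value below is hypothetical.

```python
# Hypothetical per-program configuration; all field names and values are illustrative.
PROGRAM_CONFIG = {
    "program_id": "evening_news",            # one configuration per program series
    "polarity": 1,                           # 1: light characters on a dark background
    "character_colors": [(255, 255, 255)],   # representative character color(s)
    "background_colors": [(0, 0, 64)],       # representative background color(s)
    "stroke_spacing": 3,                     # spacing used for the stroke response
    "grayscale_method": "luminance",         # "luminance", "channel" or "color_enhance"
    "binarization": "sauvola",               # "otsu", "kittler", "niblack" or "sauvola"
}
```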
This embodiment provides a method of video text extraction, comprising the following steps:
S201: preprocessing
Preprocessing includes grayscale conversion, binarization, and connected-component analysis. The candidate text region images obtained in the localization stage are color images, while binarization and character recognition use grayscale images, so a conversion is needed. The conversion methods are:
(1) Extract the luminance component [DIP].
(2) Extract one color channel (R, G, or B) of the color image, choosing the channel on which the intensity contrast between characters and background is most pronounced.
(3) Convert the color space, changing the distance metric between different colors [Color], to obtain a grayscale image with obvious intensity contrast between characters and background.
(4) Color enhancement. One or more representative colors are specified for the characters and for the background; the pixels of the color image are clustered with the K-means method [KMeans] while the luminance component of the pixels is extracted as a grayscale image, and on this grayscale image the character pixels are enhanced and the background pixels suppressed, increasing the intensity contrast between characters and background.
In practical applications, an appropriate conversion method should be configured according to the characteristics of the video image, especially the color contrast relationship between characters and background, to improve the effect of the subsequent binarization.
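As a sketch of conversion methods (1), (2), and (4) above, assuming OpenCV and the hypothetical configuration fields introduced earlier (the enhancement factors are placeholders):

```python
import cv2
import numpy as np

def to_grayscale(bgr, method="luminance", channel=2, char_colors=None, k=3):
    """Convert a color text-region image to grayscale.
    method: "luminance" (1), "channel" (2) or "color_enhance" (4)."""
    if method == "luminance":
        return cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    if method == "channel":
        return bgr[:, :, channel]                  # channel with the best contrast
    # Color enhancement: cluster pixel colors with K-means, then boost the cluster
    # closest to the declared character color on the luminance image.
    pixels = bgr.reshape(-1, 3).astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
    _, labels, centers = cv2.kmeans(pixels, k, None, criteria, 5,
                                    cv2.KMEANS_RANDOM_CENTERS)
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
    char = np.float32(char_colors[0])              # same channel order as the image
    char_cluster = int(np.argmin(np.linalg.norm(centers - char, axis=1)))
    mask = (labels.ravel() == char_cluster).reshape(gray.shape)
    gray[mask] = np.clip(gray[mask] * 1.5, 0, 255)   # enhance character pixels
    gray[~mask] *= 0.5                               # suppress background pixels
    return gray.astype(np.uint8)
```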
Binarization is used to separate the characters from the background in the image, laying the foundation for determining character boundaries. Binarization is an important and widely studied direction in the OCR field, and many algorithms have been proposed [Bin]. The algorithms used here are:
(1) Global binarization algorithms: Otsu [Otsu], Kittler [Kittler].
(2) Local binarization algorithms: Niblack [Niblack], Sauvola [Sauvola][Fast].
A given binarization algorithm is only suitable in certain cases; in application, different algorithms should be selected according to the quality of the video images to be processed.
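For illustration, the global Otsu algorithm and the local Sauvola algorithm named above could be applied as follows (using OpenCV and scikit-image; the window size is a tuning choice, not a value from this disclosure):

```python
import cv2
import numpy as np
from skimage.filters import threshold_sauvola

def binarize(gray, method="otsu", window_size=25):
    """Return a binary image with foreground pixels set to 255 (polarity handling omitted)."""
    if method == "otsu":
        # Global threshold selected automatically by Otsu's criterion.
        _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        return binary
    # Sauvola: a local threshold computed per pixel from a sliding window,
    # better suited to uneven backgrounds.
    t = threshold_sauvola(gray, window_size=window_size)
    return (gray > t).astype(np.uint8) * 255
```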
Connected-component analysis is performed on the generated binary image to obtain the position and size information of the character strokes. Connected-component analysis includes three parts: labeling, screening, and merging. Connected-component labeling reflects the connectivity relationships between pixels in the binary image; the algorithm in [Label] is used here. After labeling, information such as the position, size, and pixel count of each connected region in the binary image can be obtained. In connected-component screening, rules are designed to remove connected components that are unreasonable in position, size, shape, duty cycle, or other features, reducing interference and preparing for subsequent processing. Since a Chinese character is usually composed of several scattered strokes, failing to merge them reasonably would affect the selection of segmentation points. The connected-component merging algorithm refers to the relevant content in [Liu].
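A minimal sketch of the labeling and screening parts, assuming OpenCV; the geometric limits are placeholders rather than values from this disclosure, and the merging of scattered strokes is omitted.

```python
import cv2

def stroke_components(binary, min_area=8, max_area=5000, max_aspect=10.0):
    """Label connected components and keep those plausible as character strokes."""
    num, labels, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)
    strokes = []
    for i in range(1, num):                        # label 0 is the background
        x, y, w, h, area = stats[i]
        aspect = max(w, h) / max(min(w, h), 1)
        if min_area <= area <= max_area and aspect <= max_aspect:
            strokes.append((x, y, w, h, area))     # position and size of the stroke
    return strokes
```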
S202: segment the video signal into individual shots, so that the text tracking of step S204 can be carried out within a shot rather than over the whole video sequence. This simplifies the problem and allows text localization and tracking to run in parallel, greatly improving system efficiency.
S203: detect and locate the specific positions of candidate text in a single video frame, reducing the number of false text regions as far as possible.
The goal of localization is to determine the positions of text regions in the video image. The whole process is divided into four parts: preprocessing, coarse localization, projection cutting, and screening. Preprocessing includes computing the stroke response and color clustering: the former highlights characters using the uniformity of character strokes, the latter using the color characteristics of the characters, and one of the two processing flows is selected according to the configuration items. Coarse localization detects text regions from the dense arrangement of characters and obtains their approximate positions. Projection cutting splits detected multi-line text into single-line text and obtains more accurate boundaries of the text regions, which eases subsequent segmentation. The verification stage extracts features of the text regions and screens out false alarms.
1. Preprocessing
Chinese characters, English letters, and digits are all man-made symbols whose common characteristic is a uniform stroke width, while natural objects often lack this characteristic, so text can be enhanced and background suppressed by computing a stroke response. The steps for computing the stroke response are as follows (an illustrative sketch follows the steps):
(1) Determine the spacing of the stroke response according to the configuration file. When computing the stroke response, the spacing is related to the stroke width of the characters to be detected. The stroke widths of video characters are mostly similar, but some videos have characters with larger stroke widths; setting a suitable spacing in advance, based on observation, gives better results.
(2) Compute the stroke response. The polarity in the configuration file determines whether the bright or the dark stroke response is computed. The bright (or dark) stroke response is computed in both the horizontal and the vertical direction, and the maximum of the two is taken as the final stroke response.
(3) Binarization. The stroke response of text regions is large while that of background regions is small. A stroke response threshold is set based on observation and the stroke response map is binarized [DIP]: regions above the threshold are foreground points, otherwise background points.
(4) Apply a dilation operation [DIP] to the resulting binary image to connect broken strokes.
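The exact stroke-response operator is not spelled out in this description. As a rough stand-in only, a morphological top-hat (for bright strokes) or black-hat (for dark strokes) with structuring elements sized to the configured spacing yields a response that is large on thin, uniform strokes, after which the thresholding of step (3) and the dilation of step (4) can be applied. A sketch under those assumptions:

```python
import cv2
import numpy as np

def stroke_response_mask(gray, spacing=3, polarity=1, response_thresh=40):
    """Approximate bright/dark stroke response, then binarize and dilate.
    The top-hat/black-hat operator is an illustrative substitute for the
    stroke-response computation; all threshold values are placeholders."""
    size = 2 * spacing + 1
    k_h = cv2.getStructuringElement(cv2.MORPH_RECT, (size, 1))
    k_v = cv2.getStructuringElement(cv2.MORPH_RECT, (1, size))
    op = cv2.MORPH_TOPHAT if polarity == 1 else cv2.MORPH_BLACKHAT
    response = np.maximum(cv2.morphologyEx(gray, op, k_h),
                          cv2.morphologyEx(gray, op, k_v))   # max of two directions
    _, mask = cv2.threshold(response, response_thresh, 255, cv2.THRESH_BINARY)
    return cv2.dilate(mask, cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3)))
```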
In some cases the stroke width of the characters fluctuates considerably, for example with decorative fonts, and the stroke feature is not obvious. The color characteristics of the characters can then be used for localization, as follows:
(1) Perform color clustering on the image according to the configuration file, which indicates the character color and the background color; clustering uses the K-means algorithm [KMeans].
(2) In the clustering result, set the class or classes belonging to the character color as foreground points and the other classes as background points, generating a binary image.
(3) Apply a dilation operation [DIP] to the resulting binary image to connect broken strokes.
2. Coarse localization
On the binary image, the approximate positions of text regions are first obtained by coarse localization, and precise localization is then carried out inside those regions. The coarse localization steps are:
(1) Connected-component labeling. After connected-component labeling [Label], features such as the position, size, and pixel count of each connected region in the binary image can be obtained.
(2) Determine text regions. According to the geometric constraints of real text blocks, such as size and arrangement position, determine the potential text regions [Geometry], including determining single text regions and merging multiple text regions along the horizontal or vertical direction.
3. Projection cutting
Multi-line text often appears in video images, and in coarse detection multi-line text is often detected as a single text block, for two reasons:
(1) The background region between adjacent lines of text has a uniform width and can produce large values on the stroke response map; during binarization this part of the background is retained, causing the lines of text to stick together.
(2) The dilation step can also cause adhesion.
The subsequent segmentation stage requires each text region to be a single line of text, so potential multi-line text must be cut here into multiple single-line texts. Working in units of connected components, the projection cutting method [Slice1] effectively resolves the adhesion between lines of text and, in some cases, between text and its surrounding background, ensuring that the candidate regions after cutting are single-line text.
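As an illustration of cutting a multi-line block with a horizontal projection profile (the profile-based idea behind projection cutting; the gap threshold below is a placeholder):

```python
import numpy as np

def split_lines(binary_block, min_gap=2):
    """Cut a binary text block into single-line sub-blocks wherever the horizontal
    projection (foreground pixels per row) stays empty for at least `min_gap` rows."""
    profile = (binary_block > 0).sum(axis=1)       # foreground count per row
    lines, start, gap = [], None, 0
    for row, count in enumerate(profile):
        if count > 0:
            if start is None:
                start = row
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:                     # long empty run: line boundary
                lines.append((start, row - gap + 1))
                start, gap = None, 0
    if start is not None:
        lines.append((start, len(profile)))
    return [binary_block[a:b] for a, b in lines]
```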
4. Screening
False alarms exist among the candidate text blocks obtained by the above processing and must be verified, as follows:
(1) Verify according to the geometric features of the text region. The candidate text blocks after projection cutting are all single-line text, so some false alarms can be screened out from features such as the height, aspect ratio, and area of the text block.
(2) Verify according to the stroke response. The strokes in real text regions are dense; compute the average stroke response of each candidate text region and screen out regions whose average is small.
(3) Verify according to gradient characteristics. Character strokes run in rich directions, so their orientation consistency is low. Compute the gradient values [DIP] of the grayscale image of the text region in eight directions, build a histogram, and screen out some false alarms according to the directional consistency of the gradients (a sketch is given below).
The verification stage can screen out most of the false alarms in the localization results; false alarms can still be screened further using information obtained later in the tracking and segmentation stages.
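Verification step (3) can be sketched as follows, assuming an 8-bin gradient-direction histogram and an illustrative consistency threshold (the actual threshold is a tuning choice not given here):

```python
import cv2
import numpy as np

def gradient_direction_consistency(gray):
    """Fraction of gradient energy falling into the dominant one of 8 direction bins;
    real text, whose strokes run in many directions, tends to give a low value."""
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)
    hist, _ = np.histogram(ang, bins=8, range=(0, 2 * np.pi), weights=mag)
    total = hist.sum()
    return float(hist.max() / total) if total > 0 else 1.0

def is_probable_text(gray, max_consistency=0.5):
    # Screen out candidates whose gradients point mostly in one direction
    # (e.g. simple edges); the 0.5 threshold is illustrative only.
    return gradient_direction_consistency(gray) <= max_consistency
```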
S204: on the basis of text localization, track the text within the video shot to obtain the sequence of text regions of the same text object over consecutive frames, in preparation for the text enhancement that follows. Another effect of text tracking is that false text regions which do not appear over consecutive frames can be found and excluded.
In video, a text block generally persists for some time, so the same text block is localized on several or even hundreds of consecutive frames. If every localization result were segmented and recognized, a large amount of processing time would be wasted. With tracking, each text block is segmented and recognized only once during the period from its appearance to its disappearance, avoiding repeated processing. Moreover, the start and end times of a text block and its disappearance mode are important evidence for the layout analysis stage.
The tracking stage includes three parts: position judgment, temporal judgment, and maintenance of a tracking array. Position judgment and temporal judgment analyze the localization results in terms of whether the positions overlap and whether the content persists, respectively; the array maintenance step outputs independent text blocks according to the processing logic.
1. Position judgment
The position at which a given text block appears is fixed across successive frames, so the text block positions obtained during localization overlap with one another, while different text blocks appear at different positions on successive frames and do not overlap. Position overlap is therefore a necessary condition for judging whether two text blocks localized on successive frames are the same text block. There are four position relations: independent, slightly overlapping, overlapping, and containing, judged from the proportion of the text blocks occupied by the area of their overlapping region. If the relation is independent or slightly overlapping, the blocks are unrelated in position and no further judgment is needed; if it is overlapping or containing, they may come from the same text block and further judgment is required.
Another problem in position judgment is determining, from the positions of the text blocks on successive frames, the boundary of the text block to be tracked. Because video text is superimposed on the image and is affected by background objects, the localization rectangle of a text block may be too large on a few frames, containing a large amount of background, or too small on a few frames, not fully containing the text block; such erroneous results affect the final localization result. Meanwhile, for the majority of correct localization results, the position errors of the text block boundary accumulate gradually and make the final tracking result inaccurate. The tracking process should therefore be able to adjust the text block boundary by itself. The steps are as follows (a sketch follows the steps):
(1) If the two text blocks are in a containing relation, take the boundary of the text block with the larger area as the new boundary;
(2) Combine the top-left and bottom-right vertices of the two text blocks to obtain four candidate rectangles;
(3) Compute the average stroke response of the four rectangles and find the maximum, r_max;
(4) Sort the four rectangles by area in descending order;
(5) Starting from the rectangle with the largest area, if its average stroke response satisfies r_x ≥ α × r_max, take that rectangle's boundary as the boundary of the text block; otherwise consider the next rectangle.
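A sketch of steps (2) to (5), assuming a precomputed stroke-response map and that α (alpha) is a configurable factor; the value used below is a placeholder.

```python
import numpy as np

def adjust_boundary(rect_a, rect_b, stroke_resp, alpha=0.8):
    """rect_* are (x1, y1, x2, y2). Build the four candidate rectangles from the
    corner combinations, then take the largest one whose average stroke response
    is at least alpha * r_max. `alpha` is an illustrative value."""
    (ax1, ay1, ax2, ay2), (bx1, by1, bx2, by2) = rect_a, rect_b
    candidates = [(x1, y1, x2, y2)
                  for (x1, y1) in ((ax1, ay1), (bx1, by1))
                  for (x2, y2) in ((ax2, ay2), (bx2, by2))]

    def avg_resp(r):
        region = stroke_resp[r[1]:r[3], r[0]:r[2]]
        return float(region.mean()) if region.size else 0.0

    r_max = max(avg_resp(r) for r in candidates)
    # Largest-area rectangle first; accept the first one passing the test.
    for r in sorted(candidates, key=lambda r: (r[2] - r[0]) * (r[3] - r[1]),
                    reverse=True):
        if avg_resp(r) >= alpha * r_max:
            return r
    return candidates[0]
```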
2. Temporal judgment
Temporal judgment determines, from the image content, whether two text blocks localized on adjacent frames come from the same text. There are four temporal relations:
(1) Keep. The text in the two successive frames does not change.
(2) Replace. The text in the previous frame is replaced by new text in the following frame, and the text content differs.
(3) Disappear. The text in the previous frame disappears.
(4) False alarm. The text region localized in the previous frame is noise.
When the text position is fixed, the sum of squared differences [SSD] of the grayscale images of successive frames is an effective criterion for judging whether the text content has changed. If the pixels inside the text region are not differentiated and the SSD is computed over the whole region, the judgment is easily affected by background changes, so here only the pixels with a large stroke response are compared; these pixels all lie on character strokes, which makes the algorithm more stable. The temporal judgment is made from the grayscale difference and the stroke response difference between the two text blocks.
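A sketch of the SSD comparison restricted to pixels with a large stroke response; both thresholds are placeholders rather than values from this disclosure.

```python
import numpy as np

def same_text(gray_prev, gray_curr, stroke_resp, resp_thresh=40, ssd_thresh=200.0):
    """Compare two aligned text-block crops only at pixels whose stroke response
    is large, i.e. pixels lying on character strokes. Returns True when the
    mean squared difference stays below `ssd_thresh` (content unchanged)."""
    mask = stroke_resp > resp_thresh
    if not mask.any():
        return False                              # no stroke pixels: treat as changed
    diff = gray_prev.astype(np.float32) - gray_curr.astype(np.float32)
    return float((diff[mask] ** 2).mean()) < ssd_thresh
```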
3. Maintaining the tracking array
In order to track the text blocks that appear in the video, a tracking array must be maintained. The maintenance tasks are as follows:
(1) For a text block newly appearing in the current frame, add its localization result to the array;
(2) For a text block that continues to appear, keep its element in the array;
(3) For a text block that has disappeared, determine its start and end times and its disappearance mode, pick out the highest-quality image within its lifetime and submit it to the segmentation stage, then delete the element from the array.
Another task of maintaining the tracking array is to pick, from the multiple frames in which a text block persists, the frame of the best quality and submit it to the segmentation stage; this reduces the difficulty of the segmentation stage and improves the final recognition accuracy, and is one form of multi-frame video enhancement [Survey]. It is observed in experiments that the gradient distribution of the text block image approximately reflects image quality: an image with a large gradient mean has obvious contrast between characters and background, and an image with a large gradient kurtosis [Statistic] has concentrated gradient changes and is sharp. The selection process is as follows (a sketch follows the steps):
(1) Using the gradient values of each text block already computed during text block verification in the localization stage, compute the mean and kurtosis of the gradient distribution of the text block;
(2) Retain in a buffer the 5 frames with the largest gradient mean;
(3) After the text block disappears or is replaced, choose from the 5 cached frames the image with the largest kurtosis and output it.
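A sketch of this selection by gradient mean and kurtosis, with a buffer of 5 frames as in the text (kurtosis is taken from scipy):

```python
import cv2
import numpy as np
from scipy.stats import kurtosis

def gradient_stats(gray):
    """Mean and kurtosis of the gradient-magnitude distribution of a text block."""
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    mag = np.hypot(gx, gy).ravel()
    return float(mag.mean()), float(kurtosis(mag))

def pick_best_frame(frames, buffer_size=5):
    """Keep the `buffer_size` frames with the largest gradient mean,
    then return the one with the largest gradient kurtosis."""
    scored = [(gradient_stats(f), f) for f in frames]
    buffered = sorted(scored, key=lambda s: s[0][0], reverse=True)[:buffer_size]
    return max(buffered, key=lambda s: s[0][1])[1]
```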
S205: use the text sequence obtained by text tracking to enhance the text and suppress the background, then perform binarization to obtain a relatively clean and clear binarized text image, in preparation for the final text recognition.
S206: perform text recognition on the binarized text image to obtain the character string of the text.
S207: layout analysis
Video contains many kinds of text, and different types of text carry different meanings; the text in a region may include types such as titles, subtitles, station logos, supplementary text, and scrolling tickers. In video search and automatic video cataloging, structured text information must be extracted from the video, and the text type is a feature as important as the text content.
The goal of layout analysis is to organize and classify the text carefully and accurately according to its features, and to output structured text information that meets the needs of different applications. It includes three parts: feature collection, text organization, and text classification. Layout analysis uses the temporal features of text blocks, and these can only be determined once a whole program segment has been processed, so it works offline: layout analysis is carried out only after a program segment has been fully processed.
1. Feature collection
The text features used in layout analysis fall into 7 classes:
(1) Polarity. Polarity reflects the relative brightness of the characters and the background in the text region: a polarity of 0 indicates dark characters on a light background, and a polarity of 1 indicates light characters on a dark background. The segmentation stage can determine the polarity of the text automatically with an algorithm [Segment]; the polarity can also be given in the configuration file and used to guide segmentation.
(2) Color. Color includes the character color and the background color. In some cases polarity is not enough to distinguish different kinds of text; for example, white characters and yellow characters on a red background both have polarity 1, and color information must then be considered. In the segmentation stage, the foreground and background points of the text region are determined after binarization, and the average colors of the characters and of the background are computed from them.
(3) Character size. Character size includes the average width and height of the individual characters in a text line. In the segmentation stage, the width and height of individual characters can be obtained after pre-segmentation, and the average width and height of the characters in the text line are computed from them.
(4) Text block position. The text block position includes the top, bottom, left, and right boundaries of the text block. The tracking stage can provide the approximate boundaries of the text block, but they are not accurate, and adjacent text blocks of different types easily overlap; in the segmentation stage the accurate position of the text block can be determined from the final recognition result and the pre-segmentation result.
(5) Recognition result. The character string obtained from the text block image after segmentation and recognition, provided by the segmentation stage.
(6) Start and end times of the text block. The moments at which the text block appears and disappears, provided by the tracking stage.
(7) Temporal relation of the text block. In the tracking stage, the temporal judgment yields four relations: keep, disappear, replace, and false alarm; the two that are attributed to a text block are disappear and replace.
These 7 classes of features are the basis of layout analysis. In subsequent processing, features should be combined and rules designed flexibly according to the characteristics of the video being processed; there is no unified processing flow.
2. Text organization
Text organization includes two parts: (1) merging multiple lines of text on the same frame; (2) merging the same text block across consecutive frames.
After projection cutting, the text blocks being processed are all single-line text, and these single lines may need to be combined to express a complete meaning, for example the multiple lines of a news headline. On the same frame, information such as the position, character size, and color of the text blocks, combined with the characteristics of the video being processed, is used to combine spatially scattered single-line texts into complete logical units.
In some cases, texts that appear in succession may need to be combined to express a complete meaning, or the same text may appear repeatedly and intermittently, for example a news headline. The recognition results, character sizes, colors, and other information of the text blocks are then used to combine texts scattered in time into complete logical units.
3. Text classification
The form in which text is presented differs from one video program to another. For one kind of program, some text classification rules can be summarized by observation, but in another kind of program those rules may no longer hold. Text classification therefore has no specific unified processing flow; the problem is solved in the following steps (a minimal sketch follows the steps):
(1) Observe the characteristics of the various types of text in the video being processed, including their common points and their differences;
(2) Classify the text of interest, assigning a template to each type of text that records the characteristics of that type, such as position, character size, polarity, color, and temporal behavior;
(3) Match the text features against the templates to classify the text.
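Purely as an illustration of steps (2) and (3), a template could be a simple record of feature ranges matched against the features collected for a text block; every field name and value here is hypothetical.

```python
# Hypothetical text-type templates; regions are (x1, y1, x2, y2) in pixels.
TEMPLATES = {
    "headline": {"region": (0, 560, 1280, 660), "polarity": 1, "min_char_h": 28},
    "ticker":   {"region": (0, 660, 1280, 720), "polarity": 1, "min_char_h": 16},
}

def classify(block):
    """block: dict with 'box' (x1, y1, x2, y2), 'polarity' and 'char_height'."""
    x1, y1, x2, y2 = block["box"]
    for name, t in TEMPLATES.items():
        rx1, ry1, rx2, ry2 = t["region"]
        inside = rx1 <= x1 and ry1 <= y1 and x2 <= rx2 and y2 <= ry2
        if inside and block["polarity"] == t["polarity"] \
                and block["char_height"] >= t["min_char_h"]:
            return name
    return "other"
```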
Embodiment three
Embodiment three of the present invention provides a video text extraction system, including:
a video shot segmentation module 11, for segmenting a video signal into individual shots;
a text localization module 12, for detecting and locating the specific positions of candidate text within a single sequence of video frames;
a text tracking module 13, for tracking the text within a video shot on the basis of text localization, to obtain the sequence of text regions of the same text object over consecutive frames;
an enhancement and binarization module 14, for using the text sequence obtained by text tracking to enhance the text and suppress the background, and then performing binarization to obtain a binarized text image;
a text recognition module 15, for performing text recognition on the binarized text image to obtain the character string of the text.
Preferably, the text localization module 12 specifically includes:
a preprocessing submodule 121, a coarse localization submodule 122, a projection cutting submodule 123, and a screening submodule 124.
In some optional embodiments, the text tracking module 13 uses the following:
a position judgment submodule 131, a temporal judgment submodule 132, and a tracking array maintenance submodule 133.
In some optional embodiments, the system further includes a pre-segmentation module 16, specifically including:
a conversion submodule 161, for converting the video image into a grayscale image when the video image is a color image;
a separation submodule 162, for performing binarization on the text block image to separate the characters from the background in the image and determine the character boundaries;
an analysis submodule 163, for performing connected-component analysis on the resulting binary image to obtain the position and size information of the character strokes.
In some optional embodiments, the system further includes:
a layout analysis module 17, for analyzing the layout of the video image, obtaining the text features in the video image, and organizing and classifying the obtained text information.
It should be understood that the specific order or hierarchy of steps in the disclosed processes is an example of exemplary approaches. Based on design preferences, it should be understood that the specific order or hierarchy of steps in the processes may be rearranged without departing from the scope of protection of the disclosure. The accompanying method claims present the elements of the various steps in an exemplary order and are not meant to be limited to the specific order or hierarchy presented.
In the above detailed description, various features are grouped together in a single embodiment to streamline the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments require more features than are expressly recited in each claim. Rather, as the following claims reflect, the invention lies in less than all features of a single disclosed embodiment. The appended claims are therefore expressly incorporated into the detailed description, with each claim standing on its own as a separate preferred embodiment of the invention.
For a software implementation, the techniques described in this application may be implemented with modules (for example, procedures, functions, and so on) that perform the functions described herein. The software codes may be stored in memory units and executed by processors. A memory unit may be implemented within the processor or external to the processor; in the latter case it is communicatively coupled to the processor via various means, as is known in the art.
What has been described above includes examples of one or more embodiments. It is of course not possible to describe every conceivable combination of components or methods for the purpose of describing the above embodiments, but one of ordinary skill in the art will recognize that the embodiments admit many further combinations and permutations. Accordingly, the embodiments described herein are intended to embrace all such alterations, modifications, and variations that fall within the scope of protection of the appended claims. Furthermore, with regard to the term "comprising" used in the specification or the claims, the word is intended to be inclusive in a manner similar to the term "including", as "including" is interpreted when employed as a transitional word in a claim. In addition, any use of the term "or" in the specification or the claims is intended to mean a "non-exclusive or".

Claims (10)

  1. A method of video text extraction, characterized by comprising:
    segmenting a video signal into individual shots;
    detecting and locating the specific positions of candidate text within a single sequence of video frames;
    on the basis of text localization, tracking the text within a video shot to obtain a sequence of text regions of the same text object over consecutive frames;
    using the text sequence obtained by text tracking to enhance the text and suppress the background, and then performing binarization to obtain a binarized text image;
    performing text recognition on the binarized text image to obtain the character string of the text.
  2. The method according to claim 1, characterized in that the localization specifically includes the following steps:
    preprocessing, coarse localization, projection cutting, and screening.
  3. The method according to claim 1, characterized in that the tracking uses the following:
    position judgment, temporal judgment, and maintenance of a tracking array.
  4. The method according to claim 1, characterized in that pre-segmentation is also included before the segmentation, specifically comprising:
    when the video image is a color image, converting the video image into a grayscale image;
    performing binarization on the text block image to separate the characters from the background in the image and determine the character boundaries;
    performing connected-component analysis on the resulting binary image to obtain the position and size information of the character strokes.
  5. The method according to claim 1, characterized by further comprising:
    analyzing the layout of the video image to obtain the text features in the video image, and organizing and classifying the obtained text information.
  6. A video text extraction system, characterized by comprising:
    a video shot segmentation module, for segmenting a video signal into individual shots;
    a text localization module, for detecting and locating the specific positions of candidate text within a single sequence of video frames;
    a text tracking module, for tracking the text within a video shot on the basis of text localization, to obtain a sequence of text regions of the same text object over consecutive frames;
    an enhancement and binarization module, for using the text sequence obtained by text tracking to enhance the text and suppress the background, and then performing binarization to obtain a binarized text image;
    a text recognition module, for performing text recognition on the binarized text image to obtain the character string of the text.
  7. The system according to claim 6, characterized in that the text localization module specifically includes:
    a preprocessing submodule, a coarse localization submodule, a projection cutting submodule, and a screening submodule.
  8. The system according to claim 6, characterized in that the text tracking module uses the following:
    a position judgment submodule, a temporal judgment submodule, and a tracking array maintenance submodule.
  9. The system according to claim 6, characterized in that the system further includes a pre-segmentation module, specifically including:
    a conversion submodule, for converting the video image into a grayscale image when the video image is a color image;
    a separation submodule, for performing binarization on the text block image to separate the characters from the background in the image and determine the character boundaries;
    an analysis submodule, for performing connected-component analysis on the resulting binary image to obtain the position and size information of the character strokes.
  10. The system according to claim 6, characterized by further comprising:
    a layout analysis module, for analyzing the layout of the video image, obtaining the text features in the video image, and organizing and classifying the obtained text information.
CN201610479702.1A 2016-06-27 2016-06-27 A method of video text extraction Pending CN107545210A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610479702.1A CN107545210A (en) 2016-06-27 2016-06-27 A kind of method of video text extraction


Publications (1)

Publication Number Publication Date
CN107545210A true CN107545210A (en) 2018-01-05

Family

ID=60961884

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610479702.1A Pending CN107545210A (en) 2016-06-27 2016-06-27 A kind of method of video text extraction

Country Status (1)

Country Link
CN (1) CN107545210A (en)


Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108388872A (en) * 2018-02-28 2018-08-10 北京奇艺世纪科技有限公司 A kind of headline recognition methods and device based on font color
CN108664626A (en) * 2018-05-14 2018-10-16 北京奇艺世纪科技有限公司 A kind of title consistency detecting method, device and electronic equipment
CN108960210A (en) * 2018-08-10 2018-12-07 武汉优品楚鼎科技有限公司 It is a kind of to grind the method, system and device for reporting board-like identification and segmentation
CN109800757B (en) * 2019-01-04 2022-04-19 西北工业大学 Video character tracking method based on layout constraint
CN109800757A (en) * 2019-01-04 2019-05-24 西北工业大学 A kind of video text method for tracing based on layout constraint
US11012730B2 (en) 2019-03-29 2021-05-18 Wipro Limited Method and system for automatically updating video content
CN110147724A (en) * 2019-04-11 2019-08-20 北京百度网讯科技有限公司 For detecting text filed method, apparatus, equipment and medium in video
CN110147724B (en) * 2019-04-11 2022-07-01 北京百度网讯科技有限公司 Method, apparatus, device, and medium for detecting text region in video
CN110489747A (en) * 2019-07-31 2019-11-22 北京大米科技有限公司 A kind of image processing method, device, storage medium and electronic equipment
CN112749599A (en) * 2019-10-31 2021-05-04 北京金山云网络技术有限公司 Image enhancement method and device and server
CN111639599A (en) * 2020-05-29 2020-09-08 北京百度网讯科技有限公司 Object image mining method, device, equipment and storage medium
CN111639599B (en) * 2020-05-29 2024-04-02 北京百度网讯科技有限公司 Object image mining method, device, equipment and storage medium
CN113723401A (en) * 2021-08-23 2021-11-30 上海千映智能科技有限公司 Song menu extraction method based on morphological method
WO2023115838A1 (en) * 2021-12-24 2023-06-29 北京达佳互联信息技术有限公司 Video text tracking method and electronic device


Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20180105