CN106127118A - English word recognition method and device - Google Patents

English word recognition method and device

Info

Publication number
CN106127118A
CN106127118A CN201610430159.6A
Authority
CN
China
Prior art keywords
text
connected domain
line
image
stroke width
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610430159.6A
Other languages
Chinese (zh)
Inventor
刁志敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhuhai Gotech Intelligent Technology Co Ltd
Original Assignee
Zhuhai Gotech Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhuhai Gotech Intelligent Technology Co Ltd filed Critical Zhuhai Gotech Intelligent Technology Co Ltd
Priority to CN201610430159.6A priority Critical patent/CN106127118A/en
Publication of CN106127118A publication Critical patent/CN106127118A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/30 - Writer recognition; Reading and verifying signatures
    • G06V40/37 - Writer recognition; Reading and verifying signatures based only on signature signals such as velocity or pressure, e.g. dynamic signature recognition
    • G06V40/382 - Preprocessing; Feature extraction
    • G06V40/388 - Sampling; Contour coding; Stroke extraction
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/22 - Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Character Discrimination (AREA)

Abstract

This application discloses an English word recognition method and device. The method includes: applying a stroke width transform to an input video image; performing connected domain analysis on the transformed image and screening out from the analysis result the connected domains that are text regions; merging the screened connected domains to obtain text lines; recognizing the text lines with an optical character recognition model, where the training data of the model consists of English letters and each letter has templates at multiple degrees of erosion; and performing semantic analysis on the recognized text lines to select the text lines that are semantically consistent. The application improves the accuracy of English word recognition in complex scenes.

Description

English word recognition method and device
Technical field
The present invention relates to the technical field of character recognition, and more particularly to an English word recognition method and device.
Background technology
Text is a key feature in many computer vision applications. The text in a video image usually carries rich information, so extracting and recognizing it is of great importance for the analysis and understanding of video content and for information retrieval.
Extracting the contour features of words from a video image is an important part of text recognition. For example, when recognizing English words, the contour features of each English letter are first extracted and then merged to recognize the whole word. However, since video images come from natural scenes, heavy background noise in complex scenes can destroy letter contours and make them hard to recognize, causing missed detections and recognition errors of English words and reducing recognition accuracy.
Summary of the invention
In view of this, the present invention provides an English word recognition method and device to improve the accuracy of English word recognition in complex scenes.
An English word recognition method includes:
applying a stroke width transform to an input video image;
performing connected domain analysis on the image output by the stroke width transform, and screening out from the analysis result the connected domains that are text regions;
merging the screened connected domains to obtain text lines;
recognizing the text lines with an optical character recognition model, where the training data of the optical character recognition model consists of English letters and each English letter has templates at multiple degrees of erosion;
performing semantic analysis on the recognized text lines and selecting the text lines that are semantically consistent.
Wherein, applying the stroke width transform to the input image includes:
decoding the input video image into an RGB image;
converting the RGB image into a grayscale image;
converting the grayscale image into a stroke width transform (SWT) image;
performing edge detection on the SWT image with the Canny edge detector to obtain all edge pixels;
computing the gradient direction of each edge pixel with the Sobel operator;
for each edge pixel, finding the edge pixel whose gradient direction is opposite, forming an edge pixel pair;
computing the stroke width value determined by each edge pixel pair, where the stroke width value is the Euclidean distance between the two edge pixels of the pair.
Wherein, screening out from the analysis result the connected domains that are text regions includes:
screening with the conditions that the stroke width within a connected domain is consistent, and that the proportion of pixels in the connected domain whose color matches the color of the English words to be recognized is not less than a first preset value.
Wherein, screening out from the analysis result the connected domains that are text regions may alternatively include:
screening with the conditions that the stroke width within a connected domain is consistent, and that the stroke variance of the connected domain is not less than a second preset value, the stroke mean is not less than a third preset value, and the width-to-height ratio of the connected domain does not exceed a fourth preset value.
Optionally, before the optical character recognition model is used to recognize the text lines, the method further includes: filtering the background noise of the text lines by maximum between-class variance binarization;
correspondingly, recognizing the text lines with the optical character recognition model is: recognizing, with the optical character recognition model, the text lines from which the background noise has been filtered.
An English word recognition device includes:
a stroke width transform module, configured to apply a stroke width transform to an input video image;
a connected domain analysis and screening unit, configured to perform connected domain analysis on the image output by the stroke width transform and to screen out from the analysis result the connected domains that are text regions;
a text line merging unit, configured to merge the screened connected domains to obtain text lines;
an OCR recognition unit, configured to recognize the text lines with an optical character recognition model, where the training data of the optical character recognition model consists of English letters and each English letter has templates at multiple degrees of erosion;
a semantic analysis unit, configured to perform semantic analysis on the recognized text lines and select the text lines that are semantically consistent.
Wherein, the stroke width transform module specifically includes:
an RGB image conversion unit, configured to decode the input video image into an RGB image;
a grayscale conversion unit, configured to convert the RGB image into a grayscale image;
an SWT image conversion unit, configured to convert the grayscale image into an SWT image;
an edge detection unit, configured to perform edge detection on the SWT image with the Canny edge detector to obtain all edge pixels;
a gradient direction computing unit, configured to compute the gradient direction of each edge pixel with the Sobel operator;
a stroke width computing unit, configured to find, for each edge pixel, the edge pixel whose gradient direction is opposite, forming an edge pixel pair, and to compute the stroke width value determined by each pair, whose size is the Euclidean distance between the two edge pixels of the pair.
Wherein, the connected domain analysis and screening unit is specifically configured to perform connected domain analysis on the image output by the stroke width transform and to screen out the connected domains whose stroke width is consistent and in which the proportion of pixels matching the color of the English words to be recognized is not less than the first preset value.
Alternatively, the connected domain analysis and screening unit is specifically configured to perform connected domain analysis on the image output by the stroke width transform and to screen out the connected domains whose stroke width is consistent, whose stroke variance is not less than the second preset value, whose stroke mean is not less than the third preset value, and whose width-to-height ratio does not exceed the fourth preset value.
Optionally, the device further includes: a background noise filtering unit, configured to filter the background noise of the text lines by maximum between-class variance binarization before the optical character recognition model is used to recognize the text lines.
It can be seen from the above technical solution that the present invention trains the optical character recognition model in advance on English letters at different degrees of erosion, which increases the recognition rate when letter contours are damaged and reduces the missed-detection rate of English words. The present invention also performs semantic analysis and screening on the recognized text lines to select the semantically consistent ones, which reduces the false-detection rate, thereby improving the accuracy of English word recognition in complex scenes.
Accompanying drawing explanation
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative work.
Fig. 1 is a flowchart of an English word recognition method disclosed by the present invention;
Fig. 2 is a flowchart of a stroke width transform method disclosed by the present invention;
Fig. 3 is a schematic structural diagram of an English word recognition device disclosed by the present invention;
Fig. 4 is a schematic structural diagram of another English word recognition device disclosed by the present invention;
Fig. 5 is a schematic structural diagram of another English word recognition device disclosed by the present invention.
Detailed description of the invention
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of the present invention.
Referring to Fig. 1, an embodiment of the present invention discloses an English word recognition method to improve the accuracy of English word recognition in complex scenes, including:
Step 100: applying a stroke width transform to an input video image;
The purpose of applying the stroke width transform to the input video image is to obtain connected domain information. The idea of the stroke width transform is as follows: first perform edge detection on the input video image to obtain edge information; then, starting from each edge pixel, find the edge pixel whose gradient direction is opposite, forming an edge pixel pair; compute the Euclidean distance between the two edge pixels of each pair, and assign this value to all pixels between them. After the stroke width transform, each pixel of the output image represents a possible stroke width. The stroke width information yields candidate text information, because a connected domain with consistent stroke width is likely to be a text region.
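The pairing idea above can be sketched in one dimension. This is a hedged toy illustration, not the patent's implementation: a real stroke width transform casts 2-D rays along the gradient direction, while this sketch only scans one binarized row and reports each stroke's extent and width (the distance between its two opposing edges).

```python
def stroke_widths_1d(row):
    """row: list of 0/1 pixels; returns (left_edge, right_edge, width) per stroke."""
    widths = []
    start = None
    for i, v in enumerate(row):
        if v == 1 and start is None:
            start = i                                  # entering a stroke: left edge
        elif v == 0 and start is not None:
            widths.append((start, i - 1, i - start))   # leaving: pair with right edge
            start = None
    if start is not None:                              # stroke touching the row's end
        widths.append((start, len(row) - 1, len(row) - start))
    return widths

print(stroke_widths_1d([0, 1, 1, 1, 0, 0, 1, 1, 0]))  # [(1, 3, 3), (6, 7, 2)]
```

Every pixel inside a stroke would then be assigned that stroke's width, and regions where the assigned widths agree become text candidates.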
The detailed process of the stroke width transform is shown in Fig. 2 and includes:
Step 101: decoding the input video image into an RGB image;
Step 102: converting the RGB image into a grayscale image;
Step 103: converting the grayscale image into an SWT (stroke width transform) image;
Step 104: performing edge detection on the SWT image with the Canny edge detector to obtain all edge pixels, where the Canny edge detector is the multi-stage edge detection algorithm developed by John F. Canny in 1986;
Step 105: computing the gradient direction of each edge pixel with the Sobel operator;
Step 106: for each edge pixel, finding the edge pixel whose gradient direction is opposite, forming an edge pixel pair;
Step 107: computing the stroke width value determined by each edge pixel pair, where the stroke width value is the Euclidean distance between the two edge pixels of the pair.
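Step 105 can be illustrated with a minimal pure-Python Sobel sketch. The 3x3 kernels below are the standard Sobel masks; a production implementation would of course use an image library.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical intensity change.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def gradient_direction(img, y, x):
    """Gradient direction (radians) at interior pixel (y, x) of a 2-D grey image."""
    gx = sum(SOBEL_X[dy][dx] * img[y - 1 + dy][x - 1 + dx]
             for dy in range(3) for dx in range(3))
    gy = sum(SOBEL_Y[dy][dx] * img[y - 1 + dy][x - 1 + dx]
             for dy in range(3) for dx in range(3))
    return math.atan2(gy, gx)

# A vertical edge that brightens to the right: the gradient points right (angle 0.0).
edge = [[0, 0, 255], [0, 0, 255], [0, 0, 255]]
print(gradient_direction(edge, 1, 1))  # 0.0
```

Step 106 then walks from each edge pixel along this direction looking for the partner pixel whose gradient points the opposite way; the two ends of that walk form the edge pixel pair of step 107.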
Step 200: performing connected domain analysis on the image output by the stroke width transform, and screening out from the analysis result the connected domains that are text regions;
A connected domain is an image region composed of adjacent foreground pixels with the same pixel value in the image output by the stroke width transform. Connected domain analysis means finding and labelling every connected domain in that image. In the prior art, when screening out the connected domains that are text regions during English word recognition, usually only the consistency of the stroke width within a connected domain is considered, but interference from the background color then easily causes false detections of English words. This embodiment therefore adds a screening condition: the proportion of pixels in the connected domain whose color matches the color of the English words is not less than a first preset value. For example, if the English words to be recognized are black, the proportion of black pixels in the connected domain may be required to be no less than 60%. In addition, to avoid false detections caused by English words that are too small, further screening conditions can be added: the stroke variance is not less than a second preset value, the stroke mean is not less than a third preset value, and the width-to-height ratio of the connected domain does not exceed a fourth preset value.
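The combined screening conditions can be condensed into one predicate. The thresholds below stand in for the four preset values, which the description leaves open, so the numbers are illustrative assumptions only; the check follows the text as written (stroke variance and mean not below their presets, aspect ratio not above its preset), and the stroke-width consistency check is assumed to be enforced upstream.

```python
def is_text_region(widths, color_share, bbox_w, bbox_h,
                   v1=0.6, v2=0.1, v3=2.0, v4=10.0):
    """widths: stroke-width samples inside the connected domain;
    color_share: fraction of pixels matching the target word colour;
    v1..v4: the first to fourth preset values (illustrative defaults)."""
    mean = sum(widths) / len(widths)
    var = sum((w - mean) ** 2 for w in widths) / len(widths)
    return (color_share >= v1            # colour share not below first preset
            and var >= v2                # stroke variance not below second preset
            and mean >= v3               # stroke mean not below third preset
            and bbox_w / bbox_h <= v4)   # aspect ratio not above fourth preset
```

A connected domain failing any single condition (for example a 30% colour share when 60% is required) is discarded as a non-text region.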
Step 300: merging the screened connected domains to obtain a text line;
For example, if the screened connected domains are, from left to right, a connected domain showing the content l, one showing u, one showing c, one showing k and one showing y, then merging them yields the text line lucky.
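Merging as in the l-u-c-k-y example amounts to ordering the recognised components by horizontal position and concatenating them, sketched here under the assumption that each component carries its bounding-box x coordinate:

```python
def merge_into_text_line(components):
    """components: (x_position, recognised_character) pairs of one text line."""
    return "".join(ch for _, ch in sorted(components))

print(merge_into_text_line([(30, "c"), (10, "l"), (50, "y"), (20, "u"), (40, "k")]))  # lucky
```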
Step 400: recognizing the text line with an OCR (optical character recognition) model, where the training data of the OCR model consists of English letters (the 26 uppercase English letters A-Z and/or the 26 lowercase English letters a-z), and each English letter has templates at multiple degrees of erosion;
In this embodiment, the OCR model is trained in advance on English letters at different degrees of erosion, which increases the recognition rate when letter contours are damaged. The model can be trained with, but is not limited to, the existing SVM (support vector machine) algorithm. The templates at multiple degrees of erosion may be: a template with no erosion, a lightly eroded template, a moderately eroded template and a heavily eroded template.
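The multi-degree erosion templates can be generated mechanically: repeated binary erosion of a letter mask yields progressively "worn" variants. The sketch below uses a 3x3 structuring element and a solid block in place of a real glyph; both choices are assumptions for illustration, not details fixed by the patent.

```python
def erode(mask):
    """One 3x3 binary erosion pass (border pixels become 0)."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if all(mask[y + dy][x + dx]
                   for dy in (-1, 0, 1) for dx in (-1, 0, 1)):
                out[y][x] = 1
    return out

glyph = [[1] * 5 for _ in range(5)]    # solid 5x5 block standing in for a letter mask
templates = [glyph]                    # degree 0: no erosion
for _ in range(2):                     # degrees 1 and 2: light / moderate erosion
    templates.append(erode(templates[-1]))
```

Matching a damaged letter against the whole template family is what raises the recognition rate when contours are partly missing.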
Step 500: performing semantic analysis on the recognized text lines and selecting the text lines that are semantically consistent.
The words that appear in a video image show a certain regularity over time, so this embodiment keeps semantic statistics of the English words already recognized; the more words are counted, the more accurate the statistics become. If a recognized text line does not fit the semantics, i.e. it is inconsistent with the semantic statistics obtained so far, it is discarded to reduce the false-detection rate. This is the basic idea of the semantic analysis of recognized text lines. For example, suppose the English words recognized repeatedly in the video image include happy, happiness, joy and relaxed, which are semantically similar. If the currently recognized text line is pain, then since its meaning is opposite to the former it does not fit the semantics, is a falsely detected word, and needs to be discarded. This embodiment can, but need not, use an HMM (hidden Markov model) for the semantic analysis and statistics of text lines.
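The accept/reject decision of the semantic screening can be illustrated with plain word frequencies over earlier recognitions. The patent mentions an HMM for this step; the frequency check below is a deliberately simplified stand-in, and the min_count threshold is an assumption.

```python
from collections import Counter

def fits_semantics(history, candidate, min_count=2):
    """Keep a recognised word only if it is consistent with earlier recognitions;
    plain frequency stands in here for a real semantic-similarity model."""
    return Counter(history)[candidate] >= min_count

history = ["happy", "joy", "happy", "relaxed", "joy", "happy"]
print(fits_semantics(history, "happy"))  # True: consistent with prior frames
print(fits_semantics(history, "pain"))   # False: likely a false detection
```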
As described above, applying a stroke width transform to the input video image, performing connected domain analysis on the transformed image, screening out the connected domains that are text regions, merging them into text lines, and recognizing the text lines by OCR are conventional techniques for recognizing words in video images. However, heavy background-noise interference in a video image can destroy letter contours and make them hard to recognize, causing missed detections and recognition errors and reducing the accuracy of English word recognition. To address this, this embodiment trains the OCR model in advance on English letters at different degrees of erosion, which increases the recognition rate when letter contours are damaged and reduces the missed-detection rate of English words; this embodiment also performs semantic analysis on the recognized text lines and selects the semantically consistent ones, which reduces the false-detection rate, thereby improving the accuracy of English word recognition in complex scenes.
In addition, before the OCR model is used to recognize the text line, the background noise of the text line can first be filtered by OTSU (maximum between-class variance) binarization, and the OCR model is then used to recognize the text line from which the background noise has been filtered. The benefit is that filtering the background noise sharpens the outline of the text line and reduces the interference of background noise with the English words to be recognized, further reducing false detections.
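The OTSU step picks the grey level that maximises the between-class variance of the image histogram; pixels on one side of the threshold are kept as text and the rest is discarded as background. A minimal pure-Python version:

```python
def otsu_threshold(pixels, levels=256):
    """Return the grey level maximising between-class variance of the histogram."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(i * hist[i] for i in range(levels))
    best_t, best_var, w0, sum0 = 0, -1.0, 0, 0.0
    for t in range(levels):
        w0 += hist[t]                      # weight of the class at or below t
        if w0 == 0:
            continue
        w1 = total - w0                    # weight of the class above t
        if w1 == 0:
            break
        sum0 += t * hist[t]
        m0 = sum0 / w0                     # mean of the lower class
        m1 = (total_sum - sum0) / w1       # mean of the upper class
        between = w0 * w1 * (m0 - m1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t
```

On a cleanly bimodal histogram the maximum sits at the end of the darker mode, so the two populations separate exactly.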
In addition, referring to Fig. 3, an embodiment of the present invention also discloses an English word recognition device to improve the accuracy of English word recognition in complex scenes, including:
a stroke width transform module 100, configured to apply a stroke width transform to an input video image;
a connected domain analysis and screening unit 200, configured to perform connected domain analysis on the image output by the stroke width transform and to screen out from the analysis result the connected domains that are text regions;
a text line merging unit 300, configured to merge the screened connected domains to obtain text lines;
an OCR recognition unit 400, configured to recognize the text lines with an optical character recognition model, where the training data of the optical character recognition model consists of English letters and each English letter has templates at multiple degrees of erosion;
a semantic analysis unit 500, configured to perform semantic analysis on the recognized text lines and select the text lines that are semantically consistent.
Referring to Fig. 4, the stroke width transform module 100 specifically includes:
an RGB image conversion unit 101, configured to decode the input video image into an RGB image;
a grayscale conversion unit 102, configured to convert the RGB image into a grayscale image;
an SWT image conversion unit 103, configured to convert the grayscale image into an SWT image;
an edge detection unit 104, configured to perform edge detection on the SWT image with the Canny edge detector to obtain all edge pixels;
a gradient direction computing unit 105, configured to compute the gradient direction of each edge pixel with the Sobel operator;
a stroke width computing unit 106, configured to find, for each edge pixel, the edge pixel whose gradient direction is opposite, forming an edge pixel pair, and to compute the stroke width value determined by each pair, whose size is the Euclidean distance between the two edge pixels of the pair.
The connected domain analysis and screening unit 200 is specifically configured to perform connected domain analysis on the image output by the stroke width transform and to screen out the connected domains whose stroke width is consistent and in which the proportion of pixels matching the color of the English words to be recognized is not less than the first preset value.
Alternatively, the connected domain analysis and screening unit 200 is specifically configured to perform connected domain analysis on the image output by the stroke width transform and to screen out the connected domains whose stroke width is consistent, whose stroke variance is not less than the second preset value, whose stroke mean is not less than the third preset value, and whose width-to-height ratio does not exceed the fourth preset value.
Optionally, as shown in Fig. 5, the English word recognition device further includes: a background noise filtering unit 600, configured to filter the background noise of the text lines by maximum between-class variance binarization before the optical character recognition model is used to recognize the text lines.
In summary, the present invention trains the optical character recognition model in advance on English letters at different degrees of erosion, which increases the recognition rate when letter contours are damaged and reduces the missed-detection rate of English words; the present invention also performs semantic analysis and screening on the recognized text lines to select the semantically consistent ones, which reduces the false-detection rate, thereby improving the accuracy of English word recognition in complex scenes.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the others, and the identical or similar parts of the embodiments can be referred to each other. Since the device disclosed in an embodiment corresponds to the method disclosed in an embodiment, its description is relatively simple, and the relevant parts can be found in the description of the method.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be obvious to those skilled in the art, and the general principles defined herein can be implemented in other embodiments without departing from the spirit or scope of the embodiments of the present invention. Therefore, the embodiments of the present invention are not limited to the embodiments shown herein, but accord with the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. An English word recognition method, characterized by including:
applying a stroke width transform to an input video image;
performing connected domain analysis on the image output by the stroke width transform, and screening out from the analysis result the connected domains that are text regions;
merging the screened connected domains to obtain text lines;
recognizing the text lines with an optical character recognition model, wherein the training data of the optical character recognition model consists of English letters and each English letter has templates at multiple degrees of erosion;
performing semantic analysis on the recognized text lines and selecting the text lines that are semantically consistent.
2. The method according to claim 1, characterized in that applying the stroke width transform to the input image includes:
decoding the input video image into an RGB image;
converting the RGB image into a grayscale image;
converting the grayscale image into a stroke width transform (SWT) image;
performing edge detection on the SWT image with the Canny edge detector to obtain all edge pixels;
computing the gradient direction of each edge pixel with the Sobel operator;
for each edge pixel, finding the edge pixel whose gradient direction is opposite, forming an edge pixel pair;
computing the stroke width value determined by each edge pixel pair, wherein the stroke width value is the Euclidean distance between the two edge pixels of the pair.
3. The method according to claim 1, characterized in that screening out from the analysis result the connected domains that are text regions includes:
screening out from the analysis result the connected domains that are text regions, the screening conditions including: the stroke width within a connected domain is consistent; and the proportion of pixels in the connected domain whose color matches the color of the English words to be recognized is not less than a first preset value.
4. The method according to claim 1, characterized in that screening out from the analysis result the connected domains that are text regions includes:
screening out from the analysis result the connected domains that are text regions, the screening conditions including: the stroke width within a connected domain is consistent; and the stroke variance of the connected domain is not less than a second preset value, the stroke mean is not less than a third preset value, and the width-to-height ratio of the connected domain does not exceed a fourth preset value.
5. The method according to any one of claims 1-4, characterized in that before the optical character recognition model is used to recognize the text lines, the method further includes: filtering the background noise of the text lines by maximum between-class variance binarization;
correspondingly, recognizing the text lines with the optical character recognition model is: recognizing, with the optical character recognition model, the text lines from which the background noise has been filtered.
6. An English word recognition device, characterized by including:
a stroke width transform module, configured to apply a stroke width transform to an input video image;
a connected domain analysis and screening unit, configured to perform connected domain analysis on the image output by the stroke width transform and to screen out from the analysis result the connected domains that are text regions;
a text line merging unit, configured to merge the screened connected domains to obtain text lines;
an OCR recognition unit, configured to recognize the text lines with an optical character recognition model, wherein the training data of the optical character recognition model consists of English letters and each English letter has templates at multiple degrees of erosion;
a semantic analysis unit, configured to perform semantic analysis on the recognized text lines and select the text lines that are semantically consistent.
7. The device according to claim 6, characterized in that the stroke width transform module specifically includes:
an RGB image conversion unit, configured to decode the input video image into an RGB image;
a grayscale conversion unit, configured to convert the RGB image into a grayscale image;
an SWT image conversion unit, configured to convert the grayscale image into an SWT image;
an edge detection unit, configured to perform edge detection on the SWT image with the Canny edge detector to obtain all edge pixels;
a gradient direction computing unit, configured to compute the gradient direction of each edge pixel with the Sobel operator;
a stroke width computing unit, configured to find, for each edge pixel, the edge pixel whose gradient direction is opposite, forming an edge pixel pair, and to compute the stroke width value determined by each pair, whose size is the Euclidean distance between the two edge pixels of the pair.
8. The device according to claim 6, characterized in that the connected domain analysis and screening unit is specifically configured to perform connected domain analysis on the image output by the stroke width transform and to screen out the connected domains whose stroke width is consistent and in which the proportion of pixels matching the color of the English words to be recognized is not less than the first preset value.
9. The device according to claim 6, characterized in that the connected domain analysis and screening unit is specifically configured to perform connected domain analysis on the image output by the stroke width transform and to screen out the connected domains whose stroke width is consistent, whose stroke variance is not less than the second preset value, whose stroke mean is not less than the third preset value, and whose width-to-height ratio does not exceed the fourth preset value.
10. The device according to any one of claims 6-9, characterized in that the device further includes: a background noise filtering unit, configured to filter out the background noise of the text line by maximum between-cluster variance binarization before the text line is recognized by the optical character recognition model.
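The maximum between-cluster variance binarization named in this claim is Otsu's method. A minimal histogram-based sketch (the 256-bin gray-level histogram input is an assumption for the example):

```python
def otsu_threshold(hist):
    """Otsu's method: choose the threshold that maximizes the between-class
    variance of a 256-bin gray-level histogram."""
    total = sum(hist)
    sum_all = sum(g * h for g, h in enumerate(hist))
    best_t, best_between = 0, -1.0
    w0 = 0        # pixels at or below the candidate threshold
    sum0 = 0.0    # their intensity sum
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2   # between-class variance (scaled)
        if between > best_between:
            best_between, best_t = between, t
    return best_t
```

The text line would then be binarized against the returned threshold, suppressing background noise before the OCR model runs.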
CN201610430159.6A 2016-06-15 2016-06-15 An English word recognition method and device Pending CN106127118A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610430159.6A CN106127118A (en) 2016-06-15 2016-06-15 An English word recognition method and device

Publications (1)

Publication Number Publication Date
CN106127118A true CN106127118A (en) 2016-11-16

Family

ID=57469919

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610430159.6A Pending CN106127118A (en) An English word recognition method and device

Country Status (1)

Country Link
CN (1) CN106127118A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108038481A (en) * 2017-12-11 2018-05-15 Jiangsu University of Science and Technology Text positioning method combining maximally stable extremal regions and stroke width variation
CN110929647A (en) * 2019-11-22 2020-03-27 iFLYTEK Co., Ltd. Text detection method, device, equipment and storage medium
CN112488107A (en) * 2020-12-04 2021-03-12 Beijing Hualu New Media Information Technology Co., Ltd. Video subtitle processing method and processing device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101777124A (en) * 2010-01-29 2010-07-14 Beijing Nufront Network Technology Co., Ltd. Method and device for extracting video text information
US20140023275A1 (en) * 2012-07-19 2014-01-23 Qualcomm Incorporated Redundant aspect ratio decoding of devanagari characters
CN104112130A (en) * 2014-06-26 2014-10-22 Xiaomi Inc. Optical character recognition method and device
CN104268512A (en) * 2014-09-17 2015-01-07 Tsinghua University Method and device for recognizing characters in image on basis of optical character recognition
CN104408449A (en) * 2014-10-27 2015-03-11 Ningbo Information Technology Research Institute of Xidian University Scene text processing method for intelligent mobile terminals
US20150070373A1 (en) * 2012-08-23 2015-03-12 Google Inc. Clarification of Zoomed Text Embedded in Images


Similar Documents

Publication Publication Date Title
Yi et al. Localizing text in scene images by boundary clustering, stroke segmentation, and string fragment classification
CN104182750B (en) A kind of Chinese detection method based on extreme value connected domain in natural scene image
CN104408449B (en) Intelligent mobile terminal scene literal processing method
CN106384112A (en) Rapid image text detection method based on multi-channel and multi-dimensional cascade filter
CN103093228A (en) Chinese detection method in natural scene image based on connected domain
CN108986125B (en) Object edge extraction method and device and electronic equipment
CN103295009B (en) Based on the license plate character recognition method of Stroke decomposition
EP3846122B1 (en) Method and apparatus for generating background-free image, device, and medium
Darab et al. A hybrid approach to localize farsi text in natural scene images
CN105117740A (en) Font identification method and device
Mishchenko et al. Chart image understanding and numerical data extraction
CN106127118A (en) A kind of English word recognition methods and device
Roy et al. Date-field retrieval in scene image and video frames using text enhancement and shape coding
Kesiman et al. Southeast Asian palm leaf manuscript images: a review of handwritten text line segmentation methods and new challenges
Ayesh et al. A robust line segmentation algorithm for Arabic printed text with diacritics
Owamoyo et al. Number plate recognition for Nigerian vehicles
CN104281850A (en) Character area identification method and device
CN104966109A (en) Medical laboratory report image classification method and apparatus
Jain et al. A hybrid approach for detection and recognition of traffic text sign using MSER and OCR
Mukhiddinov Scene text detection and localization using fully convolutional network
Tran et al. A novel approach for text detection in images using structural features
CN109800758A (en) A kind of natural scene character detecting method of maximum region detection
Romic et al. Character recognition based on region pixel concentration for license plate identification
CN104504385A (en) Recognition method of handwritten connected numerical string
Sun et al. Contextual models for automatic building extraction in high resolution remote sensing image using object-based boosting method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 2016-11-16