CN105989341A

CN105989341A - Character recognition method and device

Info

Publication number: CN105989341A
Application number: CN201510086612.1A
Authority: CN
Inventors: 许亮; 范伟; 孙俊; 直井聪
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2015-02-17
Filing date: 2015-02-17
Publication date: 2016-10-05

Abstract

The invention discloses a character recognition method and device. The method includes: extracting a plurality of connected components from an image containing text; classifying the connected components to generate first-language connected components and/or non-first-language connected components; clustering the first-language connected components into first-language text lines and the non-first-language connected components into non-first-language text lines; and recognizing first-language text and non-first-language text from the first-language text lines and the non-first-language text lines.

Description

Character recognition method and device

Technical field

The present invention relates to the field of image processing, and in particular to a method and device for recognizing the text on address plates (house-number plates) in images.

Background art

As mobile devices with a camera function, such as mobile phones and digital cameras, become increasingly common in daily life, taking photos of natural scenes has become very convenient. Address plates carry important information, and people may use a mobile device to photograph an address plate to record or share their location. Digital map annotation requires extracting and recognizing the text on the address plates in a large number of photos. Because the number of photos is very large, automatic recognition is needed in place of manual recognition to reduce the workload.

Fig. 1 shows a flowchart of a method for recognizing address-plate information. As shown in Fig. 1, in method 100, after a photo is input in step S110, an address-plate image is detected and extracted from the photo (step S120). Figs. 2a and 2b show an example of an input photo and of the address-plate image extracted from it, respectively. Text recognition is then performed on the extracted address-plate image to recognize the text address (step S130). In the example of Figs. 2a and 2b, the text address "Metallurgical North Road 99" can be recognized. Finally, the recognized text address is output (step S140), so that the address can be annotated automatically.

Techniques for detecting and extracting address-plate images from photos are now relatively mature, and their accuracy and processing speed meet current application requirements. However, techniques for recognizing the text in the extracted address-plate images often fall short. On the one hand, the text layout on many address plates is complex, which makes recognition difficult. On the other hand, address plates usually contain text in more than one language (e.g., Arabic numerals, English letters and Chinese characters), so every character has to be recognized with an engine that covers the characters of all of these languages. Because such a multilingual character set is very large, this also slows down recognition.

Summary of the invention

In view of this, the present invention provides a character recognition method and device for recognizing the text information in an image.

According to one aspect of the invention, a character recognition method is provided, including: extracting a plurality of connected components from an image containing text; classifying the plurality of connected components to generate first-language connected components and/or non-first-language connected components; clustering the first-language connected components into first-language text lines and clustering the non-first-language connected components into non-first-language text lines; and recognizing first-language text and non-first-language text from the first-language text lines and the non-first-language text lines.

According to another aspect of the invention, a character recognition device is provided, including: an extraction unit that extracts a plurality of connected components from an image containing text; a classification unit that classifies the plurality of connected components to generate first-language connected components and/or non-first-language connected components; a clustering unit that clusters the first-language connected components into first-language text lines and the non-first-language connected components into non-first-language text lines; and a recognition unit that recognizes first-language text and non-first-language text from the first-language text lines and the non-first-language text lines.

The technical solution provided by the invention can effectively recognize the text information in an image containing text, and is particularly suitable for images that contain multiple languages and have a characteristic layout structure.

Brief description of the drawings

Other features and advantages of the invention will be better understood from the following description of embodiments with reference to the drawings. The drawings are intended only to schematically illustrate embodiments of the invention, not all possible implementations, and are not intended to limit the scope of the invention. In the drawings:

Fig. 1 is a flowchart of a prior-art method for recognizing address-plate information;

Figs. 2a and 2b show an example of an input photo and of the address-plate image extracted from it, respectively;

Fig. 3 is a flowchart of a character recognition method according to an embodiment of the invention;

Fig. 4 is a flowchart of extracting a plurality of connected components from an image containing text according to an embodiment of the invention;

Fig. 5 is an alternative flowchart of extracting a plurality of connected components from an image containing text according to another embodiment of the invention;

Fig. 6 is a flowchart of clustering first-language connected components into first-language text lines and non-first-language connected components into non-first-language text lines according to an embodiment of the invention;

Fig. 7 is an alternative flowchart of clustering first-language connected components into first-language text lines and non-first-language connected components into non-first-language text lines according to another embodiment of the invention;

Fig. 8 is a flowchart of recognizing first-language text and non-first-language text from first-language text lines and non-first-language text lines according to an embodiment of the invention;

Fig. 9 is a flowchart of determining, from the layout structure features of an image, the layout category to which the image belongs among a plurality of layout categories, according to an embodiment of the invention;

Figs. 10a to 10d show examples of four layout categories;

Fig. 11 is a block diagram of a character recognition device according to an embodiment of the invention;

Fig. 12 is a block diagram of a recognition unit according to an embodiment of the invention;

Fig. 13 is a block diagram of a layout category determination subunit according to an embodiment of the invention;

Fig. 14 is a block diagram of a text line processing subunit according to an embodiment of the invention;

Fig. 15 is a block diagram of an extraction unit according to an embodiment of the invention;

Fig. 16 is a block diagram of a clustering unit according to an embodiment of the invention;

Fig. 17 is a block diagram of a clustering unit according to another embodiment of the invention; and

Fig. 18 is a schematic block diagram of a computer that can be used to implement the method and device according to embodiments of the invention.

Detailed description of the invention

Embodiments of the invention are described in detail below with reference to the drawings. Note that the following description is merely exemplary and is not intended to limit the invention. In the following description, the same reference numerals denote the same or similar parts in different drawings. Features of the different embodiments described below may be combined with one another to form other embodiments within the scope of the invention.

In the embodiments of the invention, it is assumed that an image region containing text, such as the address-plate image shown in Fig. 2b, has already been detected and extracted from a photo using techniques known to those skilled in the art. All processing and operations in the embodiments are performed on this image.

Fig. 3 shows a flowchart of a character recognition method according to an embodiment of the invention. As shown in Fig. 3, the character recognition method 300 includes steps S310 to S340. For an acquired image containing text, in step S310 a plurality of connected components is extracted from the image. Any suitable technique known to those skilled in the art may be used to extract the connected components. Each extracted connected component may contain one or more characters, or part of a character.

Because the image usually contains text in several languages, in step S320 the extracted connected components are classified by language into first-language connected components and non-first-language connected components. For an image that contains only first-language text, all extracted connected components are classified as first-language connected components in step S320; for an image that contains no first-language text, all extracted connected components are classified as non-first-language connected components.

In step S330, the first-language connected components are clustered into first-language text lines, and the non-first-language connected components are clustered into non-first-language text lines. Any suitable clustering technique known to those skilled in the art may be used. Each text line obtained by clustering may contain one or more first-language or non-first-language characters.

In step S340, first-language text and non-first-language text are recognized from the resulting first-language text lines and non-first-language text lines, respectively, which completes the automatic recognition of the text information in the image. A first-language string recognition engine and a non-first-language string recognition engine may be used to recognize the first-language text and the non-first-language text, respectively, as described in detail below.

In this embodiment, classifying the text in the image as first-language or non-first-language text makes it possible to recognize text in several languages, for example Arabic numerals, English letters and Chinese characters. Moreover, because text in different languages is clustered into separate text lines, the lines of each language can be processed by an engine for that language alone, instead of processing every text line with a multilingual engine, which improves recognition speed.
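Steps S310 to S340 can be read as a four-stage pipeline in which each language gets its own recognition engine. The sketch below illustrates only that control flow, under stated assumptions: the `Component` type and the `extract`, `is_first_language`, `cluster`, `first_engine` and `other_engine` callables are hypothetical placeholders introduced here, not interfaces defined by the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Component:
    """Hypothetical connected component: bounding box plus a label used by toy engines."""
    x0: int
    y0: int
    x1: int
    y1: int
    label: str = ""


Line = List[Component]


def recognize_text(image: object,
                   extract: Callable[[object], List[Component]],
                   is_first_language: Callable[[Component], bool],
                   cluster: Callable[[List[Component]], List[Line]],
                   first_engine: Callable[[Line], str],
                   other_engine: Callable[[Line], str]) -> Dict[str, List[str]]:
    components = extract(image)                                   # S310
    first = [c for c in components if is_first_language(c)]       # S320
    other = [c for c in components if not is_first_language(c)]
    first_lines, other_lines = cluster(first), cluster(other)     # S330
    # S340: each language's lines go to that language's (smaller) engine
    return {"first": [first_engine(line) for line in first_lines],
            "other": [other_engine(line) for line in other_lines]}
```

With toy components for "9", "9" and "A", a digit test standing in for the classifier, and a clusterer that puts each group into one left-to-right line, the result is `{"first": ["99"], "other": ["A"]}`.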

Fig. 4 shows a flowchart of extracting a plurality of connected components from an image containing text according to an embodiment of the invention. As shown in Fig. 4, step S310 may include sub-steps S311 to S313. In sub-step S311, connected elements are extracted from the image; this can be done by any suitable prior-art technique and is not described in detail here. Then, in sub-step S312, a recognition confidence is calculated for each extracted connected element, and in sub-step S313, connected elements whose recognition confidence is below a predetermined confidence threshold are removed and the remaining connected elements are merged to form the plurality of connected components. In an image containing text, shooting problems or dirt on the text region itself (e.g., the address-plate region) sometimes cause some of the extracted connected elements to be noise. To remove this noise, this embodiment presets a confidence threshold: the recognition confidence of each extracted connected element is compared with the threshold, and elements with low confidence are filtered out, leaving the connected elements with higher confidence.
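Sub-steps S312 and S313 amount to a threshold filter followed by a merge. The sketch below assumes each connected element already comes with a recognition confidence from some character classifier, and it assumes a merge rule (elements whose bounding boxes overlap are combined) that the text does not specify; both the rule and the function names are illustrative only.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), inclusive pixel coordinates


def filter_by_confidence(elements: List[Tuple[Box, float]], threshold: float) -> List[Box]:
    """S313, first half: drop connected elements whose recognition confidence is below threshold."""
    return [box for box, conf in elements if conf >= threshold]


def _overlap(a: Box, b: Box) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def merge_overlapping(boxes: List[Box]) -> List[Box]:
    """S313, second half (assumed rule): repeatedly merge elements whose boxes overlap."""
    boxes = list(boxes)
    changed = True
    while changed:
        changed = False
        merged: List[Box] = []
        for b in boxes:
            for k, m in enumerate(merged):
                if _overlap(b, m):
                    merged[k] = (min(b[0], m[0]), min(b[1], m[1]),
                                 max(b[2], m[2]), max(b[3], m[3]))
                    changed = True
                    break
            else:
                merged.append(b)
        boxes = merged  # each merging pass strictly reduces the box count, so this terminates
    return boxes


def components_from_elements(elements: List[Tuple[Box, float]], threshold: float) -> List[Box]:
    return merge_overlapping(filter_by_confidence(elements, threshold))
```

For elements `(0,0,5,10)`, `(4,0,8,10)`, `(20,0,25,10)` and `(30,0,35,10)` with confidences 0.9, 0.8, 0.1 and 0.95 and a threshold of 0.5, the low-confidence element is dropped and the first two are merged into `(0,0,8,10)`.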

Fig. 5 shows an alternative flowchart of extracting a plurality of connected components from an image containing text according to another embodiment of the invention. As shown in Fig. 5, step S310 may include sub-steps S315 to S318. In sub-step S315, connected elements are extracted from the image. Then, in sub-step S316, the stroke width of each extracted connected element is calculated, together with the average stroke width of all connected elements. In sub-step S317, a stroke width range is determined from the average stroke width. For example, if the calculated average stroke width is SW, the stroke width range may be set to 0.5*SW to 1.5*SW. Then, in sub-step S318, connected elements whose stroke width lies outside this range are removed, and the remaining connected elements are merged to form the plurality of connected components. The process in Fig. 5 is an alternative denoising approach that uses stroke width as the filtering criterion.
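The stroke-width filter of sub-steps S316 to S318 is easy to state numerically. This minimal sketch assumes the per-element stroke widths have already been measured (the text does not say how; a stroke width transform is one common choice) and uses the example range 0.5*SW to 1.5*SW as default parameters.

```python
from statistics import mean
from typing import List, Tuple


def stroke_width_range(widths: List[float], low: float = 0.5, high: float = 1.5) -> Tuple[float, float]:
    """S316-S317: range [low*SW, high*SW] around the average stroke width SW."""
    sw = mean(widths)
    return low * sw, high * sw


def keep_by_stroke_width(widths: List[float], low: float = 0.5, high: float = 1.5) -> List[int]:
    """S318: indices of connected elements whose stroke width lies inside the range."""
    lo, hi = stroke_width_range(widths, low, high)
    return [i for i, w in enumerate(widths) if lo <= w <= hi]
```

For widths `[4, 4, 4, 4, 20]`, SW is 7.2, so the range is roughly 3.6 to 10.8 and the thick outlier at index 4 is removed. Because SW is a mean, a few very thick or thin outliers shift the range; a median would be more robust, but the text specifies the average.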

In addition, for a mixed image that contains both regions of light characters on a dark background and regions of dark characters on a light background, both positive binarization and reverse (inverted) binarization may be applied when extracting connected elements. The results of the positive and reverse binarization are analyzed separately to extract connected elements from each, and the extracted connected elements are finally merged to form the plurality of connected components.
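The two-polarity extraction can be sketched on a small grayscale grid. This is a minimal illustration under assumptions: a fixed global threshold, 4-connected labeling by breadth-first search, and an added heuristic (not from the text) that a component touching the image border is background and is discarded, which keeps the inverted background of each polarity from being reported as a character.

```python
from collections import deque
from typing import List, Tuple

Grid = List[List[int]]
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), inclusive


def binarize(gray: Grid, threshold: int, reverse: bool = False) -> Grid:
    """Positive: dark pixels are foreground. Reverse: light pixels are foreground."""
    if reverse:
        return [[1 if p > threshold else 0 for p in row] for row in gray]
    return [[1 if p <= threshold else 0 for p in row] for row in gray]


def label_boxes(binary: Grid, drop_border: bool = True) -> List[Box]:
    """4-connected component labeling by BFS; returns bounding boxes."""
    h = len(binary)
    w = len(binary[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cy, cx = queue.popleft()
                x0, x1, y0, y1 = min(x0, cx), max(x1, cx), min(y0, cy), max(y1, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Assumed heuristic: a component touching the image border is background
            if drop_border and (x0 == 0 or y0 == 0 or x1 == w - 1 or y1 == h - 1):
                continue
            boxes.append((x0, y0, x1, y1))
    return boxes


def extract_both_polarities(gray: Grid, threshold: int = 128) -> List[Box]:
    """Analyze positive and reverse binarization separately, then merge the results."""
    return (label_boxes(binarize(gray, threshold))
            + label_boxes(binarize(gray, threshold, reverse=True)))
```

On an image whose left half is a dark background with a light stroke and whose right half is a light background with a dark stroke, the positive pass finds only the dark stroke, the reverse pass finds only the light stroke, and the merged list contains both.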

According to an embodiment of the invention, when the extracted connected components are classified by language in step S320, a first-language classifier covering all first-language characters may be used. For example, if the first language is the Arabic numerals 0-9, a classifier covering 0-9 may be used to classify the extracted connected components into first-language connected components (Arabic numerals) and non-first-language connected components (not Arabic numerals). According to an embodiment of the invention, the first language has fewer characters than the non-first language, which reduces the computation needed for classification and improves processing speed. Those skilled in the art will understand that the non-first language can be further divided into a second language and a language that is neither the first nor the second (and so on), which may also give good results; the approach is similar to the above and is not described in detail here.
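A classifier that knows only the ten digit classes still has to reject everything else. The sketch below assumes a rejection rule that the text does not spell out: a component counts as first-language when its best digit score reaches a threshold. `score_fn` stands in for any hypothetical 10-class digit classifier that returns per-class scores.

```python
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
FIRST_LANGUAGE = "0123456789"  # example from the text: first language = Arabic numerals


def split_by_language(components: Sequence[T],
                      score_fn: Callable[[T], Dict[str, float]],
                      reject_threshold: float = 0.5) -> Tuple[List[T], List[T]]:
    """S320 sketch: first-language if the best first-language score reaches
    reject_threshold (assumed rejection rule); otherwise non-first-language."""
    first: List[T] = []
    other: List[T] = []
    for comp in components:
        best = max(score_fn(comp).values())
        (first if best >= reject_threshold else other).append(comp)
    return first, other
```

Keeping the first language small is what makes this step cheap: `score_fn` only evaluates ten classes per component, instead of the full multilingual character set.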

Fig. 6 shows a flowchart of clustering first-language connected components into first-language text lines and non-first-language connected components into non-first-language text lines according to an embodiment of the invention. As shown in Fig. 6, step S330 may include sub-steps S331 to S334. In sub-step S331, the horizontal and vertical gaps between first-language connected components are compared, and each first-language connected component is given either a horizontal mark or a vertical mark according to the result. Specifically, for each first-language connected component, the horizontal gap to its horizontally adjacent first-language connected component is compared with the vertical gap to its vertically adjacent first-language connected component. If the horizontal gap is smaller, meaning the component is more tightly packed in the horizontal direction, a horizontal mark is set for it; otherwise a vertical mark is set. Similarly, in sub-step S332, each non-first-language connected component is given a horizontal or vertical mark according to the comparison of the horizontal and vertical gaps between non-first-language connected components.

Then, in sub-step S333, the first-language connected components with horizontal marks and those with vertical marks are clustered into first-language horizontal text lines and first-language vertical text lines, respectively. In sub-step S334, the non-first-language connected components with horizontal marks and those with vertical marks are clustered into non-first-language horizontal text lines and non-first-language vertical text lines, respectively.
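Sub-steps S331 to S334 can be sketched for one language group at a time. Several details here are assumptions rather than the text's definitions: a "horizontal neighbour" is any box sharing a row band, a "vertical neighbour" is any box sharing a column band, a component with no neighbour gets an infinite gap, and lines are formed by chaining same-mark boxes that share a row band (H) or column band (V).

```python
import math
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1)


def _overlaps(a0: int, a1: int, b0: int, b1: int) -> bool:
    return a0 <= b1 and b0 <= a1


def nearest_gaps(i: int, boxes: List[Box]) -> Tuple[float, float]:
    """Gap to the nearest horizontal neighbour and to the nearest vertical neighbour."""
    x0, y0, x1, y1 = boxes[i]
    h_gap = v_gap = math.inf
    for j, (u0, v0, u1, v1) in enumerate(boxes):
        if j == i:
            continue
        if _overlaps(y0, y1, v0, v1):      # shares a row band
            h_gap = min(h_gap, max(u0 - x1, x0 - u1))
        if _overlaps(x0, x1, u0, u1):      # shares a column band
            v_gap = min(v_gap, max(v0 - y1, y0 - v1))
    return h_gap, v_gap


def mark_directions(boxes: List[Box]) -> List[str]:
    """S331/S332: 'H' if the horizontal gap is smaller, otherwise 'V'."""
    marks = []
    for i in range(len(boxes)):
        h_gap, v_gap = nearest_gaps(i, boxes)
        marks.append("H" if h_gap < v_gap else "V")
    return marks


def cluster_lines(boxes: List[Box], marks: List[str]) -> List[Tuple[str, List[int]]]:
    """S333/S334: chain same-mark boxes sharing a row band (H) or column band (V)."""
    lines: List[Tuple[str, List[int]]] = []
    for mk, axis in (("H", 0), ("V", 1)):
        order = sorted((i for i, m in enumerate(marks) if m == mk), key=lambda i: boxes[i][axis])
        groups: List[List[int]] = []
        for i in order:
            b = boxes[i]
            for g in groups:
                ref = boxes[g[-1]]
                same = (_overlaps(b[1], b[3], ref[1], ref[3]) if mk == "H"
                        else _overlaps(b[0], b[2], ref[0], ref[2]))
                if same:
                    g.append(i)
                    break
            else:
                groups.append([i])
        lines.extend((mk, g) for g in groups)
    return lines
```

For three boxes in a row followed by three boxes stacked in a column, the row boxes are marked H and the column boxes V, giving one horizontal line `[0, 1, 2]` and one vertical line `[3, 4, 5]`.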

Fig. 7 shows an alternative flowchart of clustering first-language connected components into first-language text lines and non-first-language connected components into non-first-language text lines according to another embodiment of the invention. As shown in Fig. 7, step S330 may include sub-steps S335 to S338. In sub-step S335, the horizontal and vertical gaps between first-language connected components are each compared with a preset threshold, and horizontal and vertical marks are set for the first-language connected components according to the result. Specifically, for each first-language connected component, the horizontal gap to its horizontally adjacent first-language connected component is compared with the preset threshold, and the vertical gap to its vertically adjacent first-language connected component is compared with the same threshold. If the horizontal gap is smaller than the threshold, a horizontal mark is set; if it is larger, no horizontal mark is set. Likewise, if the vertical gap is smaller than the threshold, a vertical mark is set; if it is larger, no vertical mark is set. Similarly, in sub-step S336, the horizontal and vertical gaps between non-first-language connected components are compared with the preset threshold, and horizontal and vertical marks are set for the non-first-language connected components according to the result.

Some connected components may receive both a horizontal mark and a vertical mark, because both their horizontal gap and their vertical gap are smaller than the preset threshold. In sub-step S337, for each first-language or non-first-language connected component that has both marks, one of the two marks is removed according to a comparison of its horizontal and vertical gaps to components of the same language. That is, if a first-language connected component is closer to another first-language connected component in the horizontal direction, its horizontal mark is kept; otherwise its vertical mark is kept. Then, in sub-step S338, the first-language connected components with horizontal marks and those with vertical marks are clustered into first-language horizontal text lines and first-language vertical text lines, respectively, and the non-first-language connected components with horizontal marks and those with vertical marks are clustered into non-first-language horizontal text lines and non-first-language vertical text lines, respectively.
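The threshold variant of sub-steps S335 to S337 differs from Fig. 6 in that a component may get both marks or neither. This sketch reuses the same assumed neighbour and gap definitions as a self-contained copy; the text does not say what happens to a component that ends up with no mark, so the sketch simply returns an empty set for it.

```python
import math
from typing import List, Set, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1)


def _overlaps(a0: int, a1: int, b0: int, b1: int) -> bool:
    return a0 <= b1 and b0 <= a1


def nearest_gaps(i: int, boxes: List[Box]) -> Tuple[float, float]:
    """Gap to the nearest box sharing a row band (horizontal) and a column band (vertical)."""
    x0, y0, x1, y1 = boxes[i]
    h_gap = v_gap = math.inf
    for j, (u0, v0, u1, v1) in enumerate(boxes):
        if j == i:
            continue
        if _overlaps(y0, y1, v0, v1):
            h_gap = min(h_gap, max(u0 - x1, x0 - u1))
        if _overlaps(x0, x1, u0, u1):
            v_gap = min(v_gap, max(v0 - y1, y0 - v1))
    return h_gap, v_gap


def threshold_marks(boxes: List[Box], t: float) -> List[Set[str]]:
    """S335/S336: 'H' if the horizontal gap < t, 'V' if the vertical gap < t.
    S337: when both are set, keep only the direction with the smaller gap."""
    marks: List[Set[str]] = []
    for i in range(len(boxes)):
        h_gap, v_gap = nearest_gaps(i, boxes)
        m: Set[str] = set()
        if h_gap < t:
            m.add("H")
        if v_gap < t:
            m.add("V")
        if len(m) == 2:
            m = {"H"} if h_gap < v_gap else {"V"}
        marks.append(m)
    return marks
```

With boxes `a=(0,0,8,10)`, `b=(10,0,18,10)` and `c=(0,14,8,24)`, box `a` is 2 px from `b` and 4 px from `c`: with `t=5` it gets both marks and keeps only H, while with `t=3` box `c` receives no mark at all.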

The above processing yields text lines that are separated by language and have known reading directions, but the recognition order between different text lines still has to be determined. To this end, the present application proposes determining the layout category of the image to be processed by means of preset layout templates, and thereby the recognition order between the text lines.

Fig. 8 shows a flowchart of recognizing first-language text and non-first-language text from first-language text lines and non-first-language text lines according to an embodiment of the invention. As shown in Fig. 8, step S340 may include sub-steps S341 to S343. In sub-step S341, the layout structure features of the image are calculated from the first-language text lines and the non-first-language text lines. Then, in sub-step S342, the layout category to which the image belongs among a plurality of known layout categories is determined from the calculated layout structure features. Once the layout category is known, the processing order between the different text lines is determined. In sub-step S343, the first-language and non-first-language text lines in the image are processed according to its layout category to recognize first-language text and non-first-language text.

According to an embodiment of the invention, the layout structure features of the image calculated in sub-step S341 may include: the geometric features and the recognition confidence of the longest line among the first-language text lines, and the geometric features of the longest line among the non-first-language text lines.

Specifically, the geometric properties of line of text can include the coboundary of this article one's own profession, lower boundary, the left side Average the ratio of width to height of communication means in boundary, right margin, this article one's own profession and/or adjacent communication means Equispaced.

According to an embodiment of the invention, the layout structure features of the image may include: the recognition confidence P of the longest first-language text line; the 6-dimensional geometric features of the longest first-language text line (the relative positions of the top boundary y0, bottom boundary y1, left boundary x0 and right boundary x1, the average aspect ratio of its connected components, and the average gap between adjacent connected components); and the same 6-dimensional geometric features of the longest second-language (i.e., non-first-language) text line. The recognition confidence P of the longest first-language text line is the average recognition confidence of all candidate connected components in that line, calculated as follows:

P = (P_{CC_1} + P_{CC_2} + \cdots + P_{CC_M}) / M

where M is the number of candidate connected components in the longest first-language text line. The average recognition confidence may also be calculated in other ways, for example as the width-weighted average recognition confidence of all candidate connected components in the line:

P = (P_{CC_1} \cdot \frac{w_{CC_1}}{w} + P_{CC_2} \cdot \frac{w_{CC_2}}{w} + \cdots + P_{CC_M} \cdot \frac{w_{CC_M}}{w}) / M

where w_CCm denotes the width of the m-th candidate connected component in the line and w denotes the width of the text line.
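Both confidence formulas translate directly into code. The sketch below follows them exactly as printed, including the final division by M in the width-weighted variant; the function names are illustrative only.

```python
from typing import Sequence


def mean_confidence(confs: Sequence[float]) -> float:
    """Plain average: P = (P_CC1 + ... + P_CCM) / M."""
    return sum(confs) / len(confs)


def width_weighted_confidence(confs: Sequence[float], widths: Sequence[float],
                              line_width: float) -> float:
    """Width-weighted variant as printed: sum of P_CCm * w_CCm / w, then divided by M."""
    m = len(confs)
    return sum(p * w_cc / line_width for p, w_cc in zip(confs, widths)) / m
```

For confidences 0.8 and 0.6, component widths 10 and 30, and a line width of 50, the plain average is 0.7 and the weighted variant is (0.16 + 0.36) / 2 = 0.26. Because the weights w_CCm/w sum to at most 1, the extra division by M makes the weighted value shrink as the line grows longer. Any threshold compared with it should therefore be chosen with that scaling in mind.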

Fig. 9 shows a flowchart of determining, from the layout structure features of an image, the layout category to which the image belongs among a plurality of layout categories, according to an embodiment of the invention. As shown in Fig. 9, sub-step S342 may include sub-steps S342a and S342b. In sub-step S342a, the confidence probability of the image for each of the plurality of layout categories is calculated from the layout structure features of the image. Then, in sub-step S342b, the layout category with the highest confidence probability for the image is determined as its layout category. According to an embodiment, the confidence probability of the image for each layout category is obtained by applying a bounded nonlinear transformation to a discriminant function determined by training; the discriminant function may be linear or nonlinear.

According to an embodiment of the invention, N layout categories may be predefined from a number of training samples according to differences in their layout structures. Figs. 10a to 10d show examples of four layout categories. By calculating the layout structure features of known training samples, an N-class classifier can be trained. The classifier may be linear or nonlinear, for example the classic linear SVM classifier among discriminative models.

The samples belonging to each layout category are treated as positive samples of that category, and all other samples as its negative samples. Training then yields a linear discriminant function that separates the positive and negative samples, as shown below:

f_i(x) = w_i^T x + b_i, \quad i = 1, \ldots, N

where x is the layout structure feature vector, the coefficients w_i and b_i are the parameters of the linear classifier for layout category i determined by training, and f_i(x) is the linear discriminant function of layout category i with respect to the layout structure feature x.

This linear discriminant function is then converted into a confidence probability by the following sigmoid transform:

P_i(x) = \frac{1}{1 + \exp[-(\alpha f_i(x) + \beta)]}, \quad i = 1, \ldots, N

where α is a positive number and β is a real number; both are preset coefficients that can be determined experimentally. For example, α and β may be set to 1 and 0, respectively. All categories may share the same α and β. P_i(x) is the confidence probability of the image for layout category i.
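Sub-steps S342a and S342b combine the linear discriminant, the sigmoid transform and an argmax. This sketch assumes the trained parameters w_i and b_i are already available as plain lists (however they were obtained) and uses the example defaults α = 1 and β = 0; the sigmoid is written in a numerically stable form so that large negative scores do not overflow `math.exp`.

```python
import math
from typing import List, Sequence, Tuple


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function 1 / (1 + exp(-z))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def layout_probabilities(x: Sequence[float], weights: Sequence[Sequence[float]],
                         biases: Sequence[float], alpha: float = 1.0,
                         beta: float = 0.0) -> List[float]:
    """S342a: P_i(x) = sigmoid(alpha * f_i(x) + beta) with f_i(x) = w_i^T x + b_i."""
    probs = []
    for w_i, b_i in zip(weights, biases):
        f_i = sum(wk * xk for wk, xk in zip(w_i, x)) + b_i
        probs.append(_sigmoid(alpha * f_i + beta))
    return probs


def classify_layout(x: Sequence[float], weights: Sequence[Sequence[float]],
                    biases: Sequence[float], alpha: float = 1.0,
                    beta: float = 0.0) -> Tuple[int, float]:
    """S342b: index and probability of the most confident layout category."""
    probs = layout_probabilities(x, weights, biases, alpha, beta)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]
```

Because α > 0 and β are shared by all categories, the sigmoid is monotonic and never changes which category wins. Its role is to put the scores on a (0, 1) scale so that the maximum probability can be compared with the correction threshold described below.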

The parameters w_i, b_i, α and β of the above N-class classifier can be obtained by training methods known in the art. For example, the open-source LibSVM toolkit can be used to train the N-class linear SVM.

In practice, the calculated layout structure features of the image are input into the trained N-class classifier, and the category with the highest output confidence probability is determined as the layout category of the image among the known layout categories.

According to an embodiment of the invention, sub-step S343 may include recognizing first-language text and non-first-language text using a first-language string recognition engine and a non-first-language string recognition engine, respectively. Furthermore, a correction threshold may be preset: for an input image, when the obtained maximum confidence probability is below this preset correction threshold, the string recognition engine used for the first-language text lines and/or the non-first-language text lines may be changed. In addition, a second correction threshold may be preset: for an input image, when the average recognition confidence of all connected components in its longest first-language text line is below this second preset threshold, the recognition direction of the first-language text lines and/or the non-first-language text lines may be adjusted. Presetting thresholds in this way allows the processing of the image to be adjusted, which improves accuracy.
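The two correction thresholds act as independent switches on a per-line-type recognition plan. The text does not say which engine to switch to, so this sketch assumes the caller supplies a hypothetical `fallback_engines` mapping; it also assumes that "adjusting the direction" means swapping horizontal and vertical. Engine names such as `"digits"` are placeholders.

```python
from dataclasses import dataclass, replace
from typing import Dict

FLIP = {"horizontal": "vertical", "vertical": "horizontal"}


@dataclass(frozen=True)
class LinePlan:
    engine: str     # hypothetical engine identifier
    direction: str  # "horizontal" or "vertical"


def adjust_plans(plans: Dict[str, LinePlan], max_layout_prob: float,
                 longest_line_conf: float, engine_threshold: float,
                 direction_threshold: float,
                 fallback_engines: Dict[str, str]) -> Dict[str, LinePlan]:
    out: Dict[str, LinePlan] = {}
    for kind, plan in plans.items():
        # First correction threshold: uncertain layout -> use the fallback engine, if any
        if max_layout_prob < engine_threshold and kind in fallback_engines:
            plan = replace(plan, engine=fallback_engines[kind])
        # Second correction threshold: low confidence on the longest
        # first-language line -> flip the recognition direction
        if longest_line_conf < direction_threshold:
            plan = replace(plan, direction=FLIP[plan.direction])
        out[kind] = plan
    return out
```

A low maximum layout probability changes only the engines that have a fallback listed. A low confidence on the longest first-language line flips the direction of every plan, matching the "and/or" wording only in its broadest reading. A narrower policy would flip just one line type.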

Fig. 11 shows a block diagram of a character recognition device according to an embodiment of the invention. As shown in Fig. 11, the character recognition device 1100 may include an extraction unit 1110, a classification unit 1120, a clustering unit 1130 and a recognition unit 1140. The extraction unit 1110 may extract a plurality of connected components from an image containing text. The classification unit 1120 may classify the connected components extracted by the extraction unit 1110 to generate first-language connected components and/or non-first-language connected components. The clustering unit 1130 may cluster the first-language connected components generated by the classification unit 1120 into first-language text lines, and the non-first-language connected components into non-first-language text lines. The recognition unit 1140 may recognize first-language text and non-first-language text from the first-language and non-first-language text lines produced by the clustering unit 1130.

Fig. 12 shows a block diagram of a recognition unit according to an embodiment of the invention. As shown in Fig. 12, the recognition unit 1140 may include a layout structure feature calculation subunit 1141, a layout category determination subunit 1142 and a text line processing subunit 1143. The layout structure feature calculation subunit 1141 may calculate the layout structure features of the image from the first-language and non-first-language text lines produced by the clustering unit 1130. The layout category determination subunit 1142 may determine, from the layout structure features calculated by subunit 1141, the layout category of the image among a plurality of layout categories. The text line processing subunit 1143 may process the first-language and non-first-language text lines according to the layout category determined for the image by subunit 1142, to recognize first-language text and non-first-language text.

According to an embodiment of the present invention, the layout structure features of the image may include the geometric features and recognition confidence of the longest first-language text line, and the geometric features of the longest non-first-language text line.

Figure 13 shows a block diagram of the layout category determination subunit according to an embodiment of the present invention. As shown in Figure 13, the layout category determination subunit 1142 may include a confidence probability calculation module 1142a and a layout category determination module 1142b. Based on the layout structure features of the image, the confidence probability calculation module 1142a may calculate the image's confidence probability for each of a plurality of preset layout categories. The layout category determination module 1142b may select the layout category with the largest confidence probability as the layout category of the image.

Figure 14 shows a block diagram of the text line processing subunit according to an embodiment of the present invention. As shown in Figure 14, the text line processing subunit 1143 may include a first-language character string recognition engine 1143a, a non-first-language character string recognition engine 1143b, a recognition engine switching module 1143c and a recognition direction switching module 1143d. Engines 1143a and 1143b may be used to recognize first-language characters and non-first-language characters, respectively.

The two switching modules correct the recognition when confidence is low:

- If the maximum confidence probability of an input image is less than a predetermined first correction threshold, the recognition engine switching module 1143c may switch the character string recognition engines used for the first-language text lines and/or the non-first-language text lines.
- If the average recognition confidence of all connected components in the longest first-language text line is less than a predetermined second correction threshold, the recognition direction switching module 1143d may adjust the recognition direction of the first-language text lines and/or the non-first-language text lines.

Figure 15 shows a block diagram of the extraction unit according to an embodiment of the present invention. As shown in Figure 15, the extraction unit 1110 may include an extraction subunit 1111, a recognition confidence calculation subunit 1112, a stroke width calculation subunit 1113, a range determination subunit 1114 and a merging subunit 1115. The extraction subunit 1111 may extract connected units from the image. The recognition confidence calculation subunit 1112 may calculate the recognition confidence of each connected unit extracted by the extraction subunit 1111. The stroke width calculation subunit 1113 may calculate the character stroke width of each connected unit extracted by the extraction subunit 1111, together with the average stroke width. The range determination subunit 1114 may determine a stroke width range from the calculated average stroke width. The merging subunit 1115 may remove two kinds of connected units: those whose recognition confidence is below a predetermined confidence threshold, and those whose stroke width falls outside the stroke width range. It then merges the remaining connected units to form the plurality of connected components.

Figure 16 shows a block diagram of the clustering unit according to an embodiment of the present invention. As shown in Figure 16, the clustering unit 1130 may include a comparison subunit 1131, a labeling subunit 1132 and a clustering subunit 1133. The comparison subunit 1131 may compare the horizontal gap with the vertical gap between first-language connected components. It makes the same comparison between non-first-language connected components. Based on these comparison results, the labeling subunit 1132 may assign either a horizontal label or a vertical label to each first-language and non-first-language connected component. The clustering subunit 1133 groups the labeled components as follows:

- Horizontally labeled first-language and non-first-language connected components are clustered into first-language horizontal text lines and non-first-language horizontal text lines, respectively.
- Vertically labeled first-language and non-first-language connected components are clustered into first-language vertical text lines and non-first-language vertical text lines, respectively.

Figure 17 shows a block diagram of the clustering unit according to another embodiment of the present invention. As shown in Figure 17, the clustering unit 1130 may include a comparison subunit 1135, a labeling subunit 1136, a label removal subunit 1137 and a clustering subunit 1138. The comparison subunit 1135 may compare each of the following with a preset threshold: the horizontal and vertical gaps between first-language connected components, and the horizontal and vertical gaps between non-first-language connected components. Based on these comparison results, the labeling subunit 1136 may assign horizontal labels and vertical labels to the first-language and non-first-language connected components. Some components may receive both a horizontal label and a vertical label. For each such component, the label removal subunit 1137 may remove one of the two labels, according to the relative sizes of the component's horizontal and vertical gaps to connected components of the same kind. The clustering subunit 1138 groups the labeled components as follows:

- Horizontally labeled first-language and non-first-language connected components are clustered into first-language horizontal text lines and non-first-language horizontal text lines, respectively.
- Vertically labeled first-language and non-first-language connected components are clustered into first-language vertical text lines and non-first-language vertical text lines, respectively.

Those skilled in the art will understand that the character recognition method and device provided by the present invention can recognize the address text in the address-plate region of a captured photograph. They can also recognize text in any image containing text. They are particularly suited to images that contain text in several languages and that have a characteristic layout structure, i.e. belong to a particular layout type.

It should also be noted that each component of the above device may be configured by software, firmware, hardware, or a combination thereof. The specific means or manners usable for such configuration are well known to those skilled in the art and are not repeated here. When the device is implemented in software or firmware, the program constituting the software is installed from a storage medium or a network onto a computer having a dedicated hardware structure (for example, the general-purpose computer 1800 shown in Figure 18). Once the various programs are installed, the computer can execute the various functions.

In Figure 18, a central processing unit (CPU) 1801 executes various processes according to a program stored in a read-only memory (ROM) 1802 or a program loaded from a storage section 1808 into a random access memory (RAM) 1803. The RAM 1803 also stores, as needed, data that the CPU 1801 requires when executing these processes. The CPU 1801, ROM 1802 and RAM 1803 are connected to one another via a bus 1804. An input/output interface 1805 is also connected to the bus 1804.

The following components are connected to the input/output interface 1805:

- an input section 1806, including a keyboard, a mouse, etc.;
- an output section 1807, including a display such as a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, etc.;
- a storage section 1808, including a hard disk, etc.;
- a communication section 1809, including a network interface card such as a LAN card, a modem, etc.

The communication section 1809 performs communication processing via a network such as the Internet. A drive 1810 may also be connected to the input/output interface 1805 as needed. A removable medium 1811, such as a magnetic disk, an optical disc, a magneto-optical disc or a semiconductor memory, may be mounted on the drive 1810 as needed. A computer program read from the medium is then installed into the storage section 1808 as needed.

When the above series of processes is implemented in software, the program constituting the software is installed from a network such as the Internet or from a storage medium such as the removable medium 1811.

Those skilled in the art will understand that this storage medium is not limited to the removable medium 1811 shown in Figure 18, which stores the program and is distributed separately from the device to provide the program to users. Examples of the removable medium 1811 include:

- a magnetic disk, including a floppy disk (registered trademark);
- an optical disc, including a compact disc read-only memory (CD-ROM) and a digital versatile disc (DVD);
- a magneto-optical disc, including a MiniDisc (MD) (registered trademark);
- a semiconductor memory.

Alternatively, the storage medium may be the ROM 1802, a hard disk contained in the storage section 1808, or the like. Such a medium stores the program and is distributed to users together with the device that contains it.

The present invention also proposes a program product storing machine-readable instruction code. When a machine reads and executes the instruction code, it can perform the method according to the embodiments of the present invention described above.

Accordingly, a storage medium carrying the above program product is also included within the scope of the present invention. The storage medium includes, but is not limited to, a floppy disk, an optical disc, a magneto-optical disc, a memory card and a memory stick.

It should be noted that the method of the present invention need not be executed in the chronological order described in the specification. It may also be executed sequentially in another order, in parallel, or independently. The execution order described in this specification therefore does not limit the technical scope of the present invention.

The above description of the embodiments is intended to aid understanding of the present invention. It is merely exemplary and does not limit the invention. Features described and/or illustrated for one embodiment may be used in the same or a similar manner in one or more other embodiments, combined with features of other embodiments, or substituted for features of other embodiments. Those skilled in the art will understand that various changes and modifications to the embodiments described above fall within the scope of the present invention, provided they do not depart from its inventive concept.

In summary, the embodiments of the present invention provide the following technical solutions.

Scheme 1. A character recognition method, comprising:

extracting a plurality of connected components from an image containing text;

classifying the plurality of connected components to generate first-language connected components and/or non-first-language connected components;

clustering the first-language connected components into first-language text lines, and clustering the non-first-language connected components into non-first-language text lines; and

recognizing first-language characters and non-first-language characters from the first-language text lines and the non-first-language text lines.
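The four steps of Scheme 1 can be pictured as a small pipeline. The Python code below is a minimal sketch, not the patented implementation. It rests on these assumptions:

- Extraction has already produced the connected components.
- Each component is reduced to a bounding box.
- The classifier, clustering routine and two recognition engines (`classify`, `cluster`, `recognize_first`, `recognize_other`) are hypothetical callables supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: a connected component reduced to its
# bounding box (x, y, width, height).
Box = Tuple[int, int, int, int]


@dataclass
class Component:
    box: Box


Line = List[Component]


def recognize_image(
    components: Sequence[Component],
    classify: Callable[[Component], bool],
    cluster: Callable[[List[Component]], List[Line]],
    recognize_first: Callable[[Line], str],
    recognize_other: Callable[[Line], str],
) -> Tuple[List[str], List[str]]:
    """Steps 2-4 of Scheme 1; step 1 (extraction) produced `components`."""
    # Step 2: split components into first-language / non-first-language.
    first: List[Component] = []
    other: List[Component] = []
    for comp in components:
        (first if classify(comp) else other).append(comp)
    # Step 3: cluster each language group into text lines independently.
    first_lines = cluster(first)
    other_lines = cluster(other)
    # Step 4: recognize each line with the engine for its language.
    first_text = [recognize_first(line) for line in first_lines]
    other_text = [recognize_other(line) for line in other_lines]
    return first_text, other_text
```

The engines are passed in as parameters, which keeps the first-language and non-first-language engines separate. With this design, the engine switch described in Scheme 7 amounts to swapping two arguments.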

Scheme 2. The method according to Scheme 1, wherein recognizing first-language characters and non-first-language characters from the first-language text lines and the non-first-language text lines comprises:

calculating layout structure features of the image from the first-language text lines and the non-first-language text lines;

determining, according to the layout structure features of the image, the layout category of the image among a plurality of layout categories; and

processing the first-language text lines and the non-first-language text lines according to the layout category of the image, so as to recognize first-language characters and non-first-language characters.

Scheme 3. The method according to Scheme 2, wherein the layout structure features of the image include the geometric features and recognition confidence of the longest first-language text line, and the geometric features of the longest non-first-language text line.

Scheme 4. The method according to Scheme 3, wherein the geometric features of a text line include the line's top boundary, bottom boundary, left boundary and right boundary, the average width-to-height ratio of the connected components in the line, and/or the average gap between adjacent connected components.
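The features named in Schemes 3 and 4 can be assembled into a single vector, as in the Python sketch below. The following choices are assumptions made for illustration:

- each component is an `(x, y, w, h)` bounding box;
- "longest" is measured by horizontal extent;
- gaps are measured along the x axis, i.e. lines are treated as horizontal;
- there is at least one line of each kind.

The functions `line_geometry`, `line_length` and `layout_features` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumed representation: (x, y, w, h) bounding box of a connected component.
Box = Tuple[float, float, float, float]
TextLine = Sequence[Box]


def line_geometry(line: TextLine) -> List[float]:
    """Scheme 4 features of a horizontal text line: top, bottom, left,
    right, mean width/height ratio, mean gap between adjacent components."""
    boxes = sorted(line, key=lambda b: b[0])  # left-to-right order
    top = min(b[1] for b in boxes)
    bottom = max(b[1] + b[3] for b in boxes)
    left = min(b[0] for b in boxes)
    right = max(b[0] + b[2] for b in boxes)
    mean_aspect = sum(b[2] / b[3] for b in boxes) / len(boxes)
    # Gap between neighbours; overlapping neighbours count as zero gap.
    gaps = [max(0.0, nxt[0] - (cur[0] + cur[2])) for cur, nxt in zip(boxes, boxes[1:])]
    mean_gap = sum(gaps) / len(gaps) if gaps else 0.0
    return [top, bottom, left, right, mean_aspect, mean_gap]


def line_length(line: TextLine) -> float:
    """Horizontal extent of a line, used here to pick the 'longest' line."""
    return max(b[0] + b[2] for b in line) - min(b[0] for b in line)


def layout_features(
    first_lines: Sequence[TextLine],
    first_confidences: Sequence[float],
    other_lines: Sequence[TextLine],
) -> List[float]:
    """Scheme 3: geometry + confidence of the longest first-language line,
    followed by the geometry of the longest non-first-language line."""
    i = max(range(len(first_lines)), key=lambda k: line_length(first_lines[k]))
    longest_other = max(other_lines, key=line_length)
    return line_geometry(first_lines[i]) + [first_confidences[i]] + line_geometry(longest_other)
```

This sketch leaves the boundary coordinates in pixels. A practical classifier would probably need them normalized by the image size.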

Scheme 5. The method according to any one of Schemes 2-4, wherein determining, according to the layout structure features of the image, the layout category of the image among the plurality of layout categories comprises:

calculating, according to the layout structure features of the image, the image's confidence probability for each of the plurality of layout categories; and

selecting the layout category with the largest confidence probability as the layout category of the image.

Scheme 6. The method according to Scheme 5, wherein the image's confidence probability for each category is calculated by applying a threshold-limited nonlinear transformation to a discriminant function determined by training, the discriminant function being linear or nonlinear.
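Schemes 5 and 6 can be illustrated with one linear discriminant per layout category. The source allows linear or nonlinear discriminant functions. It states only that a threshold-limited nonlinear transformation maps the score to a confidence probability. The following parts of this Python sketch are assumptions:

- the concrete transform, which clips the score to `[-limit, limit]` and then applies a logistic sigmoid;
- the `(weights, bias)` parameter format;
- the category names.

```python
import math
from typing import Dict, Sequence, Tuple

# Hypothetical trained parameters: category name -> (weight vector, bias).
Params = Dict[str, Tuple[Sequence[float], float]]


def category_confidences(
    features: Sequence[float], params: Params, limit: float = 5.0
) -> Dict[str, float]:
    """Confidence probability of the image for each layout category."""
    probs: Dict[str, float] = {}
    for name, (weights, bias) in params.items():
        # Linear discriminant function f_k(x) = w_k . x + b_k.
        score = sum(w * x for w, x in zip(weights, features)) + bias
        # Threshold-limited nonlinear transform (assumed form):
        # clip the score to [-limit, limit], then apply a sigmoid.
        score = max(-limit, min(limit, score))
        probs[name] = 1.0 / (1.0 + math.exp(-score))
    return probs


def choose_layout(probs: Dict[str, float]) -> Tuple[str, float]:
    """Scheme 5: pick the category with the largest confidence probability."""
    best = max(probs, key=probs.get)
    return best, probs[best]
```

Each category's sigmoid is computed independently, so the confidences need not sum to one. Scheme 5 only uses their maximum.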

Scheme 7. The method according to Scheme 5 or 6, wherein processing the first-language text lines and the non-first-language text lines according to the layout category of the image, so as to recognize first-language characters and non-first-language characters, comprises:

recognizing first-language characters and non-first-language characters using a first-language character string recognition engine and a non-first-language character string recognition engine, respectively; and

wherein, for an image whose maximum confidence probability is less than a predetermined first correction threshold, the character string recognition engines used for the first-language text lines and/or the non-first-language text lines are switched; and

wherein, for an image in which the average recognition confidence of all connected components in the longest first-language text line is less than a predetermined second correction threshold, the recognition direction of the first-language text lines and/or the non-first-language text lines is adjusted.
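The two corrections of Scheme 7 amount to a pair of threshold checks. The following parts of this Python sketch are hypothetical:

- "switching" the engines is interpreted as swapping the first-language and non-first-language engines;
- "adjusting" the recognition direction is interpreted as toggling between horizontal and vertical reading;
- the `Plan` structure;
- the default values of the thresholds `t1` and `t2`.

```python
from dataclasses import dataclass, replace
from typing import Sequence


@dataclass(frozen=True)
class Plan:
    """Hypothetical recognition settings for one image."""
    first_engine: str = "first-language engine"
    other_engine: str = "non-first-language engine"
    direction: str = "horizontal"


def correct_plan(
    plan: Plan,
    max_layout_prob: float,
    longest_first_line_confs: Sequence[float],
    t1: float = 0.5,  # assumed first correction threshold
    t2: float = 0.6,  # assumed second correction threshold
) -> Plan:
    # First correction: low layout confidence -> swap the two engines.
    if max_layout_prob < t1:
        plan = replace(plan, first_engine=plan.other_engine, other_engine=plan.first_engine)
    # Second correction: low mean confidence on the longest first-language
    # line -> try the other reading direction.
    if longest_first_line_confs:
        mean_conf = sum(longest_first_line_confs) / len(longest_first_line_confs)
        if mean_conf < t2:
            new_dir = "vertical" if plan.direction == "horizontal" else "horizontal"
            plan = replace(plan, direction=new_dir)
    return plan
```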

Scheme 8. The method according to any one of Schemes 1-7, wherein extracting a plurality of connected components from the image containing text comprises:

extracting connected units from the image;

calculating the recognition confidence of each extracted connected unit; and

removing connected units whose recognition confidence is less than a predetermined confidence threshold, and merging the remaining connected units to form the plurality of connected components.
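Scheme 8 can be sketched as a filter followed by a merge. The source does not say how the surviving connected units are merged. This Python sketch therefore makes two assumptions:

- units are axis-aligned boxes `(x0, y0, x1, y1)`;
- any units whose boxes overlap or touch are merged, which is one plausible way to reassemble multi-part characters.

The function names and the threshold are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # assumed (x0, y0, x1, y1)


def filter_by_confidence(
    units: Sequence[Box], confidences: Sequence[float], threshold: float
) -> List[Box]:
    """Drop connected units whose recognition confidence is below threshold."""
    return [u for u, c in zip(units, confidences) if c >= threshold]


def _overlaps(a: Box, b: Box) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def _union(a: Box, b: Box) -> Box:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def merge_units(units: Sequence[Box]) -> List[Box]:
    """Merge overlapping/touching units until no two boxes overlap."""
    merged = list(units)
    changed = True
    while changed:
        changed = False
        out: List[Box] = []
        for box in merged:
            for i, existing in enumerate(out):
                if _overlaps(box, existing):
                    out[i] = _union(box, existing)
                    changed = True
                    break
            else:
                out.append(box)
        # Repeat, because a grown box may now overlap an earlier one.
        merged = out
    return merged
```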

Scheme 9. The method according to any one of Schemes 1-8, wherein extracting a plurality of connected components from the image containing text comprises:

extracting connected units from the image;

calculating the character stroke width of each extracted connected unit and the average stroke width;

determining a stroke width range according to the average stroke width; and

removing connected units whose stroke width falls outside the stroke width range, and merging the remaining connected units to form the plurality of connected components.
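Scheme 9 requires a per-unit stroke width, their average, and a range derived from that average. This Python sketch makes two assumptions:

- A unit's stroke width is estimated as the median length of its horizontal foreground runs in a binary mask. This is a simple stand-in estimator, not necessarily the one the inventors used.
- The range is `[low * mean, high * mean]`, with hypothetical factors 0.5 and 2.0.

```python
from statistics import median
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def stroke_width(mask: Sequence[Sequence[int]]) -> float:
    """Estimate stroke width as the median horizontal run of foreground pixels."""
    runs: List[int] = []
    for row in mask:
        run = 0
        for px in list(row) + [0]:  # trailing 0 flushes a run at the row end
            if px:
                run += 1
            elif run:
                runs.append(run)
                run = 0
    return float(median(runs)) if runs else 0.0


def stroke_width_range(
    widths: Sequence[float], low: float = 0.5, high: float = 2.0
) -> Tuple[float, float]:
    """Range anchored on the average stroke width (factors are assumptions)."""
    mean = sum(widths) / len(widths)
    return low * mean, high * mean


def filter_by_stroke_width(
    units: Sequence[T], widths: Sequence[float], low: float = 0.5, high: float = 2.0
) -> List[T]:
    """Keep only units whose stroke width lies inside the range."""
    lo, hi = stroke_width_range(widths, low, high)
    return [u for u, w in zip(units, widths) if lo <= w <= hi]
```

Because the range is anchored to the mean, a few very thick units can pull it upward. Anchoring to the median would be a more robust variation.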

Scheme 10. The method according to any one of Schemes 1-9, wherein clustering the first-language connected components into first-language text lines and clustering the non-first-language connected components into non-first-language text lines comprises:

assigning a horizontal label or a vertical label to each first-language connected component and each non-first-language connected component, according to the result of comparing the horizontal gap with the vertical gap between first-language connected components and the result of comparing the horizontal gap with the vertical gap between non-first-language connected components;

clustering the horizontally labeled first-language and non-first-language connected components into first-language horizontal text lines and non-first-language horizontal text lines, respectively; and

clustering the vertically labeled first-language and non-first-language connected components into first-language vertical text lines and non-first-language vertical text lines, respectively.
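The gap-comparison labeling and clustering of Scheme 10 can be sketched in Python as follows. The sketch assumes that:

- components are boxes `(x0, y0, x1, y1)`;
- a horizontal gap is measured only between components whose vertical extents overlap, and vice versa;
- each component is labeled by comparing its smallest horizontal gap with its smallest vertical gap;
- same-label components within a hypothetical `max_gap` are joined with union-find.

The functions would be called separately on the first-language and the non-first-language components.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # assumed (x0, y0, x1, y1)
INF = float("inf")


def h_gap(a: Box, b: Box) -> float:
    """Horizontal gap, defined only for boxes whose vertical extents overlap."""
    if a[1] > b[3] or b[1] > a[3]:
        return INF
    return max(0, max(a[0], b[0]) - min(a[2], b[2]))


def v_gap(a: Box, b: Box) -> float:
    """Vertical gap, defined only for boxes whose horizontal extents overlap."""
    if a[0] > b[2] or b[0] > a[2]:
        return INF
    return max(0, max(a[1], b[1]) - min(a[3], b[3]))


def label_by_gap(boxes: Sequence[Box]) -> List[str]:
    """'H' if the nearest horizontal neighbour is no farther than the nearest vertical one."""
    labels = []
    for i, a in enumerate(boxes):
        others = [b for j, b in enumerate(boxes) if j != i]
        h = min((h_gap(a, b) for b in others), default=INF)
        v = min((v_gap(a, b) for b in others), default=INF)
        labels.append("H" if h <= v else "V")
    return labels


def cluster_lines(
    boxes: Sequence[Box], labels: Sequence[str], max_gap: float
) -> List[Tuple[str, List[int]]]:
    """Union same-label neighbours within max_gap; return (label, indices) lines."""
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if labels[i] != labels[j]:
                continue
            if labels[i] == "H":
                gap = h_gap(boxes[i], boxes[j])
            else:
                gap = v_gap(boxes[i], boxes[j])
            if gap <= max_gap:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)
    return [(labels[idx[0]], idx) for idx in groups.values()]
```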

Scheme 11. The method according to any one of Schemes 1-9, wherein clustering the first-language connected components into first-language text lines and clustering the non-first-language connected components into non-first-language text lines comprises:

assigning horizontal labels to first-language and non-first-language connected components, according to the result of comparing the horizontal gap between first-language connected components with a preset threshold and the result of comparing the horizontal gap between non-first-language connected components with the preset threshold;

assigning vertical labels to first-language and non-first-language connected components, according to the result of comparing the vertical gap between first-language connected components with the preset threshold and the result of comparing the vertical gap between non-first-language connected components with the preset threshold;

for each first-language or non-first-language connected component marked with both a horizontal label and a vertical label, removing one of the two labels according to the result of comparing its horizontal gap and vertical gap to connected components of the same kind; and

clustering the horizontally labeled first-language and non-first-language connected components into first-language horizontal text lines and non-first-language horizontal text lines, respectively, and clustering the vertically labeled first-language and non-first-language connected components into first-language vertical text lines and non-first-language vertical text lines, respectively.
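Scheme 11 replaces the direct gap comparison with a comparison against a preset threshold, then resolves components that receive both labels. This Python sketch assumes that:

- components are boxes `(x0, y0, x1, y1)`;
- gaps are measured only between boxes whose extents overlap in the other direction;
- a component within the threshold in both directions keeps the label of the smaller gap;
- a component within the threshold in neither direction is returned unlabeled (`None`), a case the source does not address.

The threshold value and function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # assumed (x0, y0, x1, y1)
INF = float("inf")


def h_gap(a: Box, b: Box) -> float:
    """Horizontal gap, defined only for boxes whose vertical extents overlap."""
    if a[1] > b[3] or b[1] > a[3]:
        return INF
    return max(0, max(a[0], b[0]) - min(a[2], b[2]))


def v_gap(a: Box, b: Box) -> float:
    """Vertical gap, defined only for boxes whose horizontal extents overlap."""
    if a[0] > b[2] or b[0] > a[2]:
        return INF
    return max(0, max(a[1], b[1]) - min(a[3], b[3]))


def label_with_threshold(boxes: Sequence[Box], threshold: float) -> List[Optional[str]]:
    labels: List[Optional[str]] = []
    for i, a in enumerate(boxes):
        others = [b for j, b in enumerate(boxes) if j != i]
        h = min((h_gap(a, b) for b in others), default=INF)
        v = min((v_gap(a, b) for b in others), default=INF)
        horizontal = h <= threshold  # would receive a horizontal label
        vertical = v <= threshold    # would receive a vertical label
        if horizontal and vertical:
            # Both labels set: remove the one whose gap is larger.
            labels.append("H" if h <= v else "V")
        elif horizontal:
            labels.append("H")
        elif vertical:
            labels.append("V")
        else:
            labels.append(None)
    return labels
```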

Scheme 12. The method according to any one of Schemes 1-11, wherein the number of characters of the first language is smaller than the number of characters of the non-first language.

Scheme 13. A character recognition device, comprising:

an extraction unit that extracts a plurality of connected components from an image containing text;

a classification unit that classifies the plurality of connected components to generate first-language connected components and/or non-first-language connected components;

a clustering unit that clusters the first-language connected components into first-language text lines, and clusters the non-first-language connected components into non-first-language text lines; and

a recognition unit that recognizes first-language characters and non-first-language characters from the first-language text lines and the non-first-language text lines.

Scheme 14. The device according to Scheme 13, wherein the recognition unit comprises:

a layout structure feature calculation subunit that calculates the layout structure features of the image from the first-language text lines and the non-first-language text lines;

a layout category determination subunit that determines, according to the layout structure features of the image, the layout category of the image among a plurality of layout categories; and

a text line processing subunit that processes the first-language text lines and the non-first-language text lines according to the layout category of the image, so as to recognize first-language characters and non-first-language characters.

Scheme 15. The device according to Scheme 14, wherein the layout structure features of the image include the geometric features and recognition confidence of the longest first-language text line, and the geometric features of the longest non-first-language text line.

Scheme 16. The device according to Scheme 14 or 15, wherein the layout category determination subunit comprises:

a confidence probability calculation module that calculates, according to the layout structure features of the image, the image's confidence probability for each of the plurality of layout categories; and

a layout category determination module that selects the layout category with the largest confidence probability as the layout category of the image.

Scheme 17. The device according to Scheme 16, wherein the text line processing subunit comprises:

a first-language character string recognition engine and a non-first-language character string recognition engine, used to recognize first-language characters and non-first-language characters, respectively;

a recognition engine switching module that, for an image whose maximum confidence probability is less than a predetermined first correction threshold, switches the character string recognition engines used for the first-language text lines and/or the non-first-language text lines; and

a recognition direction switching module that, for an image in which the average recognition confidence of all connected components in the longest first-language text line is less than a predetermined second correction threshold, adjusts the recognition direction of the first-language text lines and/or the non-first-language text lines.

Scheme 18. The device according to any one of Schemes 13-17, wherein the extraction unit comprises:

an extraction subunit that extracts connected units from the image;

a recognition confidence calculation subunit that calculates the recognition confidence of each extracted connected unit;

a stroke width calculation subunit that calculates the character stroke width of each extracted connected unit and the average stroke width;

a range determination subunit that determines a stroke width range according to the average stroke width; and

a merging subunit that removes connected units whose recognition confidence is less than a predetermined confidence threshold and connected units whose stroke width falls outside the stroke width range, and merges the remaining connected units to form the plurality of connected components.

Scheme 19. The device according to any one of Schemes 13-18, wherein the clustering unit comprises:

a comparison subunit that compares the horizontal gap with the vertical gap between first-language connected components, and compares the horizontal gap with the vertical gap between non-first-language connected components;

a labeling subunit that assigns a horizontal label or a vertical label to each first-language and non-first-language connected component according to the comparison results of the comparison subunit; and

a clustering subunit that clusters the horizontally labeled first-language and non-first-language connected components into first-language horizontal text lines and non-first-language horizontal text lines, respectively, and clusters the vertically labeled first-language and non-first-language connected components into first-language vertical text lines and non-first-language vertical text lines, respectively.

Scheme 20. The device according to any one of Schemes 13-18, wherein the clustering unit comprises:

a comparison subunit that compares the horizontal and vertical gaps between first-language connected components, and the horizontal and vertical gaps between non-first-language connected components, with a preset threshold;

a labeling subunit that assigns horizontal labels and vertical labels to first-language and non-first-language connected components according to the comparison results of the comparison subunit;

a label removal subunit that, for each first-language or non-first-language connected component marked with both a horizontal label and a vertical label, removes one of the two labels according to the relative sizes of its horizontal and vertical gaps to connected components of the same kind; and

Claims

1. A character recognition method, comprising:

extracting a plurality of connected components from an image containing text;

2. The method of claim 1, wherein recognizing first-language characters and non-first-language characters from the first-language text lines and the non-first-language text lines comprises:

3. The method of claim 2, wherein the layout structure features of the image include the geometric features and recognition confidence of the longest first-language text line, and the geometric features of the longest non-first-language text line.

4. The method of claim 2 or claim 3, wherein determining, according to the layout structure features of the image, the layout category of the image among a plurality of layout categories comprises:

5. The method of claim 4, wherein processing the first-language text lines and the non-first-language text lines according to the layout category of the image, so as to recognize first-language characters and non-first-language characters, comprises:

6. The method of any one of claims 1-3, wherein extracting a plurality of connected components from the image containing text comprises:

extracting connected units from the image;

calculating the recognition confidence of each extracted connected unit; and

7. The method of any one of claims 1-3, wherein extracting a plurality of connected components from the image containing text comprises:

extracting connected units from the image;

determining a stroke width range according to the average stroke width; and

8. The method of any one of claims 1-3, wherein clustering the first-language connected components into first-language text lines and clustering the non-first-language connected components into non-first-language text lines comprises:

9. The method of any one of claims 1-3, wherein clustering the first-language connected components into first-language text lines and clustering the non-first-language connected components into non-first-language text lines comprises:

10. A character recognition device, comprising: