CN102024138A - Character identification method and character identification device - Google Patents

Character identification method and character identification device

Info

Publication number
CN102024138A
Authority
CN
China
Prior art keywords
character
mark
marked pixels
pixel
discern
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2009101736929A
Other languages
Chinese (zh)
Other versions
CN102024138B (en
Inventor
常兰兰
孙俊
小泽宪秋
武部浩明
于浩
直井聪
堀田悦伸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd filed Critical Fujitsu Ltd
Priority to CN 200910173692 priority Critical patent/CN102024138B/en
Priority to JP2010200193A priority patent/JP2011065643A/en
Publication of CN102024138A publication Critical patent/CN102024138A/en
Application granted granted Critical
Publication of CN102024138B publication Critical patent/CN102024138B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention discloses a character identification method and a character identification device. The character identification method according to one embodiment of the invention comprises the following steps: extracting partial mark pixels of a mark according to the position and shape features of the mark on a marked character in a character image to be identified; expanding the extracted partial mark pixels into a mark line segment by including adjacent pixels that have the same direction; acquiring a refined (thinned) image of the character image to be identified; growing the expanded mark line segment into the identified mark along the trajectory of the refined image; separating the identified mark from the character image; and recognizing the separated character image.

Description

Character identification method and character identification device
Technical field
The present invention relates generally to a character identification method and a character identification device. More specifically, the present invention relates to a character identification method and a character identification device capable of separating a mark from a character image.
Background art
OCR (Optical Character Recognition) systems have become increasingly widespread and increasingly important for computer applications. An OCR system converts paper documents into electronic files, which simplifies data entry and makes it easy to edit, manage, and distribute very large numbers of documents. The recognition capability of the OCR engine is a key factor in its application cost; only highly accurate recognition guarantees its practical value. For ordinary printed text documents, especially those with standardized characters, most current OCR engines achieve a high recognition rate.
In some cases, however, such as registration forms, questionnaires, and bills, some characters are marked to indicate a selection, and these marks pose new challenges for OCR engines. First, some marks join two or more characters into a single character, which usually causes the character segmentation of the OCR engine to fail. Second, a mark may occupy a region larger than the character region, so the character shrinks when the OCR engine performs normalization, which causes the subsequent recognition to fail.
To address this, one prior-art method uses color filtering to extract the mark pixels of a mark whose color differs from that of the characters, but this method does not work when the mark and the characters have the same color. Another existing method separates the mark from the characters according to their gray-level difference and then performs recognition, but this method is also unreliable, because the mark and the characters often have the same gray level and cannot be separated.
Summary of the invention
In view of the foregoing, the present invention proposes a character identification method and a character identification device that separate the mark from the character by using spatial position and shape features applicable to both marks and characters, and thereby achieve character recognition. With the character identification method and device of the present invention, a mark that overlaps the character image to be identified can be easily detected and separated, so that the character image is restored for recognition.
A brief summary of the present invention is given below to provide a basic understanding of some of its aspects. This summary is not an exhaustive overview of the invention. It is not intended to identify key or critical parts of the invention, nor to limit its scope. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that follows.
According to one aspect of the present invention, a character identification method is provided, comprising: extracting partial mark pixels of a mark according to the position and shape features of the mark on a marked character in a character image to be identified; expanding the extracted partial mark pixels into a mark line segment by including neighboring pixels that have the same direction; obtaining a thinned (refined) image of the character image to be identified; growing the expanded mark line segment into an identified mark along the trajectory of the thinned image; separating the identified mark from the character image; and recognizing the separated character image.
The character identification method according to one embodiment of the present invention further comprises selecting candidate regions of the character image to be identified as the marked characters.
According to another aspect of the present invention, a character identification device is provided, comprising: a mark pixel extraction unit configured to extract partial mark pixels of a mark according to the position and shape features of the mark on a marked character in a character image to be identified; an expansion unit configured to expand the extracted partial mark pixels into a mark line segment by including neighboring pixels that have the same direction; a thinned image acquisition unit configured to obtain a thinned (refined) image of the character image to be identified; a mark line segment growing unit configured to grow the expanded mark line segment into an identified mark along the trajectory of the thinned image; a separation unit configured to separate the identified mark from the character image; and a recognition unit configured to recognize the separated character image.
The character identification device according to one embodiment of the present invention further comprises a marked character selection unit configured to select candidate regions of the character image to be identified as the marked characters.
Preferably, selecting the candidate regions comprises: dividing a text block in the character image to be identified into character regions by alternately projecting the text block in the horizontal and vertical directions; classifying the segmented character regions into touching regions, large-size regions, and normal-size regions by comparing their sizes; and taking the touching regions and the large-size regions as the marked characters.
According to one embodiment of the present invention, extracting the partial mark pixels comprises extracting partial mark pixels outside a rectangular frame that contains the character. Specifically, extracting the partial mark pixels comprises: selecting a group of candidate mark pixels by separating the side waves on both sides of the projections in the horizontal and vertical directions, respectively; building a curve model that fits the group of candidate mark pixels by least-squares curve fitting; and calculating the fitting error of the group of candidate mark pixels to determine whether the group consists of mark pixels.
According to another embodiment of the present invention, extracting the partial mark pixels comprises: estimating the stroke width by analyzing run lengths; examining the crossing feature of a touching fragment along the direction orthogonal to the touching direction; and determining as mark pixels those pixels whose crossing feature has two parts on the ruler line segment, each with a width comparable to the stroke width.
According to yet another embodiment of the present invention, extracting the partial mark pixels comprises: determining reference characters for each marked character, the reference characters being those characters located in the same row or the same column as the marked character; calculating reference coordinates from the reference characters; and extracting the pixels outside the range of the reference coordinates as mark pixels. Preferably, when the reference characters are arranged along the horizontal direction, only their vertical coordinates are used to calculate the reference coordinates; and when the reference characters are arranged along the vertical direction, only their horizontal coordinates are used to calculate the reference coordinates.
According to one embodiment of the present invention, expanding the extracted partial mark pixels comprises: obtaining a direction map of the marked character; and expanding the previously selected mark pixels by including pixels that have the same value in a local region of the direction map.
According to one embodiment of the present invention, growing the expanded mark line segment comprises: including connected pixels on the trajectory of the thinned image one by one until a junction point is encountered.
As can be seen, with the character identification method and device of the present invention, the mark and the character are easily separated by using spatial position and shape features applicable to both, so that the character image is easily restored for recognition.
In addition, the present invention also provides a computer program for implementing the above character identification method.
In addition, the present invention also provides a computer program product at least in the form of a computer-readable medium, on which computer program code for implementing the above character identification method is recorded.
Description of drawings
The present invention can be better understood by referring to the description given below in conjunction with the accompanying drawings, in which the same or similar reference numerals are used throughout to denote the same or similar parts. The drawings, together with the following detailed description, are included in and form part of this specification, and serve to further illustrate the preferred embodiments of the present invention and to explain its principles and advantages. In the drawings:
Fig. 1(a) shows an example of a character image with a mark that is to be identified;
Fig. 1(b) shows the character image output after the mark and the character in the marked character image of Fig. 1(a) are separated according to an embodiment of the present invention;
Fig. 1(c) shows the mark image output after the mark and the character in the marked character image of Fig. 1(a) are separated according to an embodiment of the present invention;
Fig. 2 is a flowchart of the processing of a character identification method according to an embodiment of the present invention;
Fig. 3 is a flowchart of the detailed processing in the marked character selection step of Fig. 2 according to an embodiment of the present invention;
Fig. 4 shows an example of a character image after segmentation and classification according to an embodiment of the present invention;
Fig. 5(a) shows an example in which the mark closely surrounds the character;
Fig. 5(b) shows an example of a touching case in which no reference character is available;
Fig. 6 is a flowchart of a first example process in the partial mark pixel extraction step of Fig. 2 according to an embodiment of the present invention;
Figs. 7(a) and 7(b) show an example of the projection waveform in the vertical direction of a character image with a mark;
Figs. 7(c) and 7(d) show an example of the projection waveform in the horizontal direction of a character image with a mark;
Fig. 8 is a flowchart of a second example process in the partial mark pixel extraction step of Fig. 2 according to an embodiment of the present invention;
Fig. 9 shows an example of extracting partial mark pixels by using the crossing feature according to an embodiment of the present invention;
Fig. 10 is a flowchart of a third example process in the partial mark pixel extraction step of Fig. 2 according to an embodiment of the present invention;
Fig. 11 shows an example of extracting partial mark pixels by using reference coordinates as a reference according to an embodiment of the present invention;
Fig. 12 is a flowchart of the processing in the extracted mark pixel expansion step of Fig. 2 according to an embodiment of the present invention;
Fig. 13 shows an example of the direction map of a marked character;
Fig. 14 shows an example of the thinned character image with a mark that is to be identified;
Fig. 15 is a block diagram of the configuration of a character identification device according to an embodiment of the present invention; and
Fig. 16 is a block diagram of the structure of an information processing apparatus for implementing the character identification method according to the present invention.
Those skilled in the art will appreciate that the elements in the drawings are shown only for simplicity and clarity and are not necessarily drawn to scale. For example, the dimensions of some elements may be exaggerated relative to other elements to help improve understanding of the embodiments of the present invention.
Embodiments
Exemplary embodiments of the present invention will be described below in conjunction with the accompanying drawings. For clarity and conciseness, not all features of an actual implementation are described in this specification. It should be understood, however, that in developing any such actual implementation, many implementation-specific decisions must be made to achieve the developer's specific goals, for example, compliance with system-related and business-related constraints, and these constraints may vary from one implementation to another. It should also be understood that, although such development work may be complex and time-consuming, it is merely a routine task for those skilled in the art having the benefit of this disclosure.
It should also be noted here that, to avoid obscuring the present invention with unnecessary detail, the drawings show only the device structures and/or processing steps closely related to the solution of the present invention, and omit other details of little relevance.
To aid understanding of the principle of the invention, the example of Fig. 1 is used below to explain how a character image with a mark is separated into a character image and a mark image, and how the character image is then recognized to obtain the recognized character. Fig. 1(a) shows an example of a character image with a mark that is to be identified, Fig. 1(b) shows the character image output after the mark and the character in the image of Fig. 1(a) are separated according to an embodiment of the present invention, and Fig. 1(c) shows the mark image output after the same separation.
The basic working principle of the character identification method according to an embodiment of the present invention is first described below with reference to Figs. 2 to 14.
As shown in Fig. 2, the character identification method according to this embodiment of the present invention comprises: a marked character selection step S210 of selecting candidate regions of the character image to be identified as marked characters; a partial mark pixel extraction step S220 of extracting partial mark pixels of the mark according to the position and shape features of the mark on the marked character in the character image to be identified; an extracted mark pixel expansion step S230 of expanding the extracted partial mark pixels into a mark line segment by including neighboring pixels that have the same direction; a thinned image acquisition step S240 of obtaining a thinned image of the character image to be identified; an expanded mark line segment growth step S250 of growing the expanded mark line segment into an identified mark along the trajectory of the thinned image; a character and mark separation step S260 of separating the identified mark from the character image; and a separated character recognition step S270 of recognizing the separated character image.
It should be pointed out that the marked character selection step S210 described above is optional. That is, the partial mark pixel extraction step S220 and the subsequent processing can be performed directly on the marked character image to be identified without first selecting marked characters. The mark is still separated from the character image and the separated character image is still recognized, which improves the accuracy and reliability of recognition.
Next, the processing in each step of the character identification method of Fig. 2, namely the marked character selection step S210, the partial mark pixel extraction step S220, the extracted mark pixel expansion step S230, the thinned image acquisition step S240, the expanded mark line segment growth step S250, the character and mark separation step S260, and the separated character recognition step S270, is described in detail with reference to Figs. 3 to 14.
Fig. 3 is a flowchart of the detailed processing in the marked character selection step S210 of Fig. 2 according to an embodiment of the present invention. As shown in Fig. 3, when marked characters are selected, a text block in the character image to be identified is first divided into character regions in step S310 by alternately projecting the text block in the horizontal and vertical directions.
Then, in step S320, the sizes of the character regions segmented in step S310 are compared, and the segmented character regions are classified into three classes: touching regions, large-size regions, and normal-size regions. Fig. 4 shows an example of a character image after segmentation and classification according to this embodiment of the present invention. Finally, in step S330, the touching regions and the large-size regions are taken as marked characters, and the normal-size regions are labeled as unmarked character regions.
Here, reference characters are also identified for each marked character; the reference characters are those characters located in the same row or the same column as the marked character. As shown in Fig. 4, two reference characters are identified for the large-size case illustrated, whereas only one reference character is available for the touching case.
In addition, if all character regions are normal-size regions, the character image to be identified is classified as an unmarked character image. In that case, the partial mark pixel extraction step S220, the extracted mark pixel expansion step S230, the thinned image acquisition step S240, the expanded mark line segment growth step S250, and the character and mark separation step S260 of Fig. 2 need not be performed, and the flow proceeds directly to step S270 for character recognition.
After the marked characters have been selected according to the flow of Fig. 3, partial mark pixels of the mark are extracted according to the position and shape features of the mark on the selected marked character. When partial mark pixels are extracted, the processing can be adapted to the different positions and shape features of the mark. Several specific cases are analyzed and described below.
According to one embodiment of the present invention, when partial mark pixels are extracted, the partial mark pixels outside a rectangular frame containing the character can be extracted, as shown in Fig. 5. Using this feature, partial mark pixels are easily extracted when the mark closely surrounds the character, as shown in Fig. 5(a). This approach also works well for the touching case in which no reference character is available.
Fig. 6 is a flowchart of a first example process in the partial mark pixel extraction step S220 of Fig. 2 according to an embodiment of the present invention. As shown in Fig. 6, a group of candidate mark pixels is first selected in step S610 by separating the side waves on both sides of the projections in the horizontal and vertical directions, respectively.
Figs. 7(a) and 7(b) show an example of the projection waveform in the vertical direction of a character image with a mark; the two vertical lines on the left and right in Fig. 7(b) correspond to the two vertical lines on either side of the text in Fig. 7(a). Figs. 7(c) and 7(d) show an example of the projection waveform in the horizontal direction of a character image with a mark; the two vertical lines on the left and right in Fig. 7(d) correspond to the two horizontal lines on either side of the text in Fig. 7(c).
Thus, for the example shown in Figs. 7(a) to 7(d), the pixels outside the two vertical lines in Fig. 7(a) (corresponding to the two waves outside the left and right vertical lines in Fig. 7(b)) and the pixels outside the two horizontal lines in Fig. 7(c) (corresponding to the two waves outside the left and right vertical lines in Fig. 7(d)) can be selected as candidate mark pixels.
Then, in step S620, a curve model is built to fit the group of candidate mark pixels by least-squares curve fitting, and in step S630 the fitting error of the group is calculated to determine whether the group consists of mark pixels. If the fitting error is small, the pixels in the candidate group can be regarded as mark pixels. Through steps S620 and S630, false mark pixels, that is, pixels that would otherwise be judged to be mark pixels but are actually character pixels, can be excluded. For example, the pixels outside the right vertical line in Fig. 7(a) are determined not to be mark pixels, because the fitting error between the actual pixel values and the fitted curve model exceeds a predetermined threshold.
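The fit-and-threshold test of steps S620 and S630 might look like the sketch below. The patent does not specify the curve model, so a quadratic y = a*x^2 + b*x + c is assumed here purely for illustration, and the names `fit_quadratic`, `fit_error`, `is_mark_group` and the threshold `max_rms` are hypothetical.

```python
def fit_quadratic(points):
    """Least-squares fit of y = a*x^2 + b*x + c to (x, y) points; returns (a, b, c).

    Needs at least three distinct x values, otherwise the system is singular.
    """
    s = [sum(x ** k for x, _ in points) for k in range(5)]       # sums of x^0 .. x^4
    t = [sum(y * x ** k for x, y in points) for k in range(3)]   # sums of y*x^0 .. y*x^2
    # Augmented normal equations for the unknowns (a, b, c).
    m = [[s[4], s[3], s[2], t[2]],
         [s[3], s[2], s[1], t[1]],
         [s[2], s[1], s[0], t[0]]]
    # Gauss-Jordan elimination with partial pivoting.
    for i in range(3):
        pivot = max(range(i, 3), key=lambda r: abs(m[r][i]))
        m[i], m[pivot] = m[pivot], m[i]
        for r in range(3):
            if r != i:
                factor = m[r][i] / m[i][i]
                m[r] = [u - factor * v for u, v in zip(m[r], m[i])]
    return tuple(m[i][3] / m[i][i] for i in range(3))

def fit_error(points, coeffs):
    """Root-mean-square vertical distance between the points and the fitted curve."""
    a, b, c = coeffs
    return (sum((a * x * x + b * x + c - y) ** 2 for x, y in points) / len(points)) ** 0.5

def is_mark_group(points, max_rms=1.0):
    """Accept a candidate group as mark pixels when the curve fits it closely."""
    return fit_error(points, fit_quadratic(points)) <= max_rms
```

Pixels that lie along a smooth arc, as a hand-drawn circle would, fit the curve closely and are accepted, whereas a group scattered like character strokes produces a large error and is rejected.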
In addition, when partial mark pixels are extracted, the crossing feature of a touching fragment can also be used to determine the mark pixels in the touching case described above. Fig. 8 is a flowchart of a second example process in the partial mark pixel extraction step S220 of Fig. 2 according to an embodiment of the present invention.
As shown in Fig. 8, in the partial mark pixel extraction according to this embodiment, the stroke width is first estimated in step S810 by analyzing run lengths. The crossing feature of the touching fragment is then examined in step S820 along the direction orthogonal to the touching direction. In step S830, the pixels whose crossing feature has two parts on the ruler line segment, each with a width comparable to the stroke width, are determined to be mark pixels.
Fig. 9 shows an example of extracting partial mark pixels by using the crossing feature according to this embodiment of the present invention. The darker parts of the mark in Fig. 9 are the two parts on the ruler whose widths are comparable to the stroke width, so these pixels are determined to be mark pixels.
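A rough sketch of steps S810 to S830 follows. It assumes that the stroke width can be taken as the most frequent run length of foreground pixels over all rows and columns, and that the "ruler line segment" is a single scan line given as a list of 0/1 values; both assumptions, the tolerance `tol`, and the function names are illustrative and not taken from the patent.

```python
from collections import Counter

def runs(line):
    """Return (start, length) for every run of foreground (non-zero) pixels in a line."""
    out, start = [], None
    for i, value in enumerate(list(line) + [0]):   # trailing 0 closes a run at the end
        if value and start is None:
            start = i
        elif not value and start is not None:
            out.append((start, i - start))
            start = None
    return out

def estimate_stroke_width(img):
    """Estimate the stroke width as the most common run length over rows and columns."""
    counts = Counter(length for row in img for _, length in runs(row))
    counts.update(length for col in zip(*img) for _, length in runs(col))
    return counts.most_common(1)[0][0]

def crossing_pixels(scan, stroke, tol=1):
    """Return the indices on a scan line that show the two-part crossing pattern.

    The pattern holds when the scan line crosses exactly two runs and each run
    is about one stroke width wide; otherwise an empty list is returned.
    """
    found = runs(scan)
    if len(found) == 2 and all(abs(length - stroke) <= tol for _, length in found):
        return [i for start, length in found for i in range(start, start + length)]
    return []
```

A scan line that crosses a thin closed mark typically meets it twice, once on each side, which is what the two-run test looks for; a scan line through a dense character body usually produces a different pattern.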
In addition, for the large-size case described above, partial mark pixels can be extracted by analyzing the layout of the reference characters. Fig. 10 is a flowchart of a third example process in the partial mark pixel extraction step S220 of Fig. 2 according to an embodiment of the present invention.
As shown in Fig. 10, when partial mark pixels are extracted, reference characters are first determined for each marked character in step S1010, the reference characters being those characters located in the same row or the same column as the marked character. Reference coordinates are then calculated from the reference characters in step S1020. After the reference coordinates have been determined, the pixels outside the range of the reference coordinates are extracted as mark pixels in step S1030.
When the reference coordinates are calculated in step S1020, if the reference characters are arranged along the horizontal direction, only their vertical coordinates are used to calculate the reference coordinates. Similarly, if the reference characters are arranged along the vertical direction, only their horizontal coordinates are used to calculate the reference coordinates.
Fig. 11 shows an example of extracting partial mark pixels by using reference coordinates as a reference according to this embodiment of the present invention. As shown in Fig. 11, the pixels outside the two vertical dashed lines in the character image are extracted as mark pixels.
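The reference-coordinate extraction of steps S1010 to S1030 can be illustrated with the sketch below. It assumes bounding boxes given as (left, top, right, bottom) and pixels given as (x, y), and it takes the reference range as the extreme extent of the reference boxes; the patent does not state how the reference coordinates are computed, so this choice and the function names are assumptions.

```python
def reference_range(ref_boxes, layout="horizontal"):
    """Compute the reference coordinate range from reference character boxes.

    For a horizontal layout only the vertical extent (top, bottom) is used;
    for a vertical layout only the horizontal extent (left, right) is used.
    """
    if layout == "horizontal":
        return min(b[1] for b in ref_boxes), max(b[3] for b in ref_boxes)
    return min(b[0] for b in ref_boxes), max(b[2] for b in ref_boxes)

def pixels_outside(pixels, ref_range, layout="horizontal"):
    """Return the foreground pixels of a marked character that fall outside the range."""
    low, high = ref_range
    axis = 1 if layout == "horizontal" else 0   # compare y for rows, x for columns
    return [p for p in pixels if not low <= p[axis] <= high]
```

For a marked character in a horizontal text line, anything above or below the band occupied by its neighbors is attributed to the mark; the same test on x coordinates applies to vertical text.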
After the partial mark pixels have been extracted by the methods described above, the extracted partial mark pixels are expanded into a mark line segment in the extracted mark pixel expansion step S230 of Fig. 2 by including neighboring pixels that have the same direction. Fig. 12 is a flowchart of the processing in the extracted mark pixel expansion step S230 of Fig. 2 according to this embodiment of the present invention.
As shown in Fig. 12, when the extracted partial mark pixels are expanded, the direction map of the marked character is first obtained in step S1210, and the previously selected mark pixels are then expanded in step S1220 by including pixels that have the same value in a local region of the direction map.
Fig. 13 shows the direction map of a marked character according to a specific example of the present invention. As shown in Fig. 13, the direction map of the marked character region can be obtained by calculating the gradient of each pixel in each direction according to the following formulas.
C_horizontal=|in(i,j)-in(i,j-1)|+|in(i,j)-in(i,j+1)|+|in(i-1,j)-in(i-1,j-1)|+|in(i-1,j)-in(i-1,j+1)|+|in(i+1,j)-in(i+1,j-1)|+|in(i+1,j)-in(i+1,j+1)|
C_vertical=|in(i,j)-in(i-1,j)|+|in(i,j)-in(i+1,j)|+|in(i,j-1)-in(i-1,j-1)|+|in(i,j-1)-in(i+1,j-1)|+|in(i,j+1)-in(i-1,j+1)|+|in(i,j+1)-in(i+1,j+1)|
C_diagonal135=|in(i,j)-in(i-1,j-1)|+|in(i,j)-in(i+1,j+1)|+2*|in(i,j+1)-in(i-1,j)|+2*|in(i,j-1)-in(i+1,j)|
C_diagonal45=|in(i,j)-in(i-1,j+1)|+|in(i,j)-in(i+1,j-1)|+2*|in(i,j-1)-in(i-1,j)|+2*|in(i,j+1)-in(i+1,j)|
When the previously selected mark pixels are expanded, if a selected mark line segment lies on part of a line with a single direction in the direction map, the whole of that line part is labeled as mark pixels, which accomplishes the expansion of the extracted partial mark pixels.
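The expansion along the direction map can be sketched as a flood fill that only steps onto neighbors carrying the same direction value. The sketch assumes the direction map is a 2D list in which background pixels are `None`, uses 8-connectivity, and does not limit the size of the local region; these details and the function name `expand_seeds` are illustrative assumptions.

```python
def expand_seeds(direction_map, seeds):
    """Grow seed mark pixels over connected pixels that share the same direction value."""
    height, width = len(direction_map), len(direction_map[0])
    marked, stack = set(seeds), list(seeds)
    while stack:
        i, j = stack.pop()
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if (0 <= ni < height and 0 <= nj < width
                        and (ni, nj) not in marked
                        and direction_map[ni][nj] is not None
                        and direction_map[ni][nj] == direction_map[i][j]):
                    marked.add((ni, nj))
                    stack.append((ni, nj))
    return marked
```

Starting from a few extracted mark pixels, this labels the whole run of same-direction pixels they lie on, and stops where the direction value changes.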
Returning to Fig. 2, after the extracted partial mark pixels are expanded in step S230, the thinned image of the character image to be identified is obtained in step S240. Fig. 14 shows the thinned character image with a mark according to a specific example of the present invention.
Then, in step S250, connected pixels on the trajectory of the thinned image are included one by one until a junction point is encountered, so that the mark line segment expanded in step S230 is grown into the identified mark. The identified mark is then separated from the character image in step S260, and the separated character image is recognized in step S270.
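The growth of step S250 can be sketched as a walk along the skeleton that stops at the first branching pixel. The sketch assumes the thinned image is given as a set of (row, column) skeleton pixels, uses 8-connectivity, and treats any pixel with three or more skeleton neighbors as a junction; this junction test is a simplification that can fire early on staircase-shaped skeletons, and the function names are hypothetical.

```python
def skeleton_neighbours(pixel, skeleton):
    """Return the 8-connected neighbors of a pixel that belong to the skeleton."""
    i, j = pixel
    return [(i + di, j + dj)
            for di in (-1, 0, 1) for dj in (-1, 0, 1)
            if (di or dj) and (i + di, j + dj) in skeleton]

def grow_along_skeleton(skeleton, start, visited=()):
    """Follow the skeleton from start, one pixel at a time, until a junction or a dead end.

    `visited` holds pixels already assigned to the mark segment, so the walk
    moves away from them. The pixel where branching is detected ends the path.
    """
    path, seen, current = [start], set(visited) | {start}, start
    while True:
        nearby = skeleton_neighbours(current, skeleton)
        if len(nearby) >= 3:          # branching: a character stroke meets the mark here
            break
        unvisited = [p for p in nearby if p not in seen]
        if not unvisited:             # end of the stroke
            break
        current = unvisited[0]
        seen.add(current)
        path.append(current)
    return path
```

Stopping at junctions keeps the grown mark from running into character strokes that cross or touch it; the pixels on the returned path would then be removed from the character image in the separation step.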
The processing and detailed operating principle of the character identification method according to an embodiment of the present invention have been described above with reference to Figs. 2 to 14. The structure and operating principle of a character identification device according to an embodiment of the present invention are described below with reference to Figure 15.
As shown in Figure 15, the character identification device according to this embodiment comprises: a marked character selection unit 1510, configured to select a candidate region of the character image to be identified as a marked character; a mark pixel extraction unit 1520, configured to extract partial mark pixels of a mark according to the position and shape features of the mark on the marked character in the character image to be identified; an expansion unit 1530, configured to expand the extracted partial mark pixels into a mark line segment by including adjacent pixels having the same direction; a refined image acquisition unit 1540, configured to obtain a refined image of the character image to be identified; a mark line segment growing unit 1550, configured to grow the expanded mark line segment into an identified mark along the track of the refined image; a separation unit 1560, configured to separate the identified mark from the character image; and a recognition unit 1570, configured to identify the separated character image.
The processing performed by the marked character selection unit 1510, the mark pixel extraction unit 1520, the expansion unit 1530, the refined image acquisition unit 1540, the mark line segment growing unit 1550, the separation unit 1560, and the recognition unit 1570 of the character identification device according to this embodiment is similar to the processing in the corresponding steps of the character identification method described with reference to Figs. 2 to 14, namely step S210 of selecting a marked character, step S220 of extracting partial mark pixels, step S230 of expanding the extracted mark pixels, step S240 of obtaining the refined image, step S250 of growing the expanded mark line segment, step S260 of separating the character and the mark, and step S270 of identifying the separated character. A further detailed description is therefore omitted here.
It should likewise be noted that the marked character selection unit 1510 is optional. A character identification device according to an embodiment of the present invention may omit the marked character selection unit 1510 and consist only of the mark pixel extraction unit 1520, the expansion unit 1530, the refined image acquisition unit 1540, the mark line segment growing unit 1550, the separation unit 1560, and the recognition unit 1570. Such a device can still separate the character image from the mark image and thereby improve identification accuracy.
Thus, with the character identification method and character identification device according to the embodiments of the present invention described above, a mark present on the character image to be identified can be detected accurately, and all or part of the mark pixels can be separated from the character, so that the character can be identified accurately.
In addition, the character identification method and character identification device according to the embodiments of the present invention use stable and reliable position and shape features of the mark to separate the mark from the character image, and position and shape features are likewise applicable to characters. It can therefore be ensured that the extracted pixels belong to the mark, so that all or part of the mark pixels can be reliably separated from the character image and the character image can be identified accurately.
Furthermore, in the character identification method and character identification device according to the embodiments of the present invention, the direction map and the track of the refined image are used as references when the mark line segment is extended, which provides a spatial constraint. This helps to avoid wrongly classifying character pixels as mark pixels, so that the character image and the mark image are separated accurately, which in turn supports accurate identification of the character image in the subsequent step.
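The relationship between the units, including the optional selection unit, can be sketched as a simple pipeline. This is a hypothetical arrangement for illustration only: the class name, field names, and the signatures of the callables are assumptions, and each callable stands in for one of the units 1510 to 1570.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class CharacterIdentifier:
    extract: Callable      # mark pixel extraction unit 1520
    expand: Callable       # expansion unit 1530
    thin: Callable         # refined image acquisition unit 1540
    grow: Callable         # mark line segment growing unit 1550
    separate: Callable     # separation unit 1560
    recognize: Callable    # recognition unit 1570
    select: Optional[Callable] = None  # optional selection unit 1510

    def run(self, image):
        # Without a selection unit, the whole image is processed.
        region = self.select(image) if self.select else image
        seeds = self.extract(region)
        segment = self.expand(region, seeds)
        skeleton = self.thin(region)
        mark = self.grow(skeleton, segment)
        clean = self.separate(region, mark)
        return self.recognize(clean)
```

Leaving `select` unset corresponds to the variant of the device described above that consists only of units 1520 to 1570.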
The basic principle of the present invention has been described above in connection with specific embodiments. It should be noted, however, that those of ordinary skill in the art will understand that all or any of the steps or components of the method and device of the present invention can be implemented in hardware, firmware, software, or a combination thereof, in any computing device (including a processor, a storage medium, and the like) or network of computing devices, and that they can do so using their basic programming skills after reading the description of the present invention.
Therefore, the object of the present invention can also be achieved by running a program or a set of programs on any computing device. The computing device may be a well-known general-purpose device. The object of the present invention can thus also be achieved merely by providing a program product containing program code that implements the method or device. That is to say, such a program product also constitutes the present invention, and a storage medium storing such a program product also constitutes the present invention. Obviously, the storage medium may be any known storage medium or any storage medium developed in the future.
When an embodiment of the present invention is implemented by software and/or firmware, the program constituting the software is installed from a storage medium or a network onto a computer having a dedicated hardware structure, for example the general-purpose personal computer 700 shown in Figure 16. With the various programs installed, the computer can perform various functions.
In Figure 16, a central processing unit (CPU) 701 performs various processing according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage section 708 into a random access memory (RAM) 703. The RAM 703 also stores, as needed, the data required when the CPU 701 performs the various processing. The CPU 701, the ROM 702, and the RAM 703 are connected to one another via a bus 704. An input/output interface 705 is also connected to the bus 704.
The following components are connected to the input/output interface 705: an input section 706, including a keyboard, a mouse, and the like; an output section 707, including a display such as a cathode ray tube (CRT) or liquid crystal display (LCD), a loudspeaker, and the like; the storage section 708, including a hard disk and the like; and a communication section 709, including a network interface card such as a LAN card, a modem, and the like. The communication section 709 performs communication processing via a network such as the Internet.
A drive 710 is also connected to the input/output interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory, is mounted on the drive 710 as needed, so that a computer program read from it is installed into the storage section 708 as needed.
When the above series of processing is implemented by software, the program constituting the software is installed from a network such as the Internet or from a storage medium such as the removable medium 711.
Those skilled in the art will understand that this storage medium is not limited to the removable medium 711 shown in Figure 16, which stores the program and is distributed separately from the device in order to provide the program to the user. Examples of the removable medium 711 include a magnetic disk (including a floppy disk (registered trademark)), an optical disc (including a compact disc read-only memory (CD-ROM) and a digital versatile disc (DVD)), a magneto-optical disc (including a MiniDisc (MD) (registered trademark)), and a semiconductor memory. Alternatively, the storage medium may be the ROM 702, a hard disk contained in the storage section 708, or the like, in which the program is stored and which is distributed to the user together with the device containing it.
It should also be pointed out that, in the device and method of the present invention, the components or steps can obviously be decomposed and/or recombined. Such decompositions and/or recombinations should be regarded as equivalents of the present invention. Moreover, the steps of the above series of processing can naturally be performed chronologically in the order described, but they need not be performed in that order. Some steps can be performed in parallel or independently of one another.
Although the present invention and its advantages have been described in detail, it should be understood that various changes, substitutions, and alterations can be made without departing from the spirit and scope of the present invention as defined by the appended claims. Furthermore, the terms "comprise", "include", and any other variants thereof in this application are intended to cover non-exclusive inclusion, so that a process, method, article, or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the statement "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or device comprising that element.
Remarks
Remark 1. A character identification method, comprising:
extracting partial mark pixels of a mark according to the position and shape features of the mark on a marked character in a character image to be identified;
expanding the extracted partial mark pixels into a mark line segment by including adjacent pixels having the same direction;
obtaining a refined image of the character image to be identified;
growing the expanded mark line segment into an identified mark along the track of the refined image;
separating the identified mark from the character image; and
identifying the separated character image.
Remark 2. The character identification method according to Remark 1, further comprising:
selecting a candidate region of the character image to be identified as the marked character.
Remark 3. The character identification method according to Remark 2, wherein selecting the candidate region comprises:
dividing a text block in the character image to be identified into character regions by alternately projecting the text block onto the horizontal direction and the vertical direction;
classifying the divided character regions into touching regions, large-size regions, and normal-size regions by comparing the sizes of the divided character regions; and
taking the touching regions and the large-size regions as the marked characters.
Remark 4. The character identification method according to Remark 3, wherein extracting the partial mark pixels comprises extracting partial mark pixels located outside a rectangular frame containing the character.
Remark 5. The character identification method according to Remark 4, wherein extracting the partial mark pixels comprises:
selecting a group of candidate mark pixels from the separated side waves on both sides of the projections in the horizontal direction and the vertical direction, respectively;
building a curve model to fit the group of candidate mark pixels by using a least-squares curve fitting method; and
calculating the fitting error of the group of candidate mark pixels to determine whether the group of candidate mark pixels consists of mark pixels.
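The curve fitting test of Remark 5 can be sketched as follows. The remark does not fix the curve model, the error measure, or the decision threshold, so the quadratic polynomial, the root-mean-square residual, and the `max_error` value used here are all assumptions for illustration.

```python
import numpy as np

def fitting_error(points, degree=2):
    """Least-squares fit y = f(x) to (x, y) points and return the RMS residual."""
    xs = np.array([p[0] for p in points], dtype=float)
    ys = np.array([p[1] for p in points], dtype=float)
    coeffs = np.polyfit(xs, ys, degree)          # least-squares polynomial fit
    residuals = ys - np.polyval(coeffs, xs)
    return float(np.sqrt(np.mean(residuals ** 2)))

def is_mark_group(points, max_error=1.0, degree=2):
    """Accept the candidate group as mark pixels if the curve fits well enough."""
    return fitting_error(points, degree) <= max_error
```

A group of pixels lying on a smooth arc gives a small error and is accepted, whereas scattered pixels give a large error and are rejected. For near-vertical mark strokes the roles of x and y would need to be swapped, since the sketch fits y as a function of x.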
Remark 6. The character identification method according to Remark 3, wherein extracting the partial mark pixels comprises:
estimating a stroke width by analyzing run lengths;
checking the crossing feature of a touching fragment along the direction orthogonal to the touching direction; and
determining as mark pixels the pixels on a ruler line segment whose crossing feature has two parts, the width of each part being comparable to the stroke width.
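The run-length analysis and the crossing test of Remark 6 can be sketched as follows. The remark does not say how the stroke width is derived from the run lengths or how "comparable" is measured; the sketch assumes the most frequent run length over all rows and columns is taken as the stroke width and uses a hypothetical tolerance `tol`.

```python
import numpy as np
from collections import Counter

def runs(line):
    """Return (start, length) of each run of foreground pixels in a 1-D line."""
    out, start = [], None
    for k, v in enumerate(line):
        if v and start is None:
            start = k
        elif not v and start is not None:
            out.append((start, k - start))
            start = None
    if start is not None:
        out.append((start, len(line) - start))
    return out

def estimate_stroke_width(binary):
    """Assumed estimator: the most frequent run length over rows and columns."""
    lengths = Counter()
    for row in binary:
        lengths.update(length for _, length in runs(row))
    for col in binary.T:
        lengths.update(length for _, length in runs(col))
    return lengths.most_common(1)[0][0] if lengths else 0

def two_stroke_crossing(line, stroke_width, tol=1):
    """True if the line crosses exactly two parts, each about one stroke wide."""
    found = runs(line)
    return len(found) == 2 and all(abs(length - stroke_width) <= tol
                                   for _, length in found)
```

Here `line` plays the role of the ruler line segment taken orthogonally to the touching direction: two stroke-width crossings suggest that a mark stroke runs alongside a character stroke at that position.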
Remark 7. The character identification method according to Remark 3, wherein extracting the partial mark pixels comprises:
determining reference characters for each marked character, the reference characters being those characters located in the same row or the same column as the marked character;
calculating reference coordinates from the reference characters; and
extracting pixels outside the range of the reference coordinates as mark pixels.
Remark 8. The character identification method according to Remark 7, wherein
when the reference characters lie along the horizontal direction, only the vertical coordinates of the reference characters are used to calculate the reference coordinates; and
when the reference characters lie along the vertical direction, only the horizontal coordinates of the reference characters are used to calculate the reference coordinates.
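The reference-coordinate test of Remarks 7 and 8 can be sketched as follows. The remarks do not state how the reference coordinates are computed from the reference characters; the sketch assumes the median of the bounding-box edges of the reference characters, with boxes given as hypothetical `(top, left, bottom, right)` tuples in inclusive pixel coordinates.

```python
import numpy as np

def reference_range(ref_boxes, horizontal=True):
    """Assumed rule: median low and high edges of the reference characters.

    For a horizontal text line only the vertical coordinates (top, bottom)
    are used; for a vertical line only the horizontal ones (left, right).
    """
    lo_idx, hi_idx = (0, 2) if horizontal else (1, 3)
    lows = [box[lo_idx] for box in ref_boxes]
    highs = [box[hi_idx] for box in ref_boxes]
    return float(np.median(lows)), float(np.median(highs))

def pixels_outside(region, origin, ref_range, horizontal=True):
    """Return image coordinates of region pixels outside the reference range."""
    lo, hi = ref_range
    top, left = origin  # image position of the region's upper-left corner
    outside = []
    for i, j in zip(*np.nonzero(region)):
        coord = top + i if horizontal else left + j
        if coord < lo or coord > hi:
            outside.append((top + int(i), left + int(j)))
    return outside
```

For a horizontal text line, a marked character that extends below the bottom edge shared by its neighbours would have the overhanging pixels returned as mark pixels.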
Remark 9. The character identification method according to any one of Remarks 1 to 8, wherein expanding the extracted partial mark pixels comprises:
obtaining a direction map of the marked character; and
expanding the previously selected mark pixels by including pixels that have the same value in a local region of the direction map.
Remark 10. The character identification method according to any one of Remarks 1 to 8, wherein growing the expanded mark line segment comprises:
including, one by one, connected pixels on the track of the refined image until a junction point is reached.
Remark 11. A character identification device, comprising:
a mark pixel extraction unit, configured to extract partial mark pixels of a mark according to the position and shape features of the mark on a marked character in a character image to be identified;
an expansion unit, configured to expand the extracted partial mark pixels into a mark line segment by including adjacent pixels having the same direction;
a refined image acquisition unit, configured to obtain a refined image of the character image to be identified;
a mark line segment growing unit, configured to grow the expanded mark line segment into an identified mark along the track of the refined image;
a separation unit, configured to separate the identified mark from the character image; and
a recognition unit, configured to identify the separated character image.
Remark 12. The character identification device according to Remark 11, further comprising:
a marked character selection unit, configured to select a candidate region of the character image to be identified as the marked character.
Remark 13. The character identification device according to Remark 12, wherein the marked character selection unit is further configured to:
divide a text block in the character image to be identified into character regions by alternately projecting the text block onto the horizontal direction and the vertical direction;
classify the divided character regions into touching regions, large-size regions, and normal-size regions by comparing the sizes of the divided character regions; and
take the touching regions and the large-size regions as the marked characters.
Remark 14. The character identification device according to Remark 13, wherein the mark pixel extraction unit is further configured to extract partial mark pixels located outside a rectangular frame containing the character.
Remark 15. The character identification device according to Remark 14, wherein the mark pixel extraction unit is further configured to:
select a group of candidate mark pixels from the separated side waves on both sides of the projections in the horizontal direction and the vertical direction, respectively;
build a curve model to fit the group of candidate mark pixels by using a least-squares curve fitting method; and
calculate the fitting error of the group of candidate mark pixels to determine whether the group of candidate mark pixels consists of mark pixels.
Remark 16. The character identification device according to Remark 13, wherein the mark pixel extraction unit is further configured to:
estimate a stroke width by analyzing run lengths;
check the crossing feature of a touching fragment along the direction orthogonal to the touching direction; and
determine as mark pixels the pixels on a ruler line segment whose crossing feature has two parts, the width of each part being comparable to the stroke width.
Remark 17. The character identification device according to Remark 13, wherein the mark pixel extraction unit is further configured to:
determine reference characters for each marked character, the reference characters being those characters located in the same row or the same column as the marked character;
calculate reference coordinates from the reference characters; and
extract pixels outside the range of the reference coordinates as mark pixels.
Remark 18. The character identification device according to Remark 17, wherein
when the reference characters lie along the horizontal direction, only the vertical coordinates of the reference characters are used to calculate the reference coordinates; and
when the reference characters lie along the vertical direction, only the horizontal coordinates of the reference characters are used to calculate the reference coordinates.
Remark 19. The character identification device according to any one of Remarks 11 to 18, wherein the expansion unit is further configured to:
obtain a direction map of the marked character; and
expand the previously selected mark pixels by including pixels that have the same value in a local region of the direction map.
Remark 20. The character identification device according to any one of Remarks 11 to 18, wherein the mark line segment growing unit is further configured to include, one by one, connected pixels on the track of the refined image until a junction point is reached.

Claims (10)

1. A character identification method, comprising:
extracting partial mark pixels of a mark according to the position and shape features of the mark on a marked character in a character image to be identified;
expanding the extracted partial mark pixels into a mark line segment by including adjacent pixels having the same direction;
obtaining a refined image of the character image to be identified;
growing the expanded mark line segment into an identified mark along the track of the refined image;
separating the identified mark from the character image; and
identifying the separated character image.
2. The character identification method according to claim 1, further comprising:
selecting a candidate region of the character image to be identified as the marked character.
3. The character identification method according to claim 2, wherein selecting the candidate region comprises:
dividing a text block in the character image to be identified into character regions by alternately projecting the text block onto the horizontal direction and the vertical direction;
classifying the divided character regions into touching regions, large-size regions, and normal-size regions by comparing the sizes of the divided character regions; and
taking the touching regions and the large-size regions as the marked characters.
4. The character identification method according to claim 3, wherein extracting the partial mark pixels comprises:
selecting a group of candidate mark pixels from the separated side waves on both sides of the projections in the horizontal direction and the vertical direction, respectively;
building a curve model to fit the group of candidate mark pixels by using a least-squares curve fitting method; and
calculating the fitting error of the group of candidate mark pixels to determine whether the group of candidate mark pixels consists of mark pixels.
5. The character identification method according to claim 3, wherein extracting the partial mark pixels comprises:
estimating a stroke width by analyzing run lengths;
checking the crossing feature of a touching fragment along the direction orthogonal to the touching direction; and
determining as mark pixels the pixels on a ruler line segment whose crossing feature has two parts, the width of each part being comparable to the stroke width.
6. The character identification method according to claim 3, wherein extracting the partial mark pixels comprises:
determining reference characters for each marked character, the reference characters being those characters located in the same row or the same column as the marked character;
calculating reference coordinates from the reference characters; and
extracting pixels outside the range of the reference coordinates as mark pixels.
7. The character identification method according to any one of claims 1 to 6, wherein expanding the extracted partial mark pixels comprises:
obtaining a direction map of the marked character; and
expanding the previously selected mark pixels by including pixels that have the same value in a local region of the direction map.
8. The character identification method according to any one of claims 1 to 6, wherein growing the expanded mark line segment comprises:
including, one by one, connected pixels on the track of the refined image until a junction point is reached.
9. A character identification device, comprising:
a mark pixel extraction unit, configured to extract partial mark pixels of a mark according to the position and shape features of the mark on a marked character in a character image to be identified;
an expansion unit, configured to expand the extracted partial mark pixels into a mark line segment by including adjacent pixels having the same direction;
a refined image acquisition unit, configured to obtain a refined image of the character image to be identified;
a mark line segment growing unit, configured to grow the expanded mark line segment into an identified mark along the track of the refined image;
a separation unit, configured to separate the identified mark from the character image; and
a recognition unit, configured to identify the separated character image.
10. The character identification device according to claim 9, further comprising:
a marked character selection unit, configured to select a candidate region of the character image to be identified as the marked character.
CN 200910173692 2009-09-15 2009-09-15 Character identification method and character identification device Expired - Fee Related CN102024138B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN 200910173692 CN102024138B (en) 2009-09-15 2009-09-15 Character identification method and character identification device
JP2010200193A JP2011065643A (en) 2009-09-15 2010-09-07 Method and apparatus for character recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 200910173692 CN102024138B (en) 2009-09-15 2009-09-15 Character identification method and character identification device

Publications (2)

Publication Number Publication Date
CN102024138A true CN102024138A (en) 2011-04-20
CN102024138B CN102024138B (en) 2013-01-23

Family

ID=43865419

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200910173692 Expired - Fee Related CN102024138B (en) 2009-09-15 2009-09-15 Character identification method and character identification device

Country Status (2)

Country Link
JP (1) JP2011065643A (en)
CN (1) CN102024138B (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1025764C (en) * 1992-05-12 1994-08-24 浙江大学 Characters recognition method and system
US6047251A (en) * 1997-09-15 2000-04-04 Caere Corporation Automatic language identification system for multilingual optical character recognition
US7024042B2 (en) * 2000-10-04 2006-04-04 Fujitsu Limited Word recognition device, word recognition method, and storage medium

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102184396A (en) * 2011-06-13 2011-09-14 北方工业大学 Document image tilt correction method based on OCR recognition feedback
CN102867178B (en) * 2011-07-05 2015-06-10 富士通株式会社 Method and device for Chinese character recognition
CN102867178A (en) * 2011-07-05 2013-01-09 富士通株式会社 Method and device for Chinese character recognition
CN102567725A (en) * 2011-12-23 2012-07-11 国网电力科学研究院 Soft segmentation method of financial OCR system handwritten numerical strings
CN103218801B (en) * 2012-01-06 2018-01-30 富士施乐株式会社 Image processing equipment and method and specified mark estimation apparatus and method
CN103218801A (en) * 2012-01-06 2013-07-24 富士施乐株式会社 Image processing apparatus and method, specifying mark estimating apparatus and method
CN104021385A (en) * 2013-03-02 2014-09-03 北京信息科技大学 Video subtitle thinning method based on template matching and curve fitting
CN104021385B (en) * 2013-03-02 2017-11-21 北京信息科技大学 Video caption thinning method based on template matches and curve matching
US9087272B2 (en) 2013-07-17 2015-07-21 International Business Machines Corporation Optical match character classification
CN106845473A (en) * 2015-12-03 2017-06-13 富士通株式会社 For determine image whether be the image with address information method and apparatus
CN106845473B (en) * 2015-12-03 2020-06-02 富士通株式会社 Method and device for determining whether image is image with address information
CN109542285A (en) * 2018-11-16 2019-03-29 北京小米移动软件有限公司 Image processing method and device
CN114365056A (en) * 2019-08-09 2022-04-15 弗劳恩霍夫应用研究促进协会 Device, method for controlling said device and group or cluster of devices
CN114365056B (en) * 2019-08-09 2024-04-12 弗劳恩霍夫应用研究促进协会 Device, method for controlling said device, and device group or group of devices

Also Published As

Publication number Publication date
CN102024138B (en) 2013-01-23
JP2011065643A (en) 2011-03-31

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130123

Termination date: 20130915