CN100351849C - Character recognition apparatus and character recognition method - Google Patents
Character recognition apparatus and character recognition method
- Publication number
- CN100351849C (application CN200510055194A; CNB2005100551946A)
- Authority
- CN
- China
- Prior art keywords
- character
- document
- field
- dictionary database
- term
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/1444—Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Character Discrimination (AREA)
- Character Input (AREA)
Abstract
The present invention provides a character recognition apparatus including: plural dictionary databases that contain terms or characters classified into respective fields; a determination unit that determines which field the contents of a document shown by document image data belong to; a selection unit that selects a dictionary database pertaining to the field determined by the determination unit from among the plural dictionary databases; a recognition unit that recognizes a term or a character written in the document shown by the document image data by using the terms or characters stored in the selected dictionary database as candidates; and an output unit that outputs the result of recognition by the recognition unit.
Description
Technical field
The present invention relates to a technology for recognizing characters read from a document.
Background art
In the character recognition technology known as OCR (optical character recognition), a large number of candidate characters or terms are registered in a dictionary database in advance. Characters (terms) optically read from a document are compared with the characters (terms) registered in the dictionary database in order to recognize the characters (terms) in the document. Recognition accuracy therefore depends largely on whether the dictionary database contains suitable characters or terms.
One known technique provides dictionary databases prepared in advance for multiple languages, such as Japanese and English. Words composed of several characters obtained through the document recognition process are recognized, and one of the dictionary databases is selected accordingly. If the recognized words are registered in the selected dictionary at a ratio (relevance ratio) equal to or higher than a predetermined value, recognition continues with that dictionary. If the ratio falls below the predetermined value, the processing is repeated with another dictionary database. However, this technique requires that characters and words be recognized reasonably accurately at a stage before any dictionary is consulted. Moreover, the technique is intended for language selection and therefore does nothing to improve recognition accuracy within, for example, a Japanese document itself.
Another known technique splits an optically read sequence of character strings into units of several characters to extract term candidates. It then determines whether the linkage of the characters in each term candidate matches one of the term candidates registered in a dictionary database. If there is no match, term candidates are extracted in a different way. However, this technique requires that all the character linkages constituting term candidates be prepared in advance, so the database becomes very large. In addition, searching every linkage character by character greatly complicates the processing and requires a large amount of processing time.
Summary of the invention
The present invention has been made in view of the above circumstances and provides a new mechanism for recognizing characters written in a document with higher accuracy.
To address the above problems, the present invention provides a character recognition apparatus including: plural dictionary databases that contain terms or characters classified into respective fields; a determination unit that determines which field the contents of a document shown by document image data belong to; a selection unit that selects, from among the plural dictionary databases, a dictionary database pertaining to the field determined by the determination unit; a recognition unit that recognizes a term or a character written in the document shown by the document image data by using the terms or characters stored in the selected dictionary database as candidates; and an output unit that outputs the result of recognition by the recognition unit. With this apparatus, the field to which the document contents belong is determined first, and a field-specific term dictionary database suited to that field is then selected and used for character recognition. An improvement in recognition accuracy can therefore be expected.
Brief description of the drawings
Embodiments of the present invention will be described in detail with reference to the accompanying drawings, in which:
Fig. 1 is a block diagram showing the configuration of a character recognition apparatus according to a first embodiment;
Fig. 2 is a flowchart showing the operation of the character recognition apparatus;
Fig. 3 is a flowchart showing the operation of the character recognition apparatus;
Fig. 4 is a block diagram showing the configuration of a character recognition apparatus according to a second embodiment;
Fig. 5 (a) to (e) are diagrams conceptually showing the contents stored in a section format database;
Fig. 6 is a flowchart showing the operation of the character recognition apparatus; and
Fig. 7 is a flowchart showing the operation of the character recognition apparatus.
Detailed description of the embodiments
Embodiments of the present invention are described below.
(1) First embodiment
Fig. 1 is a block diagram showing the configuration of a character recognition apparatus 10 according to the first embodiment. The character recognition apparatus 10 may be implemented by a computer embedded in a scanner, a multifunction machine, or the like, or by a computer serving as a host device connected to a scanner or multifunction machine. In the first embodiment, plural field-specific term dictionary databases containing terms or characters classified into respective fields are prepared, and the field to which the contents of a document belong is determined. The field-specific term dictionary database pertaining to the determined field is then selected from among them, and character recognition is performed using the terms or characters stored in that database as candidates. For example, Fig. 1 shows field-specific term dictionary databases 11a, 11b and 11c. Database 11a contains terms or characters that appear frequently in the field of image processing, database 11b those that appear frequently in the field of photography, and database 11c those that appear frequently in the field of politics. Suitable field-specific term dictionary databases may also be prepared for various other fields, such as IT, computers, law, personal names, place names and company names.
The format database 12 stores format information describing document formats in association with the name of the field to which the document contents belong. More specifically, the format information includes a format identifier assigned to each differently formatted document (such as an order form or an application form) and information describing the features of each format (the shape and structure of the form itself). The character recognition apparatus 10 determines which field the contents of a document belong to from the contents of the format database 12 and the contents of the document image data.
The storage-area-specific document attribute storage unit 13 stores the correspondence between storage areas designated as the storage destination of document image data when that data is generated and the corresponding field names. In multifunction machines and similar devices in common use, an image read by a scanner can be stored in a storage area corresponding to a number specified from a menu called a "mailbox". A storage area specified through such a mailbox is an example of the storage area designated as the storage destination when document image data is generated. Mailbox numbers usually differ for each organizational unit in a company (department, section) or for each user, so the document image data stored in storage areas assigned the same number usually belongs to similar fields. For example, the documents stored in a mailbox used by the image processing development department of a company usually relate to image processing. Accordingly, each storage area in the mailbox is stored in the storage unit 13 in association with the field of the user or organization that exclusively uses that storage area. This allows the character recognition apparatus 10 to determine which field the document contents belong to simply by referring to the number specified for the mailbox.
The standard character feature quantity storage unit 14 contains feature quantities of the standard glyph (character pattern) of each individual character. The character recognition apparatus 10 compares the feature quantities stored in the standard character feature quantity storage unit 14 with the feature quantities of glyphs optically read from the document, and recognizes characters according to the degree of matching between them.
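The comparison of feature quantities can be illustrated with a minimal sketch. The patent does not say how feature quantities are computed or how the degree of matching is measured, so the three-dimensional vectors, the distance-based `match_degree` function and all names below are assumptions made purely for illustration.

```python
import math

# Hypothetical standard feature vectors, one per character (toy 3-D values).
STANDARD_FEATURES = {
    "A": (0.9, 0.1, 0.4),
    "B": (0.2, 0.8, 0.5),
    "C": (0.1, 0.2, 0.9),
}

def match_degree(a, b):
    """Assumed similarity in (0, 1]: 1.0 for identical vectors, lower as they diverge."""
    return 1.0 / (1.0 + math.dist(a, b))

def rank_characters(glyph_features, standard=STANDARD_FEATURES):
    """Return (character, matching degree) pairs, best match first."""
    scored = [(ch, match_degree(glyph_features, vec)) for ch, vec in standard.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Returning a ranked list rather than a single winner keeps the lower-ranked characters available, which is what allows a dictionary to overrule a slightly better but implausible match later on.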
Note that some fields are closely associated with each other while others are not. For example, the field of image processing and the field of photography are closely associated, whereas image processing and politics, or photography and politics, have little to do with each other. The field association degree storage unit 15 stores information defining this degree of association between fields. For example, suppose that the highest degree of association is expressed as "1". The information stored in the field association degree storage unit 15 then sets the degree of association between image processing and photography to "0.8", and sets the degree of association between image processing and politics, and between photography and politics, to "0.1" each.
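As a rough sketch of how such association information might be held and queried, the table below uses the three values given in the text. The field identifiers, the symmetric lookup and the 0.5 threshold are assumptions; the patent only speaks of "a certain degree of association".

```python
# Degrees of association from the text; 1.0 is the highest possible value.
ASSOCIATION = {
    frozenset({"image_processing", "photography"}): 0.8,
    frozenset({"image_processing", "politics"}): 0.1,
    frozenset({"photography", "politics"}): 0.1,
}

def association(field_a, field_b):
    """Symmetric lookup; a field is fully associated with itself."""
    if field_a == field_b:
        return 1.0
    return ASSOCIATION.get(frozenset({field_a, field_b}), 0.0)

def related_fields(field, threshold=0.5):
    """Other fields whose association with `field` meets the assumed threshold."""
    known = {f for pair in ASSOCIATION for f in pair}
    return sorted(f for f in known if f != field and association(field, f) >= threshold)
```

With these values, a document determined to be about image processing would also pull in the photography dictionary, while a political document would use its own dictionary alone.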
Fig. 2 and 3 is process flow diagrams that the operation of character recognition device 10 is shown.
In Fig. 2, at first, document reading unit 16 utilizes the rayed document with the image on the optically read document, and generates document image data (step S11).From document reading unit 16 the document view data is offered document content determining unit 17.Document content determining unit 17 determines according to process flow diagram shown in Figure 3 the document belongs to which field (step S12).
In Fig. 3,17 references of document content determining unit are stored in the content in the memory block particular document attribute storage unit 13, and determine whether to exist any field that is associated with the zone that comprises described document image data (step S21).Here, if there is any field (is "Yes" at step S21 place) be associated, document content determining unit 17 is identified as field (step S27) under the document content to this field so.
On the other hand, the field that if there is no is associated (is "No" at step S21 place), document content determining unit 17 determines whether the represented image of document image data comprises any format identifier (step S22) so.For example, some format identifier writes on the document bight.Here, if detect any format identifier (being "Yes" at step S22 place) in image, document content determining unit 17 is discerned the field (step S27) corresponding to this format identifier with reference to the content that is stored in the form database 12 so.
On the other hand, if do not detect format identifier (being "No" at step S22 place), 17 pairs of forms by the represented document of document image data of document content determining unit (form and structure) are analyzed (step S23) so.Then, if can be according to analysis result and its field of content recognition (being "Yes" at step S24 place) that is stored in the form database 12, document content determining unit 17 identifies its field (step S27) so.
On the other hand, if can't be according to its field of format identification (is "No" at step S24 place), 17 pairs of a part of execution character identifications (step S25) of document content determining unit so by the represented document of document image data.Handle the character that obtains or term as search key by using via this identification, 17 pairs of all spectra particular term of document content determining unit dictionary database 11a, 11b and 11c search for (step S26).Comprise coupling or similar term or any field particular term dictionary database of character if find in this search, document content determining unit 17 identifies its field (step S27) so.
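The fallback order of steps S21 to S27 can be sketched as a simple cascade. This is an illustration only: the inputs are assumed to have been extracted from the image already, all parameter names are hypothetical, the dictionary search uses exact matches (the text also allows similar terms), and choosing the field with the most hits is an assumption, since the patent does not say how competing dictionaries are ranked.

```python
from typing import Mapping, Optional, Sequence, Set

def determine_field(
    mailbox_number: Optional[int],
    format_id: Optional[str],
    layout_signature: Optional[str],
    sample_terms: Sequence[str],
    mailbox_fields: Mapping[int, str],
    format_id_fields: Mapping[str, str],
    layout_fields: Mapping[str, str],
    dictionaries: Mapping[str, Set[str]],
) -> Optional[str]:
    """Return the field of a document, or None if no stage can decide."""
    # S21: is a field associated with the storage area holding the image data?
    if mailbox_number in mailbox_fields:
        return mailbox_fields[mailbox_number]
    # S22: does the image carry a known format identifier?
    if format_id in format_id_fields:
        return format_id_fields[format_id]
    # S23-S24: can the analyzed shape and structure of the form be matched?
    if layout_signature in layout_fields:
        return layout_fields[layout_signature]
    # S25-S26: search every dictionary with terms from a partial recognition.
    hits = {
        field: sum(term in terms for term in sample_terms)
        for field, terms in dictionaries.items()
    }
    best = max(hits, key=hits.get, default=None)
    return best if best is not None and hits[best] > 0 else None
```

The cheapest evidence is consulted first, so partial character recognition, the most expensive and least reliable stage, runs only when the storage area and the format give no answer.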
The character recognition at step S25 can be performed by any of the following methods.
Some documents contain handwritten characters as well as printed characters. In such documents, printed characters are recognized with relatively high accuracy, so the document content determination unit 17 determines the field of the document from the recognition result for the printed characters. Specifically, the document content determination unit 17 divides the character area of the document represented by the document image data into a printed character area written in printed characters and a handwritten character area written in handwritten characters. It then performs character recognition on the printed characters in the printed character area and, using the recognition result as a search key, searches all of the field-specific term dictionary databases 11a, 11b and 11c.
In addition, a user may mark characteristic parts of a document with a pen or the like. For example, characteristic parts are sometimes circled, underlined or checked with a line marker. The document content determination unit 17 analyzes the document image data and, if any marked spot exists, preferentially recognizes the characters written there. Using the recognition result as a search key, it then searches all of the field-specific term dictionary databases 11a, 11b and 11c. Furthermore, characters written at the top of a document, and characters written in a larger font size than the others, usually form the title or heading of the document and are therefore generally suitable for determining which field the document contents belong to. The document content determination unit 17 therefore analyzes the document image data and, if there are any characters written at the top of the document or in a larger font size than the other characters, preferentially recognizes those characters. Using the recognition result as a search key, it then searches all of the field-specific term dictionary databases 11a, 11b and 11c.
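The following sketch shows one way the three cues above (user marks, title-like position or size, and printed versus handwritten text) could decide which regions are recognized first. The text presents them as separate methods; combining them into a single priority order, the `Region` fields and the `top_limit` and `size_ratio` thresholds are all assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Region:
    region_id: str
    printed: bool      # True for printed characters, False for handwriting
    marked: bool       # circled, underlined or checked by the user
    top: float         # vertical position, 0.0 = top of page, 1.0 = bottom
    font_size: float

def regions_for_field_detection(regions, top_limit=0.15, size_ratio=1.5):
    """Return the regions worth recognizing first, highest priority first."""
    if not regions:
        return []
    # Use the median font size as the "ordinary" size of the document.
    typical = sorted(r.font_size for r in regions)[len(regions) // 2]

    def priority(r):
        title_like = r.top <= top_limit or r.font_size >= size_ratio * typical
        return (r.marked, title_like, r.printed)

    # Unmarked, ordinary handwriting gives no cue and is left out.
    chosen = [r for r in regions if any(priority(r))]
    return sorted(chosen, key=priority, reverse=True)
```

Regions that show none of the cues are skipped, so only a small, reliable part of the page needs to be recognized before a dictionary is chosen.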
Returning to Fig. 2, the glossary selection unit 18 selects the field-specific term dictionary database pertaining to the field determined by the document content determination unit 17 (step S13). For example, when the document contents are determined to belong to the field of image processing, the glossary selection unit 18 selects the field-specific term dictionary database 11a for image processing. In addition, referring to the contents of the field association degree storage unit 15, the glossary selection unit 18 also selects the field-specific term dictionary database 11b, which pertains to a field (here, photography) defined as having at least a certain degree of association with the image processing field.
Next, the character recognition unit 19 recognizes the characters or terms in the document by referring to the feature quantities stored in the standard character feature quantity storage unit 14, the feature quantities of the glyphs optically read from the document, and the contents of the selected field-specific term dictionary databases 11a and 11b (step S14). The output unit 20 outputs the recognition result by a predetermined method such as display on a panel (step S15).
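A minimal sketch of step S14 under stated assumptions: each glyph is taken to have already been scored against the standard characters, and the terms in the selected dictionaries are the only candidates considered. Summing per-glyph matching degrees and falling back to the best character per glyph when no candidate fits are assumptions; the patent does not specify how the two sources are combined.

```python
def recognize_term(position_scores, dictionaries):
    """Pick the dictionary term that best explains a run of glyphs.

    position_scores: one dict per glyph, mapping candidate character to
        its matching degree against the standard feature quantities.
    dictionaries: the selected field-specific term sets.
    """
    length = len(position_scores)
    candidates = {t for terms in dictionaries for t in terms if len(t) == length}

    def score(term):
        # Assumed scoring: sum of the matching degrees along the term.
        return sum(scores.get(ch, 0.0) for ch, scores in zip(term, position_scores))

    best = max(candidates, key=score, default=None)
    if best is None:
        # No candidate of this length: take the best character per glyph.
        return "".join(max(scores, key=scores.get) for scores in position_scores)
    return best
```

For example, if the second glyph of "pixel" scores slightly higher as "l" than as "i", glyph-by-glyph recognition yields "plxel", whereas restricting the candidates to an image processing dictionary recovers "pixel".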
According to the first embodiment described above, a field-specific term dictionary database containing suitable characters or terms is selected in view of the contents of the document. An improvement in recognition accuracy can therefore be expected.
(2) Second embodiment
In the first embodiment described above, character recognition is performed on the entire document using the selected field-specific term dictionary database. In the second embodiment described below, a single document is divided into plural areas, and a field-specific term dictionary database suited to each area is selected for character recognition. Fig. 4 is a block diagram showing the configuration of a character recognition apparatus 30 according to the second embodiment. Components identical to those in Fig. 1 are denoted by the same reference numerals. The character recognition apparatus 30 shown in Fig. 4 differs from the character recognition apparatus of the first embodiment shown in Fig. 1 in that it is provided with a section format database 31 and a document content determination unit 34 (a section division unit 32 and a section content determination unit 33) in place of the format database 12, the storage-area-specific document attribute storage unit 13, the field association degree storage unit 15 and the document content determination unit 17. The section format database 31 contains information describing the shape and size of the sections to be filled in on a document. For example, this information includes the shapes and sizes of the various sections conceptually shown in Fig. 5 (a) to (e).
Figs. 6 and 7 are flowcharts showing the operation of the character recognition apparatus 30.
The operation shown in Fig. 6 differs from the operation shown in Fig. 2 in that it includes steps S32 to S35, performed section by section, in place of steps S12 to S15, performed on the entire document. That is, the document reading unit 16 irradiates the document with light to optically read the image on the document, and generates document image data (step S11). The document content determination unit 34 then determines the content (field) section by section (step S32). Specifically, as shown in Fig. 7, the section division unit 32 first refers to the contents of the section format database 31 and divides the document into the sections to be filled in (step S41). The section content determination unit 33 then analyzes the shape and size of each section and any printed characters, symbols and marks written in it (for example, printed characters such as "name" and "address", and symbols indicating a postal code or telephone number). Based on the analysis result, the section content determination unit 33 identifies the field of the content written in the section (step S42). For example, the content of a section labeled "address" should belong to the field of place names, and the content of a section labeled "name" should belong to the field of personal names. This processing is repeated until it has been performed on all sections ("Yes" at step S43), which completes the processing shown in Fig. 7.
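Step S42 can be pictured as a lookup from the printed label found in a section to a field. The two rules below come from the examples in the text ("name" and "address"); the field identifiers, the input structure and the function names are hypothetical, and a fuller version would also weigh section shape, size and symbols such as postal code marks.

```python
# Label-to-field rules; "name" and "address" are the examples from the text.
LABEL_FIELDS = {
    "name": "personal_names",
    "address": "place_names",
}

def field_for_section(printed_labels):
    """Return the field for the first recognized label, or None if none matches."""
    for label in printed_labels:
        field = LABEL_FIELDS.get(label.strip().lower())
        if field is not None:
            return field
    return None

def fields_by_section(sections):
    """sections: mapping of section id to the printed labels found in it."""
    return {sid: field_for_section(labels) for sid, labels in sections.items()}
```

A section that yields no field (`None`) would need some default, for instance a general-purpose dictionary; the patent does not describe that case.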
Returning to Fig. 6, the glossary selection unit 18 selects the field-specific term dictionary database pertaining to the field determined section by section by the document content determination unit 34 (step S33). The character recognition unit 19 recognizes the characters or terms in each section by referring to the feature quantities stored in the standard character feature quantity storage unit 14, the feature quantities of the glyphs optically read from the document, and the contents of the field-specific term dictionary database selected for that section (step S34). The output unit 20 outputs the recognition result by a predetermined method such as display on a panel (step S35).
According to the second embodiment described above, the document is divided into the sections to be filled in, and a suitable field-specific term dictionary database is selected according to the content of each section. Character recognition can therefore be performed with higher accuracy than in the first embodiment.
(3) Modified examples
The present invention may be implemented with the following modifications of the embodiments described above.
The fields and field-specific term dictionary databases are not limited to those exemplified in the embodiments, and may be set freely according to the type and content of the documents targeted by the character recognition processing.
The first embodiment and the second embodiment may also be combined. For example, in the second embodiment, the degree of association between fields may be taken into account when performing character recognition, as in the first embodiment.
When the character area of a document is divided into plural sub-areas, the division may be made by chapter or paragraph of the document rather than by the sections to be filled in.
A control program that causes the character recognition apparatuses 10 and 30 to carry out the above operations may be provided to the character recognition apparatuses 10 and 30 recorded on a recording medium (such as a magnetic recording medium, an optical recording medium or a ROM, readable by a CPU or other processor). The control program may also be downloaded to the character recognition apparatuses 10 and 30 over a network such as the Internet.
Some embodiments of the present invention described above are summarized as follows.
An embodiment of the present invention provides a character recognition apparatus including: plural dictionary databases that contain terms or characters classified into respective fields; a determination unit that determines which field the contents of a document shown by document image data belong to; a selection unit that selects, from among the plural dictionary databases, a dictionary database pertaining to the field determined by the determination unit; a recognition unit that recognizes a term or a character written in the document shown by the document image data by using the terms or characters stored in the selected dictionary database as candidates; and an output unit that outputs the result of recognition by the recognition unit. With this apparatus, the field to which the document contents belong is determined first, and a field-specific term dictionary database suited to that field is then selected and used for character recognition. An improvement in recognition accuracy can therefore be expected.
In an embodiment of the present invention, the character recognition apparatus further includes an area division unit that divides an area of the document containing characters into plural sub-areas. The determination unit determines, sub-area by sub-area, the field to which the content written in each sub-area belongs. The selection unit selects a dictionary database pertaining to each field determined by the determination unit. The recognition unit recognizes the terms or characters written in the area by using the terms or characters stored in the selected dictionary databases as candidates. According to this aspect, a field-specific term dictionary database suited to each sub-area of the document can be selected and used for character recognition.
In an embodiment of the present invention, the determination unit divides the character area of the document shown by the document image data into a printed character area written in printed characters and a handwritten character area written in handwritten characters, performs character recognition on the printed characters in the printed character area, and compares the recognition result with the terms or characters stored in each of the plural dictionary databases to determine the field to which the content written in the document belongs. Some documents contain both printed and handwritten characters, and in such documents printed characters are recognized with relatively high accuracy. The field can therefore be determined appropriately by basing the determination on the result of character recognition of the printed characters.
In an embodiment of the present invention, the character recognition apparatus further includes an attribute memory that stores the correspondence between storage areas designated as the storage destination of document image data when that data is generated and the corresponding dictionary databases. The determination unit selects the dictionary database corresponding to the storage area containing the document image data according to the correspondence stored in the attribute memory. In multifunction machines and similar devices in common use, an image read by a scanner can be stored in a storage area corresponding to a number specified from a menu called a "mailbox". Mailbox numbers usually differ for each organizational unit in a company (department, section) or for each user, so the document image data stored in storage areas assigned the same number usually belongs to similar fields. Accordingly, each storage area designated as a storage destination when document image data is generated (for example, each storage area in the mailbox) is stored in association with a field-specific dictionary (for example, that of the field of the user or organization that exclusively uses the storage area). This makes it possible to determine the field to which the document contents belong simply by designating a storage area.
In an embodiment of the present invention, the character recognition apparatus further includes an association degree memory that stores degrees of association defining how closely fields are associated with one another. The selection unit selects the dictionary database of any field that the stored degrees of association define as having a certain degree of association with the field determined by the determination unit.
Embodiments of the invention provide a kind of character identifying method, and it may further comprise the steps: store term or character by the field in a plurality of dictionary databases; Determine the affiliated field of content of the document that document image data is represented; From described a plurality of dictionary databases, select the dictionary database relevant with determined field; By using the term stored in the selected dictionary database or character, the term or the character that write in the document that document image data represents are discerned as the candidate; And output recognition result.
In this embodiment of the present invention, described character identifying method also comprises: the area dividing with character of document is become a plurality of subareas.Determining step comprises: determine to write on the affiliated field of content in the subarea that is marked off with pursuing the subarea.The selection step comprises: select to determine the dictionary database that the field is relevant with each.Identification step comprises: by using the term stored in the selected dictionary database or character as the candidate, the term or the character that write in the described zone are discerned.
In this embodiment of the present invention, determining step comprises: the character zone of the document that document image data is represented is divided into printed character zone of writing out with printed character and the handwritten character zone of writing out with handwritten character; To writing on the printed character execution character identification in the printed character zone; And recognition result and the term or the character that are stored in in described a plurality of dictionary database each compared, to determine to write on the field under the content in the document that document image data represents.
In this embodiment of the present invention, described character identifying method is further comprising the steps of: store the memory block on the storage purpose ground that is designated as these data when generating document image data and the corresponding relation between the corresponding dictionary database in attributes store.Determining step comprises: according to the corresponding relation that is stored in the attributes store, select the dictionary database corresponding with the memory block that comprises described document image data.
In this embodiment of the present invention, described character identifying method is further comprising the steps of: storage is used for the degree of association that the degree of association between the field is limited in degree of association storer.The selection step comprises: selection is defined as by the degree of association and determines that the field has the dictionary database in the field of certain degree of association.
The foregoing description of the embodiments of the present invention has been provided for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Obviously, many modifications and variations will be apparent to those skilled in the art. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, thereby enabling others skilled in the art to understand the invention for various embodiments and with the various modifications suited to the particular use contemplated. The scope of the invention is defined by the claims and their equivalents.
Claims (10)
1. A character recognition apparatus comprising:
plural dictionary databases that contain terms or characters classified into respective fields;
a determination unit that determines the field to which the content of a document represented by document image data belongs;
a selection unit that selects, from among the plural dictionary databases, the dictionary database pertaining to the field determined by the determination unit;
a recognition unit that recognizes a term or a character written in the document represented by the document image data by using the terms or characters stored in the selected dictionary database as candidates; and
an output unit that outputs the result of recognition by the recognition unit.
2. The character recognition apparatus according to claim 1, further comprising an area dividing unit that divides the area of the document containing characters into plural subareas, wherein:
the determination unit determines, subarea by subarea, the field to which the content written in each divided subarea belongs;
the selection unit selects the dictionary database pertaining to each field determined by the determination unit; and
the recognition unit recognizes a term or a character written in the area by using the terms or characters stored in the selected dictionary database as candidates.
3. The character recognition apparatus according to claim 1, wherein
the determination unit divides the character area of the document represented by the document image data into a printed character area written in printed characters and a handwritten character area written in handwritten characters, performs character recognition on the printed characters in the printed character area, and compares the recognition result with the terms or characters stored in each of the plural dictionary databases to determine the field to which the content of the document represented by the document image data belongs.
4. The character recognition apparatus according to claim 1, further comprising an attribute memory that contains a correspondence between a storage area designated as the storage destination of document image data when the data is generated and the corresponding dictionary database, wherein
the determination unit selects, according to the correspondence stored in the attribute memory, the dictionary database corresponding to the storage area that contains the document image data.
5. The character recognition apparatus according to claim 1, further comprising an association-degree memory that stores degrees of association defining the association between fields, wherein
the selection unit selects the dictionary database of a field that is defined by the degree of association as having a certain degree of association with the field determined by the determination unit.
6. A character recognition method comprising:
a storing step of storing terms or characters, classified by field, in plural dictionary databases;
a determining step of determining the field to which the content of a document represented by document image data belongs;
a selecting step of selecting, from among the plural dictionary databases, the dictionary database pertaining to the determined field;
a recognizing step of recognizing a term or a character written in the document represented by the document image data by using the terms or characters stored in the selected dictionary database as candidates; and
an outputting step of outputting the recognition result.
7. The character recognition method according to claim 6, further comprising dividing the area of the document containing characters into plural subareas, wherein:
the determining step includes determining, subarea by subarea, the field to which the content written in each divided subarea belongs;
the selecting step includes selecting the dictionary database pertaining to each determined field; and
the recognizing step includes recognizing a term or a character written in the area by using the terms or characters stored in the selected dictionary database as candidates.
8. The character recognition method according to claim 6, wherein
the determining step includes:
dividing the character area of the document represented by the document image data into a printed character area written in printed characters and a handwritten character area written in handwritten characters;
performing character recognition on the printed characters in the printed character area; and
comparing the recognition result with the terms or characters stored in each of the plural dictionary databases to determine the field to which the content of the document represented by the document image data belongs.
9. The character recognition method according to claim 6, further comprising storing, in an attribute memory, a correspondence between a storage area designated as the storage destination of document image data when the data is generated and the corresponding dictionary database, wherein
the determining step includes selecting, according to the correspondence stored in the attribute memory, the dictionary database corresponding to the storage area that contains the document image data.
10. The character recognition method according to claim 6, further comprising storing, in an association-degree memory, degrees of association defining the association between fields, wherein
the selecting step includes selecting the dictionary database of a field that is defined by the degree of association as having a certain degree of association with the determined field.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2004245311A JP2006065477A (en) | 2004-08-25 | 2004-08-25 | Character recognition device |
JP2004245311 | 2004-08-25 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1741034A | 2006-03-01 |
CN100351849C | 2007-11-28 |
Family
ID=35943131
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNB2005100551946A Expired - Fee Related CN100351849C (en) | 2004-08-25 | 2005-03-16 | Character recognition apparatus and character recognition method |
Country Status (3)
Country | Link |
---|---|
US (1) | US20060045340A1 (en) |
JP (1) | JP2006065477A (en) |
CN (1) | CN100351849C (en) |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080008391A1 (en) * | 2006-07-10 | 2008-01-10 | Amir Geva | Method and System for Document Form Recognition |
JP5239419B2 (en) * | 2008-03-14 | 2013-07-17 | オムロン株式会社 | Character recognition program, character recognition electronic component, character recognition device, character recognition method, and data structure |
JP2010217996A (en) * | 2009-03-13 | 2010-09-30 | Omron Corp | Character recognition device, character recognition program, and character recognition method |
JP2011065322A (en) * | 2009-09-16 | 2011-03-31 | Konica Minolta Holdings Inc | Character recognition system and character recognition program, and voice recognition system and voice recognition program |
CN102855264B (en) * | 2011-07-01 | 2015-11-25 | 富士通株式会社 | Document processing method and device thereof |
US9082035B2 (en) * | 2011-08-29 | 2015-07-14 | Qualcomm Incorporated | Camera OCR with context information |
DE102012008512A1 (en) * | 2012-05-02 | 2013-11-07 | Eyec Gmbh | Apparatus and method for comparing two graphics and text elements containing files |
JP6140946B2 (en) * | 2012-07-26 | 2017-06-07 | キヤノン株式会社 | Character recognition system and character recognition device |
JP2014067303A (en) * | 2012-09-26 | 2014-04-17 | Toshiba Corp | Character recognition device and method and program |
CN104903802B (en) * | 2013-02-28 | 2017-03-08 | 发纮电机株式会社 | Mapping editing device |
CN105427696A (en) * | 2015-11-20 | 2016-03-23 | 江苏沁恒股份有限公司 | Method for distinguishing answer to target question |
CN108921103B (en) * | 2018-07-05 | 2019-04-16 | 掌阅科技股份有限公司 | For the label synchronous method of check and correction, calculating equipment and computer storage medium |
KR20200010777A (en) * | 2018-07-23 | 2020-01-31 | 휴렛-팩커드 디벨롭먼트 컴퍼니, 엘.피. | Character recognition using previous recognition result of similar character |
JP2022148922A (en) * | 2021-03-24 | 2022-10-06 | 富士フイルムビジネスイノベーション株式会社 | Information processing device and program |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1059414A (en) * | 1991-03-12 | 1992-03-11 | 窦祖烈 | The interpretation method of Chinese sentence |
CN1215201A (en) * | 1997-10-16 | 1999-04-28 | 富士通株式会社 | Character identifying/correcting mode |
CN1221927A (en) * | 1997-12-19 | 1999-07-07 | 松下电器产业株式会社 | Character recognizor and its method, and recording medium for computer reading out |
JPH11203414A (en) * | 1998-01-08 | 1999-07-30 | Fuji Xerox Co Ltd | Broadly classified dictionary preparing device |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4944022A (en) * | 1986-12-19 | 1990-07-24 | Ricoh Company, Ltd. | Method of creating dictionary for character recognition |
JP2713622B2 (en) * | 1989-11-20 | 1998-02-16 | 富士通株式会社 | Tabular document reader |
JP3275153B2 (en) * | 1993-03-03 | 2002-04-15 | 株式会社日立製作所 | Dictionary distribution system and dictionary distribution management method |
JP3375766B2 (en) * | 1994-12-27 | 2003-02-10 | 松下電器産業株式会社 | Character recognition device |
US6101515A (en) * | 1996-05-31 | 2000-08-08 | Oracle Corporation | Learning system for classification of terminology |
JP3525997B2 (en) * | 1997-12-01 | 2004-05-10 | 富士通株式会社 | Character recognition method |
JP3895892B2 (en) * | 1999-09-22 | 2007-03-22 | 株式会社東芝 | Multimedia information collection management device and storage medium storing program |
JP4377494B2 (en) * | 1999-10-22 | 2009-12-02 | 東芝テック株式会社 | Information input device |
US6603464B1 (en) * | 2000-03-03 | 2003-08-05 | Michael Irl Rabin | Apparatus and method for record keeping and information distribution |
US20040205671A1 (en) * | 2000-09-13 | 2004-10-14 | Tatsuya Sukehiro | Natural-language processing system |
- 2004-08-25: JP, application JP2004245311A (publication JP2006065477A), not active: Withdrawn
- 2005-03-16: US, application US11/080,489 (publication US20060045340A1), not active: Abandoned
- 2005-03-16: CN, application CNB2005100551946A (publication CN100351849C), not active: Expired - Fee Related
Also Published As
Publication number | Publication date |
---|---|
CN1741034A (en) | 2006-03-01 |
US20060045340A1 (en) | 2006-03-02 |
JP2006065477A (en) | 2006-03-09 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20071128; Termination date: 20170316 |