CN1153358A

CN1153358A - Chinese and English table recognition system and method

Info

Publication number: CN1153358A
Application number: CN 96106616
Authority: CN
Inventors: 徐英士; 陈谋琰; 林文雯; 屠乐梃; 周开祥
Original assignee: Industrial Technology Research Institute ITRI
Current assignee: Transpacific IP Pte Ltd.
Priority date: 1995-06-13
Filing date: 1996-06-07
Publication date: 1997-07-02
Anticipated expiration: 2016-06-07
Also published as: CN1107280C

Abstract

A system for recognizing Chinese and English forms comprises a printed alphanumeric recognition module, a handwritten alphanumeric recognition module, a printed Chinese character recognition module and a handwritten Chinese character recognition module. Its recognition method comprises a form query, extraction of character cells to verify the optically scanned data, and manual correction. The form query includes defining the boundary of each field containing data, the nature of the data in the field, the attribute of the characters in the field, and the positions where characters are to be filled in. Verifying the optically scanned data comprises extracting a field, correcting the field coordinates, extracting at least one text line, correcting the text-line coordinates, extracting a character cell, and correcting the cell coordinates.

Description

Recognition system and recognition method for Chinese and English forms

The present invention relates to a recognition system and recognition method for Chinese and English forms. More specifically, it relates to a system and method for recognizing Chinese and English forms containing printed and handwritten Chinese characters and alphanumerics.

Many businesses and government agencies must process printed forms filled in by hand, and there are many ways to extract, process and store this data. For example, image scanners and optical character recognition (OCR) technology can be used to extract printed or handwritten data from forms. The form image itself can be photographed to produce microfiche or microfilm, or optically scanned to produce an image stored on a computer hard disk or other electronic storage medium. Well-known companies such as Toshiba, Sanyo, Hitachi and Panasonic have released form reading systems that combine image scanning with OCR devices to process Japanese and alphanumeric data.

Forms commonly used with OCR devices are A8 or A4 size and typically carry a hidden-line grid. Fig. 1 shows an example of such a form. Captions must be preprinted at the specified field positions, and the text to be filled in must be written in the spaces of the field separated by the hidden grid lines, one character per space. Captions need not be separated by hidden lines.

The hidden-line grid form 20 shown in Fig. 1 has fields such as 22, 24, 26 and 28 in which text can be entered; in the illustrated medical insurance form these include the insured's name 22, the patient name 24, the employer's name 26, and the patient and insured name 28. The text is written in a grid 30 formed by hidden lines 32, and each box 34 defined by the hidden lines may hold only one Chinese character or one alphanumeric character. Registration marks 36 are printed on form 20; in a preferred embodiment they are located at the four corners of the form and are used to correct the skew and shift of the form during scanning.

Fig. 2 shows an enlarged portion of form 20. The printed captions, for example "insured's name" 38 and "patient name" 39, are not separated by hidden lines, whereas the handwritten characters in fields 40 and 42 (the dotted-line fields in Fig. 2) are written in boxes 34. Boxes 34 are formed by the hidden lines 32 located in fields 44 and 46.

A character recognition system can seldom guarantee error-free recognition; errors are unavoidable, particularly with handwritten characters. Manual correction by operators is therefore essential. Typical character recognition systems reject hasty or illegible characters. Because rejected and misrecognized characters can be numerous, efficient correction matters more for an automatic system than for a conventional manual data-entry system. An OCR system should therefore provide an effective method for correcting characters that cannot be recognized.

Form recognition results fall into three cases:

1. Fully correct: every character in the form is recognized and every field passes the post-processing checks, for example a dictionary check (whether the recognized field matches a dictionary word) or a grammar check (whether it matches the preset syntax). No manual correction is needed after recognition; any remaining error is a hidden system error that cannot be corrected (for example, a misrecognized character that nevertheless passes post-processing). A practical system must keep its hidden-error rate below that of manual data entry.

2. Manual correction: the form must be corrected on screen after recognition. When some characters are rejected, or all characters in a field are recognized but the field fails the consistency check, that part of the form must be corrected manually.

3. Rejected entirely: the form contains too many unrecognizable characters (for example, because of poor scan quality, a wrong form, or very hasty handwriting), so the form is rejected and all of its characters must be entered manually.

Several OCR systems outside China have proposed different solutions to these problems. For example, U.S. Patent No. 5,251,273 (Betts et al.) proposes a data processing system and method for sequentially correcting recognition errors in scanned forms. The device in this reference contains three recognized-data correction processors: (1) an artificial intelligence processor, (2) a database error-detection processor, and (3) a manual verification and correction processor. A data structure records the machine-generated recognition results and the correction history and is passed to each processor in turn. After the artificial intelligence and database correction processors have finished, the field bit image can be displayed on a workstation screen for manual correction.

U.S. Patent No. 5,305,396 (Betts) proposes a data processing system and method that selects the character recognition process and the recognized-data correction process for different customers' forms. In this reference a form template is input before recognition; the template contains system operating parameters set according to customer requirements and must be read by the system before bulk recognition.

U.S. Patent No. 5,235,654 (Anderson et al.) proposes an advanced data capture and data processing system for processing scanned form images, capable of automatically processing newly created forms.

U.S. Patent No. 5,153,927 (Yamanari) proposes a character reading system and method that allows users to prepare their own special processing procedures, whose specifications the system need not know. The patent provides two processing sections: a standard processing section and a user-defined processing section. The user-defined section lets the user freely set the fields to be checked without affecting the standard section.

U.S. Patent No. 5,233,627 (Yamanari et al.) proposes a character recognizer with a special correction function; the character reading device it discloses avoids hiding the original form image when the screen displays an image containing rejected units.

An object of the present invention is to provide a character recognition device with a Chinese and English form query function.

Another object of the present invention is to provide an OCR device capable of recognizing characters in printed and handwritten forms.

A further object of the present invention is to provide a device with high recognition efficiency and a manual correction method.

These objects are achieved by a recognition device comprising a printed alphanumeric recognition module, a handwritten alphanumeric recognition module, a printed Chinese recognition module and a handwritten Chinese recognition module. After the extracted data have been recognized, they can be displayed on the screen for review and correction if needed.

A preferred embodiment of the invention includes a form query module that can "query" the locations of data in a form, so that the OCR device can go directly to the fields containing text to be processed. The module can query character positions and, during bulk processing, compares the positions of the registration marks on the scanned form image with their positions at query time, which improves tolerance to the deviation and offset of the marks that arise during scanning.

The invention also provides a method for extracting printed and handwritten data from image documents containing printed and handwritten Chinese and alphanumeric characters, and an optical character recognition (OCR) device that stores the extracted data.

A preferred embodiment also provides a progressive correction flow in which manual correction is performed only when necessary. The correction procedures are ordered from simple to complex according to workload, so that the lower-cost (less time-consuming) parts are done first. In this embodiment, character correction is performed first, then field correction, and finally whole-form correction.

The form query method, the character extraction method, the method for verifying the correctness of optically scanned data, and the manual form correction method of the invention are described in detail below with reference to the embodiments and the accompanying drawings.

Description of drawings:

Fig. 1 shows an example of a hidden-line grid form;

Fig. 2 is an enlarged partial view of the form of Fig. 1;

Fig. 3 is a block diagram of the Chinese and English form recognition device;

Fig. 4 is a flowchart of the form query of the invention;

Fig. 5 is a workflow diagram of the device of the invention;

Fig. 6 is a flowchart of the screen correction procedure of the invention;

Fig. 7 illustrates the screen during character image correction;

Fig. 8 illustrates the screen during field image correction;

Fig. 9 is a flowchart of the character correction screen.

Fig. 3 shows a preferred embodiment of the optical character recognition device of the invention. System 50 comprises a sheet conveying system 51, which moves the form in the direction of the arrow past an optical scanner ("OCR scanner") 52. In a preferred embodiment, scanner 52 illuminates the form with a laser and uses a sensor such as a charge-coupled device (CCD) to produce a binary image of the form, in which each pixel is either logic "0" or logic "1". One model of OCR scanner 52 is the TDC2610W (manufactured by Terminal Data Corp.).

Scanner 52 can be connected to a processor 54 (for example, a general-purpose computer or a special-purpose hardware processing unit). The hardware unit of the processor can be an optical or electronic processing unit, for example a resistor summing network and digital logic circuits. The processor can include a microprocessor 56 and other elements, a screen or monitor 58, and a keyboard or other input device 60. Processor 54 can include a storage device 62 for storing scanned document images; the storage device can be a hard disk, RAM or another storage device.

The recognition process is as follows:

The forms to be recognized are scanned by feeder 51 and scanner 52 to produce binary image data, which are processed by processor 54 and stored in storage device 62. Application programs, character features, databases, the form query knowledge base and so on are all stored in storage device 62. When images are recognized, the recognition programs and databases are loaded into dynamic RAM under the control of microprocessor 56 and executed step by step until all images in the batch have been processed, and the resulting character codes for the batch are stored on the hard disk. Microprocessor 56 also controls the correction operation: images are displayed on screen 58 and the operator responds through the keyboard. After receiving input from keyboard 60, microprocessor 56 passes the value to the correction program in main memory, which continues until the correction workflow is complete. During scanning, form images and character recognition data are displayed on screen 58. After the character recognition program has finished, a preferred embodiment of the invention displays the characters that could not be recognized, and the user can use keyboard 60 to replace rejected and misrecognized characters with the correct ones. As discussed below, fields and forms that cannot be recognized are also displayed on the screen for manual correction.

For the OCR system of the invention to "read" the text in a form, the system preferably first "queries" the form: which regions contain text to be read, in what style the text appears (for example, printed or handwritten), and what the text represents. Because the OCR device queries the field positions and the nature of the characters before recognizing the form, data extraction is faster and more accurate, and character extraction is more efficient. By comparing the expected and actual positions of the registration marks, the skew of the form and the boundaries of the individual fields can be found accurately.

The OCR device therefore needs to extract only the important fields that contain characters to be recognized, rather than the whole form. As described below, the recognition and post-processing parameters are also preset, which improves processing efficiency. That is, the character nature (printed/handwritten, Chinese/alphanumeric) is preset for recognition, and the field description (name, sex, address, etc.) is preset for lexical post-processing.

The form query procedure:

Fig. 4 is a flowchart 70 of the form query. First, a blank form is scanned (step 72) and the form image is displayed on the computer screen. The operator chooses a field to define (for example, "insured's name") and, using a peripheral device such as a mouse, drags out a rectangular region enclosing the field. The OCR software finds the boundaries of the field in the X and Y directions (step 74), so that the positions of the character boxes to be filled in can be marked automatically.

Next, the nature of the field is defined (the field description, step 76), indicating the category of data in the field. For example, the first field is designated as containing the "insured's name" and the second as containing the "patient name" (see Figs. 1 and 2). After the field is defined, the attribute of its characters is defined (step 78), that is, whether the characters in the field are printed or handwritten English, or printed or handwritten Chinese. For example, the "patient name" field should be filled in with handwritten English characters.

After the field boundary, nature and attribute have been defined, each hidden-line box 34 (see Fig. 2) in which a character is to be written is defined (step 80). The device can then query the expected position of each handwritten character.

Next, the operator defines the positions of the registration marks 36 (step 82). In the preferred embodiment, the registration marks 36 must be located at the four corners of the form, and data should be written horizontally. The nature of the registration marks 36 is then defined (step 84).

This query process allows OCR system 50 to extract data automatically from filled-in forms. It speeds up the subsequent character extraction and increases the tolerance to skew.
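The result of steps 72 to 84 can be pictured as a per-form template record. The sketch below is a minimal illustration, assuming pixel coordinates on the scanned blank form and single-row fields whose character boxes are equally spaced; the names `FieldTemplate`, `FormTemplate` and `equal_cells` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in blank-form pixel coordinates


@dataclass
class FieldTemplate:
    name: str     # field description, e.g. "patient name" (step 76)
    box: Box      # field boundary found by the query (step 74)
    style: str    # "printed" or "handwritten" (step 78)
    script: str   # "chinese" or "alphanumeric" (step 78)
    cells: List[Box] = field(default_factory=list)  # expected character boxes (step 80)


@dataclass
class FormTemplate:
    fields: List[FieldTemplate] = field(default_factory=list)
    marks: List[Tuple[float, float]] = field(default_factory=list)  # registration marks (step 82)


def equal_cells(box: Box, n: int) -> List[Box]:
    """Split a single-row field into n equal-width character boxes (an assumption)."""
    x0, y0, x1, y1 = box
    width = (x1 - x0) / n
    return [(round(x0 + i * width), y0, round(x0 + (i + 1) * width), y1) for i in range(n)]


# Usage: a ten-box handwritten English "patient name" field.
template = FormTemplate(marks=[(20, 20), (980, 20), (20, 1380), (980, 1380)])
name_box = (100, 200, 500, 240)
template.fields.append(
    FieldTemplate("patient name", name_box, "handwritten", "alphanumeric", equal_cells(name_box, 10))
)
```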

After the data in all blank forms have been queried, the system can accurately read forms filled with data. This requires two steps: data extraction and character recognition. Data extraction comprises three parts: field extraction, text-line extraction and character extraction. Character extraction is further divided into printed-character extraction (Chinese and alphanumeric) and handwritten-character extraction (Chinese and alphanumeric).

Data extraction

Fig. 5 is a recognition workflow diagram of a preferred embodiment of the invention, in which system 100 is divided into three parts: a scanning part 102, a character recognition part 104 and a recognition post-processing part 106.

The workflow is as follows:

First, the filled-in form is placed in sheet conveying system 51 and scanned by OCR scanner 52 (see Fig. 3), completing scanning 110. The scanned image is then compared with the blank-form query data stored in database 112.

Data extraction is divided into three steps. First, the positions of the fields containing the data to be extracted are found, taking any possible offset into account. Next, the positions of the text lines within each field are determined; this is text-line extraction. Finally, the positions of the characters within each text line are extracted; this is character extraction. Character extraction is itself divided into printed-character extraction and handwritten-character extraction.

1. Field extraction

Extraction module 114 extracts the fields to be recognized and corrects the field coordinates, as follows. First the skew and offset of the form are determined; the module tolerates skew (up to 5 degrees) and offset (the form shifting during scanning). Both variations are limited by the mechanical constraints of paper feed system 51. The registration marks 36 determine the boundary of form 20 (in this embodiment they mark its four corners). The skew and offset of the input form are obtained by comparing the positions of the registration marks on the input form with the positions obtained by "querying" the blank form.

The module then determines the expected position of each field from the text nature recorded in field database 112, and extracts the field. Because the skew and offset of the form are known, the position of each field to be recognized can be computed relative to the blank form.
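The skew and offset can be estimated from the corresponding registration marks. The sketch below assumes the four marks have already been located in both the blank and the scanned image, and fits a rotation plus translation by least squares (a standard 2-D rigid alignment, not necessarily the exact computation used in the patent). The names `estimate_skew_offset` and `map_point` are hypothetical; the 5-degree limit mirrors the tolerance stated above.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def estimate_skew_offset(blank: List[Point], scanned: List[Point], max_skew_deg: float = 5.0):
    """Least-squares rotation (radians) and translation mapping blank-form marks onto scanned marks."""
    n = len(blank)
    bx = sum(p[0] for p in blank) / n
    by = sum(p[1] for p in blank) / n
    sx = sum(q[0] for q in scanned) / n
    sy = sum(q[1] for q in scanned) / n
    num = den = 0.0
    for (px, py), (qx, qy) in zip(blank, scanned):
        px, py, qx, qy = px - bx, py - by, qx - sx, qy - sy  # centre both point sets
        num += px * qy - py * qx  # sum of cross products
        den += px * qx + py * qy  # sum of dot products
    theta = math.atan2(num, den)
    if abs(math.degrees(theta)) > max_skew_deg:
        raise ValueError("skew exceeds the tolerated range")
    c, s = math.cos(theta), math.sin(theta)
    tx = sx - (c * bx - s * by)  # translation = scanned centroid - R * blank centroid
    ty = sy - (s * bx + c * by)
    return theta, (tx, ty)


def map_point(p: Point, theta: float, t: Tuple[float, float]) -> Point:
    """Map a blank-form coordinate (e.g. a field corner) into the scanned image."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + t[0], s * p[0] + c * p[1] + t[1])
```

Once the angle and translation are known, every field corner recorded by the query can be passed through `map_point` to locate that field on the scanned form.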

2. Text-line extraction

Text-line extraction and line-coordinate correction are then performed as follows. Module 114 consults the text nature database 112 to determine the positions of the text lines in the field and extracts them. If the field contains text lines, a horizontal projection is computed: horizontal scan lines first find the black pixels of the characters on the same line within the field, these scan lines together form an accumulated projection, and the boundaries of a text line are determined by the positions of the black pixels along the scan lines. The line positions obtained from the query are then used to correct the text-line positions; that is, the original positions obtained by the "query" are used to find the optimal horizontal cut line separating two overlapping input lines. Even when the characters in a line extend beyond the upper and lower boundaries of the queried text line, the field can be divided safely into several lines and the correct text-line coordinates obtained.
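A minimal sketch of the horizontal projection and of splitting two touching lines near the queried boundary, assuming the field image is a list of rows with 1 for black pixels. The function names and the search `window` parameter are illustrative assumptions.

```python
from typing import List, Optional, Sequence, Tuple


def horizontal_projection(img: Sequence[Sequence[int]]) -> List[int]:
    """Count the black pixels (value 1) in each row of a binary field image."""
    return [sum(row) for row in img]


def text_line_bands(profile: Sequence[int]) -> List[Tuple[int, int]]:
    """Return (top, bottom_exclusive) row ranges where the projection is non-zero."""
    bands: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for y, v in enumerate(profile):
        if v and start is None:
            start = y
        elif not v and start is not None:
            bands.append((start, y))
            start = None
    if start is not None:
        bands.append((start, len(profile)))
    return bands


def best_cut(profile: Sequence[int], expected: int, window: int) -> int:
    """Row near the queried line boundary with the smallest projection;
    ties are broken in favour of the row closest to the expected boundary."""
    lo = max(0, expected - window)
    hi = min(len(profile) - 1, expected + window)
    return min(range(lo, hi + 1), key=lambda y: (profile[y], abs(y - expected)))
```

When two lines touch, `text_line_bands` returns one tall band; cutting it at `best_cut(profile, queried_boundary, window)` restores the two lines.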

3. Character extraction

Character extraction and coordinate correction are then performed as follows. The characters in a line are extracted using the vertical projection of the character image in that line; that is, vertical scan lines over the characters form a vertical projection, and the minima of the projection mark the boundaries between characters. The text-line field database 112 is used to determine whether the characters are printed or handwritten. The expected character positions obtained when querying the blank form are used to adjust the extraction coordinates of the characters in the field to be recognized, making character extraction more effective. The characters in a text line are ordered by their horizontal coordinate, that is, by their X coordinate.
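The vertical projection step can be sketched the same way. Here, as an assumption, ink runs whose centres fall in the same queried character box are merged (so a Chinese character with separated left and right components stays whole), ink outside every box is ignored as noise, and the result is ordered by X coordinate. All names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Span = Tuple[int, int]  # (x0, x1_exclusive)


def vertical_projection(img: Sequence[Sequence[int]]) -> List[int]:
    """Count the black pixels in each column of a binary text-line image."""
    return [sum(col) for col in zip(*img)]


def column_runs(profile: Sequence[int]) -> List[Span]:
    """Maximal runs of columns with non-zero projection."""
    runs: List[Span] = []
    start: Optional[int] = None
    for x, v in enumerate(profile):
        if v and start is None:
            start = x
        elif not v and start is not None:
            runs.append((start, x))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def characters_by_cell(runs: Sequence[Span], cells: Sequence[Span]) -> List[Span]:
    """Merge runs whose centre lies in the same expected box; return spans sorted by X."""
    merged: Dict[int, Span] = {}
    for x0, x1 in runs:
        centre = (x0 + x1) / 2
        key = next((i for i, (c0, c1) in enumerate(cells) if c0 <= centre < c1), None)
        if key is None:
            continue  # assumption: ink outside every expected box is treated as noise
        if key in merged:
            a, b = merged[key]
            merged[key] = (min(a, x0), max(b, x1))
        else:
            merged[key] = (x0, x1)
    return sorted(merged.values())  # character order follows the X coordinate
```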

(i) Printed-character extraction:

Printed-character extraction module 116 extracts the field data that the text nature database 112 marks as containing printed data, and consults database 112 to predict whether the characters are Chinese or English. Printed Chinese data are sent to printed Chinese recognition module 118, and printed alphanumeric data are sent to printed alphanumeric recognition module 120.

Printed-character recognition is then carried out. Many optical recognition devices of this kind are known, such as modules 118 and 120 shown in Fig. 5 (see, for example, McGraw-Hill Encyclopedia of Electronics and Computers, pp. 109-111 (McGraw-Hill, 1984)). Optical recognizers for printed characters usually identify characters by template matching. Printed-character recognition devices 118 and 120, however, extract different features: the printed alphanumeric recognizer 120 consults the alphanumeric recognition expert database 122, and the printed Chinese recognizer 118 consults the printed Chinese recognition expert database 124.
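Template matching, mentioned above as the usual approach for printed characters, can be illustrated by a nearest-template search on binary bitmaps. This is a minimal sketch that assumes each glyph has already been size-normalized to the template dimensions and uses Hamming distance. The reject threshold `max_dist` and the function names are hypothetical.

```python
from typing import Dict, Optional, Sequence

Bitmap = Sequence[Sequence[int]]


def hamming(a: Bitmap, b: Bitmap) -> int:
    """Number of pixels that differ between two equally sized binary bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def match_template(glyph: Bitmap, templates: Dict[str, Bitmap], max_dist: int) -> Optional[str]:
    """Label of the closest template, or None (reject) if even the best
    match differs in more than max_dist pixels."""
    label, dist = min(((k, hamming(glyph, t)) for k, t in templates.items()), key=lambda kv: kv[1])
    return label if dist <= max_dist else None


templates = {
    "1": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "-": [[0, 0, 0], [1, 1, 1], [0, 0, 0]],
}
noisy_one = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]
```

Returning `None` instead of the nearest label is what produces a rejected character for the correction stage.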

(ii) Handwritten-character extraction:

Handwritten-character extraction module 130 extracts the field data that the text-line nature database 112 marks as containing handwritten data, and consults database 112 to predict whether the handwritten field contains Chinese or English alphanumeric data. Handwritten Chinese data are sent to handwritten Chinese recognition module 132, and handwritten alphanumeric data are sent to handwritten alphanumeric recognition module 134.

Handwritten-character recognition is then carried out. The extracted handwritten Chinese characters are compared using at least one handwritten Chinese recognition expert 136, and the handwritten alphanumeric characters using at least one handwritten alphanumeric recognition expert 138. There are two preferred ways to perform recognition. The first uses a statistical recognition expert: features of the extracted character are computed and compared with the features stored in the database, and the closest match is selected as the recognition result.
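The statistical expert can be illustrated with a simple zoning feature (black-pixel density per zone) and a nearest-prototype rule. The specific feature, the Euclidean metric and the names are assumptions for illustration; the patent does not state which statistical features are used.

```python
import math
from typing import Dict, List, Sequence


def zone_features(img: Sequence[Sequence[int]], zones: int = 2) -> List[float]:
    """Fraction of black pixels in each cell of a zones x zones grid over the glyph."""
    h, w = len(img), len(img[0])
    feats: List[float] = []
    for zy in range(zones):
        for zx in range(zones):
            ys = range(zy * h // zones, (zy + 1) * h // zones)
            xs = range(zx * w // zones, (zx + 1) * w // zones)
            pixels = [img[y][x] for y in ys for x in xs]
            feats.append(sum(pixels) / len(pixels))
    return feats


def nearest_class(feats: Sequence[float], prototypes: Dict[str, List[float]]) -> str:
    """Label of the stored feature vector closest to feats (Euclidean distance)."""
    return min(prototypes, key=lambda k: math.dist(feats, prototypes[k]))


glyph = [[1, 1, 0, 0]] * 4  # left half black
prototypes = {"left-heavy": [1.0, 0.0, 1.0, 0.0], "top-heavy": [1.0, 1.0, 0.0, 0.0]}
```

A real expert would use many more zones or richer features, but the selection step (closest stored feature vector wins) is the same.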

The second method uses several recognition experts that "vote" to select the correct result. The preferred embodiment uses four experts: the first is the statistical expert described above; the second is a structural loose-matching expert; the third is a structural periphery-matching expert; and the fourth is a software-simulated neural network. The loose-matching expert skeletonizes the character image and extracts key structural features, including the number of strokes, stroke shape (convex or concave, direction, etc.), stroke length and position, and turning points. A loose-matching classifier then identifies the unknown character.

The periphery expert extracts the periphery of the character image and its structural features, including the position, number and type of feature points. These features include layout information such as the number and position of holes in the character; dynamic matching and layout classifiers are used to identify the unknown character.

The neural network expert extracts general statistical features and uses a back-propagation network to identify the unknown character.

Other methods may also be used to recognize handwritten characters.
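The voting approach can be sketched as a majority vote over the top answers of the experts. The rule below, which rejects a character on a tie or when fewer than `min_votes` experts agree, is an assumption chosen so that uncertain characters are routed to manual correction; the patent does not give the exact voting rule.

```python
from collections import Counter
from typing import Optional, Sequence


def vote(candidates: Sequence[Optional[str]], min_votes: int = 2) -> Optional[str]:
    """Majority vote over the top-1 answers of several recognition experts.
    None entries are experts that rejected the character."""
    counts = Counter(c for c in candidates if c is not None)
    if not counts:
        return None
    top = counts.most_common(2)
    label, n = top[0]
    # Reject on weak agreement or on a tie, so the character goes to manual correction.
    if n < min_votes or (len(top) > 1 and top[1][1] == n):
        return None
    return label


# Four experts: statistical, loose matching, periphery matching, neural network.
print(vote(["王", "王", "玉", "王"]))
```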

4. Recognition post-processing:

Recognition post-processing comprises two steps: lexical post-processing and screen correction. Lexical post-processing module 140 performs address post-processing and field checking.

(i) Lexical post-processing:

Lexical post-processing uses a dictionary to cross-check the correctness of character recognition. For example, the dictionary can contain the names of the cities, towns, roads and sections in a given geographic area. The words produced by recognition are compared with the dictionary to determine whether recognition is correct. Postal codes can also be used for cross-checking.
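A minimal sketch of the dictionary and postal-code cross-check. The city list and postcode table are hypothetical placeholders for illustration, not data from the patent.

```python
from typing import Dict, Set

# Hypothetical dictionary and postcode table for one region (illustrative only).
CITIES: Set[str] = {"Hsinchu", "Taipei", "Tainan"}
POSTCODE_CITY: Dict[str, str] = {"300": "Hsinchu", "100": "Taipei", "700": "Tainan"}


def check_address(city: str, postcode: str) -> bool:
    """Accept the recognized city only if it is a dictionary word and
    agrees with the recognized postal code."""
    return city in CITIES and POSTCODE_CITY.get(postcode) == city
```

A field that fails this check is not corrected automatically; it is flagged for field correction on screen.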

Field checking verifies the value range of each character and whether the characters in the field satisfy the preset algebraic relations.
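Field checking can be sketched as a range check on each numeric field followed by the preset algebraic relations between fields. The field names, ranges and the `total == fee + copay` relation are hypothetical examples.

```python
from typing import Callable, Dict, List


def check_fields(values: Dict[str, str],
                 ranges: Dict[str, range],
                 relations: List[Callable[[Dict[str, int]], bool]]) -> List[str]:
    """Return the names of fields that fail their range check, or ["relation"]
    if every field is in range but a preset algebraic relation fails."""
    failed: List[str] = []
    nums: Dict[str, int] = {}
    for name, text in values.items():
        if text.isdecimal() and int(text) in ranges[name]:
            nums[name] = int(text)
        else:
            failed.append(name)  # non-digit (e.g. "O" read for "0") or out of range
    if not failed and not all(rel(nums) for rel in relations):
        failed.append("relation")
    return failed


ranges = {"fee": range(0, 100000), "copay": range(0, 100000), "total": range(0, 200000)}
relations = [lambda v: v["total"] == v["fee"] + v["copay"]]
```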

(ii) Screen correction:

Fig. 6 is a flowchart of a preferred screen correction method 200. Scanned form images are fed to the form recognition system (step 202), and each form is classified as "fully correct", "manual correction" or "rejected" (step 204). Fully correct form images are stored directly in the database (step 222).

For a form that needs manual correction, it is first determined whether it contains rejected units (step 206); rejected units are corrected manually (step 208).

During character (or field) correction, screen correction device 144 displays the rejected units (or fields) on screen 58 (see Fig. 3), as shown in Fig. 7. The images of the rejected units are displayed together for correction; they belong to the same batch but may come from different forms. Because many forms are handled at once, correction is more efficient.

When a form needs manual correction but has no rejected units, the character string in some field has failed the field post-processing check (step 210), and field correction is performed (step 214).

Fig. 8 shows an example screen during field correction. In the preferred embodiment, monitor 58 uses a split screen: the field image is displayed on one side (the upper half in this example) and the recognition result on the other (the lower half in this example). The user checks the field image on the screen and corrects misrecognized or rejected characters, entering the correct characters with an input device such as the keyboard.

If the form passes the field check, it is also stored in the database (step 222). If the form still fails the field check, it is treated as a whole form (step 216) and entered manually in full (step 218), that is, all data in the form are retyped by hand. If the corrected form is acceptable (that is, all errors have been corrected manually), the form data are stored in the database (step 222); otherwise the whole form is rejected (step 224).
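The three-way classification of step 204 can be sketched as follows. Representing a rejected character as `?` and rejecting a whole form above a 30% threshold are assumptions; the patent says only that a form is rejected when too many characters or fields cannot be recognized. `FieldResult` and `classify_form` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FieldResult:
    text: str           # recognized string; "?" marks a rejected character (assumption)
    passed_check: bool  # outcome of the lexical/field post-processing check


def classify_form(fields: List[FieldResult], reject_ratio: float = 0.3) -> str:
    """Route a recognized form to one of the three cases of step 204."""
    chars = sum(len(f.text) for f in fields)
    rejected = sum(f.text.count("?") for f in fields)
    failed = sum(not f.passed_check for f in fields)
    if chars == 0 or rejected / chars > reject_ratio or failed / len(fields) > reject_ratio:
        return "rejected"           # whole form entered manually
    if rejected == 0 and failed == 0:
        return "fully correct"      # stored directly (step 222)
    return "manual correction"      # character or field correction (steps 206-214)
```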

Finally, the recognized data are sent to format conversion module 146 and converted into a common database format. The converted data and the form images can then be stored, queried, sorted or used for other purposes.

When rejected units are corrected, the step with the least workload is performed first: characters are inspected and corrected before fields or whole forms. In addition, correcting characters raises the likelihood that a form passes the field and whole-form checks, so many forms can be processed efficiently at the same time.

The correction workflow:

In the correction operation, images of individual characters, fields or whole forms are displayed on the screen. The operator visually judges the doubtful parts and uses the keyboard to enter the text of the character, field or whole form, so that the computer assists manual entry. The computer basically provides the following functions:

1. Selecting suspicious data. This covers characters that the recognizer cannot identify with certainty (so-called rejected characters); fields whose characters were all recognized but whose result does not conform to the post-processing knowledge for that field obtained during the form query, in which case the field image is selected; and forms in which more than a certain proportion of characters or fields cannot be recognized because the form is skewed or the handwriting is too hasty (the form query establishes how many characters and fields such a form should have), in which case the whole form image is selected. The selection works as follows: after recognition of a batch of forms, whose images and character codes are stored together on the hard disk, the computer's CPU examines the results character by character and field by field, and stores the related data of the suspicious characters, fields or whole form images (sequence number, image boundary coordinates, etc.) in dynamic RAM for later display;

2. Displaying suspicious data. After selection is finished, the CPU scans the character codes of the forms stored on the hard disk and, using the related data stored in dynamic RAM, displays the objects (characters, fields or whole images) on the screen. For efficiency, the display order is characters first, then fields, then whole form images;

3. Manual correction. Besides displaying the image, the screen shows a text input area below it, where the operator types the correct answer for the displayed image on the keyboard. On receiving the input, the CPU performs the field post-processing check to determine whether the data are correct. For example, after all rejected units in the batch have been entered, the CPU performs the post-processing check and again records the sequence numbers of nonconforming items in dynamic RAM, so that they can be displayed for further correction.
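Ordering the display so that the cheapest corrections come first across a batch can be sketched with a stable sort. `Suspect` and `display_queue` are hypothetical names; their fields mirror the related data described above (sequence number, image boundary coordinates).

```python
from dataclasses import dataclass
from typing import List, Tuple

ORDER = {"character": 0, "field": 1, "form": 2}  # cheapest correction first


@dataclass
class Suspect:
    form_seq: int                    # sequence number of the form in the batch
    kind: str                        # "character", "field" or "form"
    bbox: Tuple[int, int, int, int]  # image boundary coordinates (x0, y0, x1, y1)


def display_queue(suspects: List[Suspect]) -> List[Suspect]:
    """All characters of the batch first, then fields, then whole forms;
    within each kind, keep the forms in sequence order."""
    return sorted(suspects, key=lambda s: (ORDER[s.kind], s.form_seq))


batch = [
    Suspect(2, "form", (0, 0, 1000, 1400)),
    Suspect(1, "field", (100, 200, 500, 240)),
    Suspect(3, "character", (140, 200, 180, 240)),
    Suspect(1, "character", (100, 200, 140, 240)),
]
```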

Through these three basic functions, following the flow of Fig. 1, on-screen correction is highly efficient; at the same time, a plain-text file of the contents of each form can be generated on the hard disk.

The effects of the present invention include ease of operation and reduced character extraction time. Character recognition is faster, and the stepwise manual correction process is a particularly effective way to handle forms after scanning and recognition. In addition, recognition results can be corrected on screen, and data can be extracted and stored efficiently. The invention thus improves the ability to input, read, and store large volumes of printed and handwritten form data.

Claims

1, A method for recognizing Chinese and English forms, characterized in that it comprises form query, optical character recognition, and text post-processing steps,

the form query comprising the following steps:

(a) defining the boundary of a field containing data information;

(b) defining the nature of the data information in the field;

(c) defining the attributes of the characters in the field; and

(d) defining the positions in the field where characters are expected to be filled in.
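A form-query record holding the four definitions of claim 1 might look like the sketch below. The class and attribute names, the `nature` strings, and the `even_slots` helper are hypothetical; the claim does not specify how boundaries or character positions are represented, so pixel boxes measured on a scan of the blank form (claim 3) are assumed.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels of the blank-form scan

@dataclass
class FieldDef:
    boundary: Box                                         # step (a): field boundary
    nature: str                                           # step (b): e.g. "numeric", "chinese"
    handwritten: bool                                     # step (c): character attribute
    char_slots: List[Box] = field(default_factory=list)   # step (d): expected character positions

    def __post_init__(self) -> None:
        # Every expected character position must lie inside the field boundary.
        x0, y0, x1, y1 = self.boundary
        for sx0, sy0, sx1, sy1 in self.char_slots:
            if not (x0 <= sx0 < sx1 <= x1 and y0 <= sy0 < sy1 <= y1):
                raise ValueError(f"slot {(sx0, sy0, sx1, sy1)} lies outside field {self.boundary}")

def even_slots(boundary: Box, n: int) -> List[Box]:
    """Hypothetical helper: split a field into n equal-width character boxes."""
    x0, y0, x1, y1 = boundary
    w = (x1 - x0) // n
    return [(x0 + i * w, y0, x0 + (i + 1) * w, y1) for i in range(n)]
```

Repeating this for each data field (claim 4) yields the template that later extraction steps consult.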

2, The method according to claim 1, characterized in that it further comprises defining the positions of a plurality of positioning marks.

3, The method according to claim 1, characterized in that it further comprises scanning a blank form with an optical scanner before the field boundaries are defined.

4, The method according to claim 1, characterized in that steps (a) to (d) are repeated for a plurality of fields containing data.

5, The method according to claim 1, characterized in that the step of defining the attributes of the data information further comprises defining the format of the data information.

6, The method according to claim 1, characterized in that the step of defining attributes further comprises defining whether the field contains printed or handwritten characters.

7, The method according to claim …, characterized in that the optical character recognition step comprises extracting characters from an electronic image of the form, including:

(a) determining whether the electronic image is skewed or displaced;

(b) extracting a field from the electronic image;

(c) correcting the coordinates of the extracted field;

(d) extracting at least one text line from the corrected field;

(e) correcting the coordinates of the text line;

(f) extracting at least one character from the corrected text line; and

(g) correcting the coordinates of the extracted character.

8, The method according to claim 7, characterized in that the field is defined before it is extracted.

9, The method according to claim 8, characterized in that the step of defining the field comprises the following steps:

(a) determining the boundary of the field;

(b) determining the positions in the field where characters are expected to appear;

(c) selecting the nature of the field; and

(d) selecting the mark of the field.

10, The method according to claim 7, characterized in that the step of determining whether the form is skewed or offset comprises the following steps:

(a) determining the boundary of the electronic image; and

(b) determining, from the boundary of the electronic image, the position of the field to be extracted.

11, The method according to claim 7, characterized in that the step of correcting the coordinates of the extracted field comprises applying the skew and offset projection to the extracted field.
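Claims 10 and 11 locate the form boundary and then rotate and shift each field into place. One common way to do this, assumed here rather than taken from the patent, is a rigid transform (rotation plus translation, no scaling) estimated from two reference points on the top border of the blank form and the same two points found in the scan. The function names are hypothetical; because a rotated rectangle is no longer axis-aligned, `correct_box` returns the axis-aligned box that encloses it.

```python
import math
from typing import Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)

def estimate_skew(tpl_a: Point, tpl_b: Point, img_a: Point, img_b: Point) -> float:
    """Rotation (radians) between the template's reference edge and the scanned one."""
    a_tpl = math.atan2(tpl_b[1] - tpl_a[1], tpl_b[0] - tpl_a[0])
    a_img = math.atan2(img_b[1] - img_a[1], img_b[0] - img_a[0])
    return a_img - a_tpl

def correct_point(p: Point, theta: float, tpl_a: Point, img_a: Point) -> Point:
    """Rotate p about the template reference point, then move it onto the scanned one."""
    dx, dy = p[0] - tpl_a[0], p[1] - tpl_a[1]
    c, s = math.cos(theta), math.sin(theta)
    return (img_a[0] + c * dx - s * dy, img_a[1] + s * dx + c * dy)

def correct_box(box: Box, theta: float, tpl_a: Point, img_a: Point) -> Box:
    """Map a template field box into the scan; return the enclosing axis-aligned box."""
    x0, y0, x1, y1 = box
    corners = ((x0, y0), (x1, y0), (x0, y1), (x1, y1))
    pts = [correct_point(p, theta, tpl_a, img_a) for p in corners]
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return (min(xs), min(ys), max(xs), max(ys))
```

For a pure shift of (5, 3) pixels, `estimate_skew` returns 0.0 and the field box (10, 10, 20, 20) maps to (15.0, 13.0, 25.0, 23.0).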

12, The method according to claim 7, characterized in that the step of extracting at least one text line further comprises the following steps:

(a) consulting a database to determine the position of the text line; and

(b) adjusting the position of the text line within the field using the horizontal projection of the characters in the extracted field and the line positions.
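Claim 12 refines line positions taken from the database with a horizontal projection. A minimal sketch on a binarized field image (1 = ink, 0 = background), assuming text lines are separated by at least one empty row; `find_text_lines` and `adjust_lines` are hypothetical names, and the "largest overlap" rule for matching expected lines to detected ones is an assumption.

```python
from typing import List, Tuple

Line = Tuple[int, int]  # (top, bottom) rows, bottom exclusive

def horizontal_projection(img: List[List[int]]) -> List[int]:
    """Number of ink pixels in each row."""
    return [sum(row) for row in img]

def find_text_lines(img: List[List[int]], min_ink: int = 1) -> List[Line]:
    """Runs of consecutive rows whose projection reaches min_ink."""
    lines: List[Line] = []
    start = None
    for y, ink in enumerate(horizontal_projection(img)):
        if ink >= min_ink and start is None:
            start = y
        elif ink < min_ink and start is not None:
            lines.append((start, y))
            start = None
    if start is not None:
        lines.append((start, len(img)))
    return lines

def adjust_lines(expected: List[Line], found: List[Line]) -> List[Line]:
    """Replace each expected line with the detected line it overlaps most;
    keep the expected position if no detected line overlaps it."""
    adjusted = []
    for top, bottom in expected:
        best_overlap, best_line = 0, (top, bottom)
        for t, b in found:
            overlap = min(bottom, b) - max(top, t)
            if overlap > best_overlap:
                best_overlap, best_line = overlap, (t, b)
        adjusted.append(best_line)
    return adjusted
```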

13, The method according to claim 7, characterized in that the step of correcting the coordinates of the extracted text line further comprises the following steps:

(a) projecting the characters horizontally onto the extracted field and line positions to adjust the text line within the field;

(b) determining whether the characters in the text line extend beyond the bottom or top of the extracted field; and

(c) if the characters in the text line are found to extend beyond the bottom or top of the extracted field, regenerating the text line.
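Claim 13 handles characters whose strokes extend past the top or bottom of the extracted field. One possible reading, sketched here under assumptions: a line that touches the field edge is grown row by row on the full page image, within the field's columns, until an empty row is reached. The function name and the "empty row" stopping rule are hypothetical.

```python
from typing import List, Tuple

def regrow_line(page: List[List[int]], line: Tuple[int, int],
                field_rows: Tuple[int, int], field_cols: Tuple[int, int]) -> Tuple[int, int]:
    """Extend a text line (page rows, bottom exclusive) beyond the field edges it touches."""
    top, bottom = line
    f_top, f_bottom = field_rows
    x0, x1 = field_cols

    def ink(y: int) -> int:
        # Ink in row y, counted only within the field's columns.
        return sum(page[y][x0:x1])

    if top <= f_top:                  # line touches the field top: strokes may continue above
        while top > 0 and ink(top - 1) > 0:
            top -= 1
    if bottom >= f_bottom:            # line touches the field bottom: strokes may continue below
        while bottom < len(page) and ink(bottom) > 0:
            bottom += 1
    return top, bottom
```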

14, The method according to claim 7, characterized in that the character extraction step further comprises the following steps:

(a) consulting a database to determine whether the characters are printed or handwritten;

(b) extracting the characters;

(c) sending the extracted handwritten characters to a handwritten character recognition module; and

(d) sending the extracted printed characters to a printed character recognition module.

15, The method according to claim 14, characterized in that the step of extracting characters further comprises:

(a) determining the vertical projection of a line of characters; and

(b) separating the individual characters.
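Claim 15 separates characters using the vertical projection of a text line. A minimal sketch, assuming a binarized line image and that every character is separated from its neighbours by at least one empty column; a Chinese character built from separate left and right components, or touching handwritten digits, would violate this and need an extra merge or split step that is not shown here.

```python
from typing import List, Tuple

def vertical_projection(line_img: List[List[int]]) -> List[int]:
    """Number of ink pixels in each column of a binarized text-line image."""
    return [sum(col) for col in zip(*line_img)]

def segment_characters(line_img: List[List[int]]) -> List[Tuple[int, int]]:
    """(left, right) column spans, right exclusive, of runs of inked columns."""
    spans: List[Tuple[int, int]] = []
    start = None
    proj = vertical_projection(line_img)
    for x, ink in enumerate(proj):
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            spans.append((start, x))
            start = None
    if start is not None:
        spans.append((start, len(proj)))
    return spans
```

Because the columns are scanned left to right, the returned spans are already ordered by horizontal coordinate, which is the ordering claim 17 calls for.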

16, The method according to claim 14, characterized in that the step of sending the extracted handwritten characters further comprises:

(a) querying a database to determine whether the handwritten characters are expected to be alphanumeric or Chinese;

(b) sending handwritten alphanumeric characters to a handwritten alphanumeric character recognition module; and

(c) sending handwritten Chinese characters to a handwritten Chinese character recognition module.
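Claims 14 and 16 route each extracted character to one of the recognition modules listed in the abstract. The sketch below uses a dictionary as a stand-in for the form-query database; the module keys, the `script` values, and `FIELD_DB` are hypothetical. Since the abstract names a printed numeral module but no printed alphanumeric one, a printed alphanumeric field raises an error here.

```python
from typing import Callable, Dict, List

MODULES = ("printed_numeral", "printed_chinese", "handwritten_alphanumeric", "handwritten_chinese")

FIELD_DB: Dict[str, Dict[str, object]] = {  # hypothetical stand-in for the form-query database
    "form_no": {"handwritten": False, "script": "numeric"},
    "amount": {"handwritten": True, "script": "numeric"},
    "name": {"handwritten": True, "script": "chinese"},
}

def pick_module(handwritten: bool, script: str) -> str:
    if handwritten:                                   # claim 16: alphanumeric vs Chinese
        if script in ("numeric", "alphanumeric"):
            return "handwritten_alphanumeric"
        if script == "chinese":
            return "handwritten_chinese"
    else:                                             # printed modules named in the abstract
        if script == "numeric":
            return "printed_numeral"
        if script == "chinese":
            return "printed_chinese"
    raise ValueError(f"no module for handwritten={handwritten}, script={script!r}")

def dispatch(field_name: str, char_images: List[object],
             recognizers: Dict[str, Callable[[object], str]]) -> List[str]:
    entry = FIELD_DB[field_name]                      # claim 14 (a): consult the database
    module = pick_module(bool(entry["handwritten"]), str(entry["script"]))
    return [recognizers[module](img) for img in char_images]
```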

17, The method according to claim 7, characterized in that the step of correcting the coordinates of the extracted characters further comprises sorting the characters by horizontal coordinate.

18, The method according to claim 7, characterized in that it further comprises the following steps:

(a) performing a recognition procedure on the extracted characters; and

(b) performing a recognition post-processing procedure on the recognized characters.

19, The method according to claim 1, characterized in that the text post-processing step comprises:

(a) displaying the recognized characters on a monitor; and

(b) where necessary, correcting any characters that could not be recognized or understood.

20, A method for verifying the correctness of optically scanned data information, characterized in that it comprises the following steps:

(a) classifying the recognized form information into:

(i) fully correct;

(ii) requiring manual correction; and

(iii) rejected as a whole;

(b) storing the fully correct form information;

(c) for the form information requiring manual correction, determining whether it contains rejected characters;

(d) if there are rejected characters, manually correcting them;

(e) performing a first field post-processing check;

(f) if the field containing the corrected characters passes the post-processing check, storing its character information;

(g) performing field correction on fields that do not pass the first field post-processing check and on fields that contain no rejected characters;

(h) performing a second field post-processing check on the corrected field information;

(i) if the corrected field information passes the second field post-processing check, storing the field information;

(j) performing whole-form correction on forms that do not pass the second field post-processing check and on forms classified as rejected as a whole;

(k) performing a system post-processing check on the form information corrected as whole forms;

(l) storing the form information that passes this third, system post-processing check; and

(m) rejecting outright the form information that fails the third, system post-processing check.
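The verification flow of claim 20 can be sketched as a function over one form. The `Operators` callbacks are hypothetical stand-ins for the human operator and for the post-processing rules. As a simplification of step (g), every field of a form needing manual correction receives the first field check after its rejected characters are filled in, and any field that still fails after field correction sends the whole form to whole-form correction.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class Operators:
    correct_char: Callable[[str, int], str]          # operator keys in one rejected character
    correct_field: Callable[[str, str], str]         # operator retypes a whole field
    correct_form: Callable[[dict], Dict[str, str]]   # operator retypes the whole form
    field_check: Callable[[str, str], bool]          # field post-processing rule
    system_check: Callable[[Dict[str, str]], bool]   # system (form-level) post-processing rule

def verify_form(form: dict, ops: Operators) -> Optional[Dict[str, str]]:
    """Return the field values to store, or None if the form is finally rejected."""
    fields: Dict[str, List[Optional[str]]] = {n: list(c) for n, c in form["fields"].items()}
    if form["status"] == "correct":                  # (a)(i) -> (b)
        return {n: "".join(c) for n, c in fields.items()}
    needs_whole = form["status"] == "reject"         # (a)(iii)
    stored: Dict[str, str] = {}
    if not needs_whole:                              # (a)(ii): manual correction
        for name, chars in fields.items():
            for i, c in enumerate(chars):            # (c)-(d): correct rejected characters
                if c is None:
                    chars[i] = ops.correct_char(name, i)
            text = "".join(chars)
            if ops.field_check(name, text):          # (e)-(f): first field check
                stored[name] = text
                continue
            text = ops.correct_field(name, text)     # (g): field correction
            if ops.field_check(name, text):          # (h)-(i): second field check
                stored[name] = text
            else:
                needs_whole = True                   # escalate to (j)
                break
    if needs_whole:
        stored = ops.correct_form(form)              # (j): whole-form correction
        if not ops.system_check(stored):             # (k)-(m): third, system check
            return None
    return stored
```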

21, The method according to claim 20, characterized in that the scanned data comprise a plurality of forms, and the manual correction of rejected characters corrects characters from the plurality of forms in a single step.

22, The method according to claim 20, characterized in that the step of manually correcting rejected characters further comprises the following steps:

(a) displaying the image of a rejected character in a first portion of the monitor; and

(b) providing, in a second portion of the monitor, a position where the correct character can be entered.

23, the method for an artificial corrigendum optical scanning list is characterized in that, comprises manually correcting the step that program is arranged according to the work complexity, and better simply corrigendum proceedings is before the higher corrigendum program of complexity in this step.

24, The method according to claim 23, characterized in that a plurality of scanned forms, each having a plurality of fields, are manually corrected by the following steps:

(a) manually correcting the characters in form fields that do not pass the first field post-processing check;

(b) manually correcting the form field data that are submitted to the second field post-processing check; and

(c) correcting, as whole forms, the form data that do not pass the third field post-processing check.
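The ordering principle of claims 23 and 24 (simplest corrections first) amounts to a stable sort by task level. The `Level` enumeration and the task tuples are hypothetical; the claims fix only the order: characters, then fields, then whole forms.

```python
from enum import IntEnum
from typing import List, Tuple

class Level(IntEnum):
    CHARACTER = 1    # (a) single characters: quickest to key in
    FIELD = 2        # (b) whole fields
    WHOLE_FORM = 3   # (c) entire forms: most effort per item

def schedule(tasks: List[Tuple[Level, str]]) -> List[Tuple[Level, str]]:
    """Order correction tasks from simplest to most complex.
    sorted() is stable, so tasks of equal complexity keep their batch order."""
    return sorted(tasks, key=lambda t: t[0])
```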