CN1452119A - Bills reading system, method and program - Google Patents

Info

Publication number
CN1452119A
Authority
CN
China
Prior art keywords
document
mentioned
definition
view data
characteristic testing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN02151375A
Other languages
Chinese (zh)
Other versions
CN1198236C (en)
Inventor
古川直広
嶺竜治
酒匂裕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hitachi Ltd
Original Assignee
Hitachi Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi Ltd
Publication of CN1452119A
Application granted
Publication of CN1198236C
Anticipated expiration
Expired - Fee Related (current legal status)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N1/00 Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N1/00002 Diagnosis, testing or measuring; Detecting, analysing or monitoring not otherwise provided for
    • H04N1/00026 Methods therefor
    • H04N1/00034 Measuring, i.e. determining a quantity by comparison with a standard

Abstract

A document reading system, method, and program that allow a single document definition database (DB) to be prepared and used even when the scanning environment at document definition time differs from that at document reading time, or when a plurality of scanners are used for defining or reading documents, and that automatically detect scanner states such as degradation over time and faults. The document reading system includes: a storing device for storing the document definition and the characteristic testing pattern of a first input device; a device for obtaining, via a network, document image data and the characteristic testing pattern of a second input device; a device for reading, from the storing device, the document definition for the document image data and the characteristic testing pattern of the first input device; a device for calculating the difference between the characteristic testing patterns of the first and second input devices; a device for approximating the two characteristic testing patterns using the result of the calculating device; and a device for reading the document image data according to the document definition.

Description

Document reading system, document reading method, and document reading program
Technical field
The present invention relates to a document processing system that reads information recorded on a document, such as the amount of money and the payer's name. In particular, it relates to methods of generating, managing, and using document definitions, which serve as the prior knowledge about documents used in the document processing system.
Background art
A reading system that captures images of documents such as deposit slips and tax payment forms with an optical scanner or the like, and reads information such as the amount of money and the payer's name recorded on the document from that image data, is called a document processing system.
For such a document processing system to process documents correctly, it must be given in advance, as prior knowledge, information describing the features of the document needed to read it (the layout, including the size of the document, the number of lines to read, their positions, the character types, the number of characters, and so on) together with the processing method needed to handle the document.
This information can include, for example:
(1) Document type information
Document type ID
Document issuer name
Document issuer account number
(2) Layout information
Ruled lines
Field frame positions
Field frame attributes (amount entry frame, date entry frame, etc.)
Character types entered in field frames (numerals, kanji, katakana, etc.)
(3) Application information
Document processing procedure
Document cut-line position
Receipt stamp position
In this specification, information that contains some of the items above, and at least the layout information consisting of field frame position information and field frame attribute information, is treated as the document definition.
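As an illustration only, the items listed above could be held in a small record type. The sketch below is a minimal Python representation under stated assumptions: the class and field names (`DocumentDefinition`, `FieldFrame`, `frames_with`), the pixel units, and the sample values are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class FieldFrame:
    # Frame position as (x, y, width, height); pixel units are an assumption.
    box: Tuple[int, int, int, int]
    attribute: str   # e.g. "amount" or "date"
    char_type: str   # e.g. "numeral", "kanji", "katakana"


@dataclass
class DocumentDefinition:
    # (1) Document type information
    type_id: str
    issuer_name: str
    issuer_account: str
    # (2) Layout information: at least frame positions and attributes
    frames: List[FieldFrame] = field(default_factory=list)
    # (3) Application information (only the procedure is modeled here)
    procedure: str = ""

    def frames_with(self, attribute: str) -> List[FieldFrame]:
        """Return the frames that carry the given attribute."""
        return [f for f in self.frames if f.attribute == attribute]


# Hypothetical sample definition for an electricity payment slip.
definition = DocumentDefinition(
    type_id="ELEC-001",
    issuer_name="Example Power Co.",
    issuer_account="0000000",
    frames=[
        FieldFrame((120, 340, 400, 60), "amount", "numeral"),
        FieldFrame((120, 420, 400, 60), "date", "numeral"),
    ],
)
```

Keeping the layout as a list of frames mirrors the requirement that a definition carries at least frame positions and frame attributes; the other items can be added as further fields.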
To achieve high-accuracy document processing, (a) the method of generating document definitions, (b) the method of managing them, and (c) the method of using them are all important.
As an existing method for (a), generating document definitions, Japanese Unexamined Patent Publication No. 2000-172779, for example, discloses a method that automatically extracts field frames from an input image of the document type for which a definition is to be generated, determines the valid frames using frame format specifications registered in advance, and generates the layout information.
In addition, Japanese Unexamined Patent Publication No. H11-184965 discloses a method of extracting a document definition for distinguishing document types by automatically extracting, from an image, constituent elements that satisfy conditions such as character patterns.
As for (b), managing document definitions, Japanese Unexamined Patent Publication No. H9-73502 discloses, as a method of managing the document definition DB (database) that stores document definitions, a method in which multiple document processing devices are connected via a network, only one of them holds the document definition DB, and the information is sent to the other document processing devices over the network.
In addition, Japanese Unexamined Patent Publication No. 2001-307008 discloses a method that reduces the cost of building a document definition DB by distributing document definitions to document processing devices as needed.
As for (c), using document definitions, the usual approach is to perform character string recognition according to the layout information described in the document definition, such as the field frames and the entered character types, and thereby read the target fields.
Summary of the invention
However, when ruled-line information is used to locate the items to be read, for example, the ruled lines in the image may look different from how they looked at definition time because each scanner has its own scan characteristics. The ruled lines then cannot be extracted as recorded in the definition, and locating the items to be read fails. The existing methods therefore have the following problem: because they use the document definition as is, without taking into account characteristic values that differ for each input device, such as the brightness values and resolution of the captured image, high-accuracy reading is difficult when the scanner used to generate the document definition and the scanner of the document processing device have different characteristics. Moreover, realistic configurations of a document processing system include:
(i) cases where not one but multiple scanners of various types are used to collect document images and generate document definitions, and
(ii) cases where a single document definition DB is used by multiple document processing devices of various types. When such varied scanners coexist, the existing methods of generating and using document definitions are practically unable to cope.
Thus there is Problem 1: existing methods cannot achieve high-accuracy document reading when the scanning environment at document definition time differs from that at document reading time, or when multiple scanners are used for document definition or document reading.
In addition, to maintain high-accuracy reading at all times, the document processing system must keep track of the state of its scanners. Optical scanners generally degrade with use. Abnormalities such as stretching or shrinking of the scanned image can also occur because of, for example, a faulty transport mechanism. A scanner whose image quality has deteriorated must be replaced as soon as possible. However, there is Problem 2: there has been no means of automatically detecting degradation over time, faults, and the like.
The present invention was made in view of the above problems. Its object is to provide a method that allows a single document definition DB to be generated and used even when the scanning environment at document definition time differs from that at document reading time, and when multiple scanners are used for document definition or document reading.
A further object of the present invention, addressing Problem 2 above, is to provide a method of automatically detecting scanner states such as degradation over time and faults.
To achieve these objects, the present invention records, in each document definition, the characteristic values of the input device used to generate that definition, or information linked to them; compares the characteristics of the device used at definition time with those of the document reading device at reading time; and performs character string recognition, comparison, and other processing according to the result. This makes it possible to use a single document definition DB while maintaining reading accuracy.
In addition, by monitoring the characteristic testing pattern of each scanning environment, a large change in the characteristic testing pattern can be detected automatically and taken to indicate that the scanner has undergone degradation over time, a fault, or a similar change of state.
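The monitoring idea in the preceding paragraph can be sketched as a comparison between a stored baseline pattern and a freshly measured one. This is a minimal sketch under stated assumptions: the patent does not specify how a "large change" is judged, so the relative tolerance, the function name `scanner_state_changed`, and the characteristic names and values are hypothetical.

```python
def scanner_state_changed(baseline, current, tolerance=0.2):
    """Return the names of characteristics whose relative change from the
    baseline exceeds `tolerance` (an assumed threshold, not from the patent)."""
    flagged = []
    for name, base_value in baseline.items():
        new_value = current.get(name)
        if new_value is None:
            # A characteristic that can no longer be measured is also suspicious.
            flagged.append(name)
            continue
        scale = max(abs(base_value), 1e-9)  # avoid division by zero
        if abs(new_value - base_value) / scale > tolerance:
            flagged.append(name)
    return flagged


# Hypothetical measurements for one scanner at two points in time.
baseline = {"min_line_width_mm": 0.10, "max_line_density_lpmm": 8.0, "white_level": 250.0}
current = {"min_line_width_mm": 0.11, "max_line_density_lpmm": 5.0, "white_level": 245.0}
changed = scanner_state_changed(baseline, current)
```

A non-empty result would be the trigger for flagging the scanner for maintenance or replacement; in practice, separate thresholds per characteristic would likely be needed.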
Description of drawings
Fig. 1 is a diagram showing the parties involved in the present invention and their relationships.
Fig. 2 is a diagram showing configuration example 1 of the document processing system of the present invention.
Fig. 3 is a diagram showing configuration example 2 of the document processing system of the present invention.
Fig. 4 is a diagram showing a conventional example of the document definition generation step and the document reading step.
Fig. 5 is a diagram showing the document definition generation step and the document reading step of the present invention.
Fig. 6 is a diagram showing an example of a characteristic testing pattern generation specimen page.
Fig. 7 is a diagram showing an example of a characteristic testing pattern.
Fig. 8 is a processing flowchart of scan characteristic extraction.
Fig. 9 is a processing flowchart of the correction amount calculation device.
Fig. 10 is a processing flowchart of the document reading device.
Fig. 11 is a processing flowchart of the scanner state detection device.
Fig. 12 is an explanatory diagram of image correction of gray-level brightness values.
Fig. 13 is an explanatory diagram of correction of ruled-line extraction results.
Fig. 14 is an explanatory diagram of the ruled-line correction steps.
Fig. 15 is an explanatory diagram of the distance between ruled lines.
Embodiment
The parties involved in the present invention, the system configuration, and each function are described in detail below.
First, an overview of the embodiment of the present invention is given (Fig. 1). Four parties appear in this embodiment.
The first party is the document processing system provider 101.
The second party is the document processing system user 102.
The third party is the document issuer 103.
The fourth party is the document processing requester 104.
Taking payment of electricity charges as an example, the document processing system provider 101 is a system development and service company, the document processing system user 102 is a financial institution, the document issuer 103 is an electric power company, and the document processing requester 104 is an electricity user. The power company issues documents for billing and payment of electricity charges and sends them to each electricity user. The electricity user takes the document received for paying the charges to the financial institution, which uses the document processing system to carry out the payment procedure. The financial institution then transfers the paid charges to the power company, and the power company pays the financial institution a document processing fee. This completes the overall flow.
In this case, the financial institution purchases or leases the document processing system from the system development and service company, uses it, and pays the purchase price or usage fees.
The configuration of the document processing system provided by the document processing system provider 101 of Fig. 1 is described next. The system can be broadly divided into two configurations: (1) centralized (Fig. 2) and (2) distributed (Fig. 3).
The centralized type (Fig. 2), the first configuration of the document processing system, consists of scanners 211 to 213, a document image server 221, a characteristic testing pattern server 231, a document definition server 241, and a document reading server 251, connected through a network.
Network 201 may physically be a wired network such as optical cable, Ethernet, or telephone lines; a wireless network such as IEEE 802.11a/b/g or Bluetooth (a registered trademark of Bluetooth SIG); or a mixture of these. As for the servers, a single piece of hardware may also take on the functions of several of them.
The document processing system has one or more scanners. A scanner may be configured, for example, as a combination of an optical image reading device and a computer that controls it and sends the captured images over the network to the document image server, or as a FAX (facsimile) device or the like. The scanner hardware may also be integrated with one of the servers. The scanners need not all be of the same specification. In terms of Fig. 1, scanners may be installed at the document processing system provider 101, the document processing system user 102, or the document processing requester 104.
The document image scanned by each scanner is sent through network 201 to the document image server 221. The document image server stores the transmitted document image in the document image DB 222 together with information about the scanner used to capture it. This information about the document image input device used at capture time consists of characteristic values of the read image data whose values may differ depending on the scanner used, or may change over time for each input device; it includes, for example, at least one of resolution, density (shading) information, minimum scannable line width, and line spacing. Some of the other information contained in the document definition may also be used. In this specification, this information about an input device is hereafter called a characteristic testing pattern. Alternatively, if each scanner in the system is assigned a unique number or character string in advance (hereafter called a scanner ID), the scanner ID may be used as the characteristic testing pattern.
Each characteristic testing pattern is managed by the characteristic testing pattern server 231. An example of a characteristic testing pattern is shown in Fig. 7. The server stores the characteristic testing pattern of each scanner in the characteristic testing pattern DB 232 in association with that scanner and manages it.
The document definition server 241 receives document images from the document image server, generates document definitions, and stores them in the document definition DB 242. To each document definition is attached the characteristic testing pattern of the scanner that captured the document image used to generate that definition.
The document reading server 251 receives the document image from the document image server or a scanner, the scanner's characteristic testing pattern from the characteristic testing pattern server 231, and the document definition from the document definition server. It reads the character strings, numeric strings, and so on from the document according to the document definition and stores the result in the document reading result DB. Document reading is described in detail later. As noted above, when the system is built as a single device, the document image DB 222, the characteristic testing pattern DB 232, and the document definition DB 242 can be stored in the document reading server 251, and the above processing can be carried out on image data received from a scanner. This concludes the description of the centralized type, the first configuration of the document processing system.
The distributed type, the second configuration of the document processing system, is shown in Fig. 3. It differs from the centralized first configuration in whether document reading is handled by a single document reading server or in each document reading device. The distributed type consists of a document definition center that creates document definitions and one or more document reading devices that actually read documents. Fig. 3 shows the case with two document reading devices, A and B.
The document definition center 300 is the same as the centralized first configuration 200, except that it does not need a document reading server and has an added DB distribution server.
The DB distribution server 361 distributes the document definition DB, together with the characteristic testing pattern DB of the document image input devices used to generate it, to each document reading device over network 302. Document definitions can be delivered using, for example, the document definition delivery method described in Japanese Unexamined Patent Publication No. 2001-307008.
Each document reading device has one or more scanners, a document reading server, and a document reading result DB. Each document reading device is connected to the distribution server through network 302. The characteristic testing patterns of the scanners in a document reading device may be stored in that device or managed by the characteristic testing pattern server.
This concludes the description of the distributed type, the second configuration of the document processing system.
The document processing system may also be a hybrid of the centralized and distributed types. That is, in a distributed system the document definition center may have a document reading server, and document reading devices without a document reading server may be connected to the document definition center. In this case, the characteristic testing pattern of each scanner can be held in a storage device provided in the scanner 373 and attached automatically when the read image is sent to the center. With this configuration, documents can be processed efficiently, for example by storing document information centrally at the document center.
The document definition generation and document reading steps are described next by comparing the conventional example (Fig. 4) with the present invention (Fig. 5).
In the conventional example of the document definition generation step and the document reading step (Fig. 4), the document 411 to be handled by the document processing system is first read in electronically by document image input unit A 412 to obtain document image 413. The document definition generation device 414 then defines the layout information and so on, generating document definition 415. This is the document definition generation step 410.
In the document reading step 420, the document 421 to be processed is first read in electronically by document image input unit B 422 to obtain document image 423. Then, taking as input document image 423 and the document definition 415 generated in the preceding document definition generation step, the document reading unit 424 reads the target character strings, numeric strings, and so on written on the document, and the result is stored as document reading result 425.
This is the conventional example of the document definition generation step and the document reading step.
In the conventional example, however, the difference between the characteristics of the input images from document image input unit A 412 and document image input unit B 422 is not taken into account. This can cause problems such as ruled lines that were present in the image at definition time not being detected at reading time, or undefined ruled lines being picked up, and as a result document reading accuracy may drop.
The approach of the present invention is therefore to achieve document reading that takes this difference in scan characteristics into account.
In the document definition generation and document reading steps of the present invention (Fig. 5), the document 511 to be handled by the document processing system is read in electronically by document image input unit A 512 to obtain document image 513. The document definition generation device 514 then defines the layout information and so on, generating document definition 515. The characteristic testing pattern of the device that captured the document image used to generate document definition 515 is recorded together with the definition. Meanwhile, to determine the scan characteristics of document image input unit A 512, the characteristic testing pattern generation specimen page 516 (see Fig. 6; hereafter simply called the specimen page) is read in electronically by document image input unit A, the scan characteristics are extracted by scan characteristic extraction unit 517, and characteristic testing pattern A 518 is output. This is the document definition generation step 510. The characteristic testing pattern need not be generated every time a document is scanned; it is enough to do so when the document image input unit is first used, at periodic maintenance, when abnormalities appear in scanned images, and on similar occasions. The characteristic testing pattern may also be generated and stored before the scanner is shipped. The characteristic testing pattern and the scan characteristic extraction unit are described in detail later.
In the document reading step 520, the document 521 to be processed is first read in electronically by document image input unit B 522 to obtain document image 523. In parallel, as in the document definition generation step 510, to determine the characteristic testing pattern of document image input unit B 522, the specimen page 526 is read in electronically by document image input unit B 522, the scan characteristics are extracted by scan characteristic extraction unit 527, and characteristic testing pattern B 528 is output. Before or during document reading, the correction amount calculation unit 529 extracts the differences between the two characteristic testing patterns 518 and 528, for example in brightness values and resolution, and determines the correction amounts and correction methods to apply at reading time. Then, taking as input document image 523, the document definition 515 generated in the preceding document definition generation step, and the result of correction amount calculation unit 529, the document reading unit 524 reads the target character strings, numeric strings, and so on written on the document, and the result is stored as document reading result 525.
This is the document definition generation and document reading procedure of the present invention.
As described above, by taking into account the difference between the characteristic testing patterns at document definition time and at document reading time, the drop in document reading accuracy can be suppressed even when the document image input units differ.
The characteristic testing pattern generation specimen page and the characteristic testing pattern are described next.
A characteristic testing pattern generation specimen page is a sheet that each document image input unit scans to produce an electronic image; the scan characteristic extraction device is then run on that image to obtain the characteristic testing pattern. Fig. 6 shows an example of such a specimen page.
In this example, character strings and numeric strings in various typefaces, styles, and sizes are printed in region 610. After the specimen page is scanned, the scan characteristic extraction device checks whether these characters can be read and thereby extracts scan characteristics related to character strings, such as the smallest recognizable font size.
Pattern 621 is for determining the recognizable line density. By checking whether the line segments between the two horizontal lines can be distinguished after scanning, the maximum recognizable line density can be extracted. In this example, the closer to the center the line segments can still be distinguished, the higher the recognizable line density.
Pattern 622 is for examining the gray-level characteristics. By measuring the brightness values of this pattern after scanning, the gray-level characteristics of the scanner can be extracted. Likewise, patterns 623 to 625 are for examining, in the color case, the level characteristics of red, green, and blue respectively.
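The gray-level measurement described for pattern 622 can be sketched as averaging the scanned brightness inside each printed patch. This is only an illustration under stated assumptions: the image is a plain list of pixel rows, the patch coordinates are given as (x, y, width, height), and the names `mean_brightness` and `gray_characteristic` are hypothetical.

```python
def mean_brightness(image, region):
    """Average pixel value inside region = (x, y, width, height)."""
    x, y, w, h = region
    values = [image[row][col] for row in range(y, y + h) for col in range(x, x + w)]
    return sum(values) / len(values)


def gray_characteristic(image, patches):
    """Map each nominal (printed) gray level to the brightness actually measured.

    `patches` maps nominal level -> patch region on the scanned specimen page.
    """
    return {nominal: mean_brightness(image, region) for nominal, region in patches.items()}


# Tiny hypothetical scan: a dark patch on the left, a light patch on the right.
scan = [
    [10, 10, 200, 200],
    [12, 12, 204, 204],
]
measured = gray_characteristic(scan, {0: (0, 0, 2, 2), 255: (2, 0, 2, 2)})
```

The same routine could be run once per color channel to cover patterns 623 to 625.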
The patterns in regions 630 and 640 are for examining how line segments are recognized. In this example, line segments of different widths are drawn in region 630. By measuring the actual width of each segment after scanning, breakup and blurring of line segments can be extracted. In region 640, lines of different densities are drawn. By checking whether each segment can actually be recognized after scanning, characteristics such as the line width at that point can be extracted.
Information about the specimen page (which pattern is printed where, and which device extracts which characteristic from it) can be kept in either of two ways:
(1) held as prior knowledge in the characteristic extraction device, or
(2) recorded at a specific location on the characteristic testing pattern generation specimen page itself. This example uses (2): the information is encoded and recorded in the two-dimensional barcode 601 at the upper right corner of the specimen page. In this case a two-dimensional code decoder is needed when extracting scan characteristics, but the advantages are that no prior knowledge needs to be managed and that a variety of specimen pages can be supported.
An example of the characteristic testing pattern generated from the specimen page illustrated in Fig. 6 is shown in Fig. 7.
First, field 701 records the scanner ID indicating which scanner the characteristic testing pattern belongs to. Field 703 records the gray-level characteristics, for example the measured brightness value (0 to 255 in this example, with 256 levels) for each gray level obtained by measuring pattern 622 of Fig. 6. Likewise, fields 704 to 706 record the measured values of the red, green, and blue levels obtained from patterns 623 to 625 of Fig. 6. Fields 707 and 708 hold line-segment characteristics: the minimum recognizable line width and the maximum recognizable line density, respectively. Fields 709 and 710 hold characteristics of character strings and patterns: the measured value for each line width and the minimum recognizable font size, respectively. These can be obtained from the patterns shown in regions 610, 630, and 640 of Fig. 6. In other examples, field 702 can record the measured resolution.
Further fields may be added to these items, for example a field indicating whether each character can be recognized.
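The fields 701 to 710 described above map naturally onto a record. The following is a sketch only: the class name, attribute names, units, and sample values are assumptions made for illustration, while the field numbers in the comments follow the description of Fig. 7.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CharacteristicTestPattern:
    scanner_id: str                                           # field 701
    resolution: Optional[float] = None                        # field 702 (optional)
    gray: Dict[int, float] = field(default_factory=dict)      # field 703: level -> measured
    red: Dict[int, float] = field(default_factory=dict)       # field 704
    green: Dict[int, float] = field(default_factory=dict)     # field 705
    blue: Dict[int, float] = field(default_factory=dict)      # field 706
    min_line_width: Optional[float] = None                    # field 707
    max_line_density: Optional[float] = None                  # field 708
    line_widths: Dict[float, float] = field(default_factory=dict)  # field 709: drawn -> measured
    min_font_size: Optional[float] = None                     # field 710


# Hypothetical pattern for one scanner; all numbers are invented.
pattern_a = CharacteristicTestPattern(
    scanner_id="SCANNER-A",
    gray={0: 10.0, 128: 120.0, 255: 240.0},
    min_line_width=0.1,
    max_line_density=8.0,
    min_font_size=6.0,
)
```

Using the scanner ID as the key makes it straightforward to store one such record per scanner, as the characteristic testing pattern DB is described as doing.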
The processing flow of the scan characteristic extraction device (Fig. 5: 517, 527) is described below with reference to Fig. 8. First, the image of the characteristic testing pattern generation specimen page is input (step 801). The four corners of the specimen page are then detected in the input image to determine the position of the specimen page within the image (step 802). Next, the specimen page information is extracted (step 803); in this embodiment this corresponds to decoding the two-dimensional barcode 601 of Fig. 6. Then, according to the specimen page information, each scan characteristic is extracted in turn (step 804). For each scan characteristic, the measurement region for that characteristic is first determined (step 805) and the measurement is taken (step 806). The scan characteristic is calculated from the measured value (step 807), and the result is written to the characteristic testing pattern (step 808). These steps are repeated, and once all scan characteristics have been extracted, the characteristic testing pattern is output and processing ends (step 809). This processing can also be implemented using, for example, the tools already used at scanner shipment to confirm that each product meets its specifications.
This is the scan characteristic extraction processing flow.
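The loop in steps 804 to 809 can be sketched as follows. This is a simplified illustration, not the patented implementation: corner detection (step 802) and barcode decoding (step 803) are assumed to have already produced `sheet_info`, and the function names, the tuple layout of `sheet_info`, and the sample data are hypothetical.

```python
def mean(area):
    """Average of all pixel values in a cropped area (list of rows)."""
    flat = [value for row in area for value in row]
    return sum(flat) / len(flat)


def extract_scan_characteristics(sheet_image, sheet_info, scanner_id):
    """Build a characteristic testing pattern from a scanned specimen page.

    sheet_info: list of (name, region, measure, derive) entries, assumed to
    come from the decoded specimen page information (step 803).
    """
    pattern = {"scanner_id": scanner_id}
    for name, region, measure, derive in sheet_info:            # step 804
        x, y, w, h = region                                     # step 805
        area = [row[x:x + w] for row in sheet_image[y:y + h]]
        measured = measure(area)                                # step 806
        pattern[name] = derive(measured)                        # steps 807-808
    return pattern                                              # step 809


# Hypothetical 2x4 scan with a black patch and a white patch.
sheet = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]
info = [
    ("black_level", (0, 0, 2, 2), mean, round),
    ("white_level", (2, 0, 2, 2), mean, round),
]
result = extract_scan_characteristics(sheet, info, "S1")
```

Passing the measurement and derivation functions in with the specimen page information reflects the idea that the page itself can say which characteristic is extracted from which region.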
Next, Fig. 9 shows the processing flow of the correction amount calculation device (Fig. 5: 529). First, the characteristic testing pattern from definition generation is input (step 901). The characteristic testing pattern from document reading is also input (step 902). Steps 901 and 902 may be performed in either order. The difference between the two characteristic testing patterns is then calculated (step 903), and from this difference the correction amount for the input document image data and the reading parameters to apply when reading according to the document definition are determined (step 904). Finally, the correction amount and parameters are output and processing ends (step 905). This flow is described further below.
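One way steps 903 and 904 could be realized is sketched below. It is an assumption-laden illustration: the patent does not prescribe piecewise-linear interpolation, a 256-entry lookup table, or a line-density ratio as the reading parameter; the function names, dictionary keys, and sample values are likewise hypothetical.

```python
def interpolate(points, x):
    """Piecewise-linear interpolation through sorted (x, y) points, clamped at both ends."""
    if x <= points[0][0]:
        return points[0][1]
    if x >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            if x1 == x0:
                return y0
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return points[-1][1]


def compute_correction(pattern_a, pattern_b):
    """pattern_a: definition-time scanner; pattern_b: reading-time scanner."""
    # Step 903: pair up what each scanner measured for the same printed gray level.
    levels = sorted(set(pattern_a["gray"]) & set(pattern_b["gray"]))
    points = sorted((pattern_b["gray"][n], pattern_a["gray"][n]) for n in levels)
    # Step 904: a table mapping reading-time brightness to definition-time brightness,
    # plus one example reading parameter derived from the line-density difference.
    gray_lut = [min(255, max(0, round(interpolate(points, g)))) for g in range(256)]
    ratio = pattern_b["max_line_density"] / pattern_a["max_line_density"]
    return {"gray_lut": gray_lut, "line_density_ratio": ratio}  # step 905


# Hypothetical patterns: scanner B reads everything slightly brighter than A.
a = {"gray": {0: 10, 128: 120, 255: 240}, "max_line_density": 8.0}
b = {"gray": {0: 30, 128: 140, 255: 250}, "max_line_density": 4.0}
correction = compute_correction(a, b)
```

Precomputing the table means the per-pixel correction at reading time reduces to a single lookup.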
The processing flow of the document reading device (Fig. 5: 524) is described next (Fig. 10).
First, in steps 1001 to 1003, the image of the document to be read, the definition for that image, and the correction amount calculated by the correction amount calculation device (529) are input. These steps may be executed in any order. The document image data is then corrected according to the correction amount (step 1004), and the four corners of the document are detected to determine the position of the document within the image (step 1005). Next, for each item to be read that is recorded in the document definition (step 1006), the reading region is determined (step 1007), the character patterns are extracted from that region (step 1008), and the reading result is obtained by performing character recognition on each character (step 1009). Finally, after all items have been read, the results are output (step 1010).
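The reading flow of Fig. 10 can be sketched as below. This is a simplified illustration under stated assumptions: corner detection (step 1005) is skipped, the correction amount is taken to be a 256-entry brightness lookup table, and character recognition is replaced by a stand-in function; `read_document`, `count_dark`, and the dictionary keys are hypothetical names.

```python
def read_document(image, definition, gray_lut, recognize):
    """Read every item listed in the definition from a grayscale image.

    image: list of pixel rows; definition: list of items with "name",
    "region" (x, y, width, height) and "char_type"; recognize: a caller-supplied
    recognizer taking (cropped area, character type).
    """
    corrected = [[gray_lut[p] for p in row] for row in image]        # step 1004
    results = {}
    for item in definition:                                          # step 1006
        x, y, w, h = item["region"]                                  # step 1007
        crop = [row[x:x + w] for row in corrected[y:y + h]]          # step 1008
        results[item["name"]] = recognize(crop, item["char_type"])   # step 1009
    return results                                                   # step 1010


def count_dark(crop, char_type):
    """Stand-in for character recognition: reports the number of dark pixels."""
    dark = sum(value < 128 for row in crop for value in row)
    return f"{char_type}:{dark}"


identity_lut = list(range(256))  # no brightness correction
page = [
    [0, 255, 0, 0],
    [255, 255, 0, 0],
]
items = [
    {"name": "amount", "region": (0, 0, 2, 2), "char_type": "numeral"},
    {"name": "date", "region": (2, 0, 2, 2), "char_type": "numeral"},
]
reading = read_document(page, items, identity_lut, count_dark)
```

Injecting the recognizer keeps the sketch independent of any particular OCR engine, which the patent does not name.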
Next, the correction methods that use the characteristic testing pattern are described in more detail. The correction methods fall broadly into two kinds:
(1) Image correction: the image read from the reading target is corrected directly.
(2) Recognition correction: the line lattice extraction result is corrected, or the parameters and dictionaries used for character segmentation and character recognition are changed.
In the document reading flow described above, (1) is applied in step 1004 and (2) is applied in steps 1007 to 1009.
As an example of the image correction of (1) above, a method of correcting gray-scale brightness values is described (Fig. 12). When the scanner used at document definition differs from the one used at document reading and the gray-scale brightness characteristic therefore changes, this correction brings the document image at reading time closer to the document image at definition time. First, the relationship between the gray-scale input brightness values and the measured values is obtained from the characteristic testing patterns of the two scanners. In Fig. 12, characteristic testing pattern A (the scan characteristic at document definition) is 1201, and characteristic testing pattern B (the scan characteristic at document reading) is 1202. As a correction method, for example, when the brightness value of the pixel of interest at reading time is g, the brightness value of that pixel is changed to the corresponding measured value g' at definition time. The same processing can be applied to color and to color density. Image correction is thus processing that reduces the influence of characteristics, such as brightness values, of the input device used at document definition and of the device that outputs the image data of the reading target.
Next, as an example of the recognition correction of (2) above, a method of correcting the line lattice extraction result is described (Fig. 13). When the reading region is determined in the document reading flow (step 1007), the line lattices must be compared in order to determine the region more accurately. However, when the scanner used at document definition differs from the one used at document reading and the recognizable line density changes, the registered line lattice 1301 of the document definition and the line lattice extraction result 1306 may differ, as shown for example in Fig. 13. The purpose of this correction is to correct the line lattice information according to the difference between the characteristic testing patterns and then compare the line lattices.
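Returning to the gray-scale brightness correction of Fig. 12, one way to realize the mapping from g to g' is a 256-entry lookup table built from the two response curves. The sketch below assumes that each characteristic testing pattern stores measured brightness values for a shared list of input brightness values, that the curves are strictly increasing, and that linear interpolation between samples is acceptable; the sample values and function names are hypothetical.

```python
from bisect import bisect_right

def interp(x, xs, ys):
    """Piecewise-linear interpolation; xs must be strictly increasing."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)
    t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + t * (ys[i] - ys[i - 1])

def build_lut(inputs, measured_a, measured_b):
    """Map a brightness g seen by scanner B to the value g' scanner A would give."""
    lut = []
    for g in range(256):
        nominal = interp(g, measured_b, inputs)          # invert B's response
        lut.append(round(interp(nominal, inputs, measured_a)))  # apply A's response
    return lut

# Hypothetical response curves sampled at three input brightness values.
inputs = [0, 128, 255]
measured_a = [0, 128, 255]     # scanner used at document definition
measured_b = [20, 140, 250]    # scanner used at document reading
lut = build_lut(inputs, measured_a, measured_b)
```

Applying the correction to an image then amounts to replacing every pixel value `g` with `lut[g]`.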
Attention is first paid to the recognizable maximum line density of characteristic testing patterns A and B: the line lattice information of the superior side (the one with the larger value) is corrected to match the recognizable maximum line density of the inferior side. In the case of Fig. 13, the document definition side is superior, so its line lattice information is converted using the recognizable maximum line density of the inferior side as a parameter (1304), yielding the line lattice correction result 1305. Because this correction result is compared (1307) with the line lattice extraction result obtained at document reading, a highly accurate line lattice comparison can be obtained even when the two scan characteristics differ.
Fig. 14 shows an example of the concrete steps of line lattice correction. First, the set L of line lattices to be corrected and the recognizable maximum line density d are input in step 1401. The set L is the line lattice information of the side with the higher line density (the higher-precision side), and d is the line density of the inferior side. For example, in the case of Fig. 13, L is the line lattice information in the document definition and d is the line density of characteristic testing pattern B, namely 1.6. Then, for each pair of line lattices l1, l2 in L, if the condition of step 1404 holds, the two line lattices are unified into one (step 1405). As shown in Fig. 15, the distance between line lattices used in the condition of step 1404 is the minimum of the distances between arbitrary points p on l1 and q on l2.
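The unification of steps 1404 and 1405 might look like the following sketch. The text does not spell out the condition of step 1404, so the code assumes that two line lattices are unified when their minimum distance is below 1/d, with d taken as a density in lines per unit length. It also assumes horizontal line segments written as (y, x_start, x_end) and merges them by averaging y and taking the union of the x ranges; all of these choices are illustrative assumptions.

```python
from math import hypot

def segment_distance(l1, l2):
    """Minimum distance between two horizontal segments (y, x_start, x_end)."""
    y1, a0, a1 = l1
    y2, b0, b1 = l2
    dx = max(0.0, b0 - a1, a0 - b1)  # horizontal gap; 0 if the x ranges overlap
    return hypot(dx, abs(y1 - y2))

def merge_lines(lines, density):
    """Unify lines closer than the assumed resolvable spacing 1/density."""
    threshold = 1.0 / density
    merged = []
    for line in sorted(lines):
        for i, kept in enumerate(merged):
            if segment_distance(kept, line) < threshold:   # assumed step 1404
                merged[i] = ((kept[0] + line[0]) / 2.0,    # step 1405
                             min(kept[1], line[1]),
                             max(kept[2], line[2]))
                break
        else:
            merged.append(line)
    return merged

# Hypothetical line lattice set L; d = 1.6 as in the Fig. 13 example.
L = [(10.0, 0.0, 50.0), (10.4, 0.0, 50.0), (12.0, 0.0, 50.0)]
corrected = merge_lines(L, 1.6)
```

With these inputs the two lines 0.4 apart fall below the assumed spacing of 0.625 and are unified, while the third line is kept separate.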
Likewise, when there is a difference in the minimum scannable line width, the minimum line spacing, or the like, for example when the characteristic testing pattern on the side that input the document image data is inferior, the document definition is adjusted to that minimum line width and so on before reading. To read the document, the parameters used for character segmentation or pattern unification may also be changed. In addition, a dictionary for blurred characters may be prepared for character recognition, and the dictionary used in character recognition 1009 may be switched. Furthermore, when the difference is large, that fact or a calculated confidence level can be attached to the output result, which makes it possible to obtain more accurate reading results.
Finally, the processing flow of the scanner state detection device is described with reference to Fig. 11.
First, a newly generated characteristic testing pattern is input to the processing device from the characteristic testing pattern server 221 or the characteristic testing pattern DB (step 1101). Next, the past characteristic testing pattern of the same scanner is retrieved from the characteristic testing pattern DB (step 1102) and compared with the new characteristic testing pattern to calculate the difference (step 1103). After these steps, scanner abnormality detection is performed. First, each characteristic value of the new characteristic testing pattern is checked against the specification of the document processing system (step 1104); if the specification is not satisfied, the scanner is treated as abnormal and a warning is issued to the owner of the scanner or to the system supplier 101 of Fig. 1 (step 1105). The warning may, for example, be sent by e-mail over the network or by post, or may be delivered by the system supplier visiting in person, combined with maintenance. On receiving this warning, the system supplier 101 inspects the scanner and then replaces or repairs it. When the scanner belongs to the system user 102 or the document processing requester 104, the system supplier sends a warning stating that the specification is not satisfied, together with, for example, an introduction to scanners that do satisfy it. Likewise, when the difference between the new characteristic testing pattern and the past characteristic testing pattern exceeds a threshold (step 1107), a warning is issued.
This device is executed when a characteristic testing pattern is updated or newly generated. When a characteristic testing pattern is newly generated, steps 1102, 1103, 1106, and 1107 are not executed. In practice, maintenance personnel may perform this as a periodic maintenance service, or the system user may be asked to periodically send a read image of the specimen page for generating the characteristic testing pattern. With this configuration, high-precision reading can be carried out and that precision can be maintained.
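The two checks in this flow, the specification check of step 1104 and the drift check of step 1107, can be sketched as below. The characteristic names, the "higher is better" form of the specification, and the threshold values are assumptions for illustration; when no past characteristic testing pattern is supplied, the drift check is skipped, matching the case of a newly generated pattern.

```python
def check_scanner(new, spec_min, previous=None, max_drift=None):
    """Return warning messages for one scanner's characteristic record."""
    warnings = []
    for name, minimum in spec_min.items():               # step 1104
        if new[name] < minimum:
            warnings.append(f"{name}: below specification")
    if previous is not None and max_drift is not None:   # steps 1103 and 1107
        for name, limit in max_drift.items():
            if abs(new[name] - previous[name]) > limit:
                warnings.append(f"{name}: drift exceeds threshold")
    return warnings

# Hypothetical values for one scanner.
new = {"max_line_density": 1.6, "contrast": 0.9}
previous = {"max_line_density": 2.4, "contrast": 0.92}
spec_min = {"max_line_density": 2.0, "contrast": 0.8}
max_drift = {"max_line_density": 0.5, "contrast": 0.1}

updated_warnings = check_scanner(new, spec_min, previous, max_drift)
first_time_warnings = check_scanner(new, spec_min)
```

How the resulting messages are delivered (e-mail, post, or an on-site visit) is outside the scope of this sketch.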
As described above, the present application discloses a document reading system comprising: a storage device that stores a document definition in association with the characteristic testing pattern of the first input device used to input the image data for that definition; a device that obtains, via a network, document image data and the characteristic testing pattern of the second input device used to input that document image data; a device that reads from the storage device the document definition for the document image data and the characteristic testing pattern of the first input device; a device that calculates the difference between the characteristic testing patterns of the first and second input devices; a device that corrects the document image data or the document definition using the result of the calculation device; and a device that reads the document image data using the document definition. A form in which the above system is configured over a network is also disclosed.
According to the present invention, even when the scanning environment differs between document definition and document reading, or when a plurality of scanners are used for document definition or document reading, a single document definition DB can be generated and used, which reduces the work of generating document definitions and prevents a decrease in document reading accuracy.
In addition, because scanner conditions such as deterioration and failure can be detected automatically and promptly, the system maintenance workload can be reduced.

Claims (12)

1. A document reading system, characterized by comprising:
a storage device that stores a document definition, including at least sash (frame) position information and sash attributes, in association with the characteristic testing pattern of a first input device used to input the image data for that definition;
a device that obtains, via a network, document image data and the characteristic testing pattern of a second input device used to input the document image data;
a device that reads, from the storage device, the document definition for the document image data and the characteristic testing pattern of the first input device;
a device that calculates the difference between the characteristic testing patterns of the first and second input devices;
a device that changes the document image data or the document definition using the result of the calculation device so that the two characteristic testing patterns approximate each other; and
a device that reads the document image data using the document definition.
2. The document reading system as claimed in claim 1, characterized in that:
the storage device stores the characteristic testing pattern of the second input device in association with an ID code of the second input device; and
the obtaining device obtains the characteristic testing pattern of the second input device corresponding to the ID code attached to the obtained document image data.
3. The document reading system as claimed in claim 1 or 2, characterized by comprising the second input device, which inputs the document image data.
4. The document reading system as claimed in any one of claims 1 to 3, characterized by comprising a device that generates the characteristic testing pattern of the second input device using image data obtained from the second input device via the network.
5. The document reading system as claimed in claim 2 or 3, characterized by comprising:
a device that compares the characteristic testing pattern of the second input device stored in the storage device with the characteristic testing pattern of the second input device newly obtained by the obtaining device; and
a device that outputs the comparison result via the network.
6. The document reading system as claimed in any one of claims 1 to 5, characterized in that: the characteristic testing patterns of the first and second input devices include information on brightness values; and
the changing device changes the brightness values of the document image data or the document definition according to the result calculated by the calculation device for the brightness values.
7. The document reading system as claimed in any one of claims 1 to 6, characterized in that:
the characteristic testing patterns of the first and second input devices include information on line lattice reading accuracy; and
the changing device changes the line lattice reading accuracy of the document image data or the document definition according to the result calculated by the calculation device for the line lattice reading accuracy information.
8. The document reading system as claimed in any one of claims 1 to 7, characterized in that:
the characteristic testing patterns of the first and second input devices include information on character reading accuracy; and
the changing device changes, according to the result calculated by the calculation device for the character reading accuracy information, a parameter for the segmentation accuracy of the characters contained in the document image data or the document definition.
9. The document reading system as claimed in claim 1, characterized in that:
the reading device stores a plurality of dictionaries for character recognition and, according to the calculation result, switches the dictionary used for character recognition of the character data contained in the document image data.
10. A document reading method, characterized by comprising:
obtaining document image data via a network;
obtaining characteristic values in the document image data that depend on the second input device used to input the document image data;
reading the document definition for the document image data and first information, stored in association with the document definition, on the characteristic values of the first input device used for that definition;
calculating the difference between the first and second information;
correcting the document definition or the document image data using the result of the calculation; and
reading the document image data using the document definition.
11. The document reading method as claimed in claim 10, characterized in that:
when, in the calculation result, the difference between the first and second information is equal to or greater than a predetermined value, the second input device is notified via the network.
12. A program for causing a computer to execute a document reading method, characterized by comprising:
a step of obtaining a document definition stored in a storage device and the first characteristic testing pattern of the input device for the image used when the document definition was generated;
a step of obtaining image data of a document from a connected image data input device;
a step of reading the second characteristic testing pattern, stored in the storage device, of the input device of the image data;
a step of calculating the difference between the two characteristic testing patterns;
a step of changing the document image data or the document definition using the calculation result so that the two characteristic testing patterns approximate each other;
a step of reading information from the image data using the document definition; and
a step of storing the read information in the storage device.
CNB021513759A 2002-04-12 2002-11-21 Bills reading system, method and program Expired - Fee Related CN1198236C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP109904/2002 2002-04-12
JP2002109904A JP4185699B2 (en) 2002-04-12 2002-04-12 Form reading system, form reading method and program therefor

Publications (2)

Publication Number Publication Date
CN1452119A true CN1452119A (en) 2003-10-29
CN1198236C CN1198236C (en) 2005-04-20

Family

ID=29243212

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB021513759A Expired - Fee Related CN1198236C (en) 2002-04-12 2002-11-21 Bills reading system, method and program

Country Status (3)

Country Link
JP (1) JP4185699B2 (en)
KR (1) KR20030080998A (en)
CN (1) CN1198236C (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105095842A (en) * 2014-05-22 2015-11-25 阿里巴巴集团控股有限公司 Method and device for identifying information of bill
CN105095842B (en) * 2014-05-22 2018-12-11 口碑控股有限公司 A kind of method and apparatus of the information identification of document

Also Published As

Publication number Publication date
JP4185699B2 (en) 2008-11-26
CN1198236C (en) 2005-04-20
KR20030080998A (en) 2003-10-17
JP2003303315A (en) 2003-10-24

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C19 Lapse of patent right due to non-payment of the annual fee
CF01 Termination of patent right due to non-payment of annual fee
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20050420