WO2019026147A1 - Information processing device, information processing system, control method, and control program - Google Patents

Info

Publication number
WO2019026147A1
Authority
WO
WIPO (PCT)
Prior art keywords
rule
item
rule list
form image
unit
Application number
PCT/JP2017/027758
Other languages
French (fr)
Japanese (ja)
Inventor
裕紀 谷崎
満 西川
清人 小坂
Original Assignee
PFU Limited (株式会社Pfu)
Application filed by PFU Limited (株式会社Pfu)
Priority to PCT/JP2017/027758
Publication of WO2019026147A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition

Definitions

  • The present disclosure relates to an information processing apparatus, an information processing system, a control method, and a control program, and more particularly to an information processing apparatus, an information processing system, a control method, and a control program for processing a form image in which an item name and an item value are represented for each of a plurality of items.
  • At a life insurance company, for example, a person in charge manually converts forms such as medical receipts, which customers send when an insurance payment is made, into electronic data. Because a life insurance company receives a huge number of forms from customers, the workload on the person in charge is heavy, and there is growing demand for tools that assist with converting forms into data.
  • In a known form processing apparatus, each item of image data is classified into one of a plurality of image groups based on the type of ruled-line data extracted from it, and form processing is performed on the image data based on the form template data assigned to each image group.
  • An object of the information processing apparatus, the information processing system, the control method, and the control program is to make it possible to detect the item value of a desired item more accurately from a form image in which an item name and an item value are represented for each of a plurality of items.
  • An information processing system according to an embodiment includes a generation device and an information processing device that processes a form image in which an item name and an item value are represented for each of a plurality of items.
  • The generation device includes: a storage unit that stores a plurality of sample form images and a plurality of rule lists, each of which defines a plurality of item names and their priorities; a rule dictionary generation unit that generates a rule dictionary associating each rule list with the feature amounts of the sample form images that conform to that rule list; and a transmission unit that transmits the rule dictionary to the information processing device.
  • The information processing device includes: a receiving unit that receives the rule dictionary from the generation device; an acquiring unit that acquires a form image; a feature amount calculation unit that calculates a feature amount from the acquired form image; a rule list extraction unit that extracts, from the received rule dictionary, the rule list associated with the calculated feature amount; a detection unit that determines, in the order of priority defined in the extracted rule list, whether each item name defined in that rule list is included in the form image, and detects the item value corresponding to the item name first determined to be included; and an output unit that outputs the detected item value.
  • a control method is a control method of an information processing apparatus that has a storage unit and an output unit and processes a form image in which an item name and an item value are represented for each of a plurality of items.
  • a control program is a control program of an information processing apparatus that has a storage unit and an output unit and processes a form image in which an item name and an item value are represented for each of a plurality of items.
  • The control method and the control program cause the information processing apparatus to acquire a form image, calculate a feature amount from the acquired form image, extract, using a rule dictionary, the rule list associated with the calculated feature amount, detect the item value corresponding to the item name first determined to be included in the form image when the item names are checked in the order of priority defined in the extracted rule list, and output the detected item value from the output unit.
  • The information processing apparatus, the information processing system, the control method, and the control program make it possible to detect the item value of a desired item accurately from a form image in which an item name and an item value are represented for each of a plurality of items.
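  • The detection flow summarized above can be sketched in Python as follows. This is a minimal illustration under assumptions, not the disclosed implementation: the rule dictionary is a hypothetical mapping from feature vectors to rule list IDs, the rule list "associated with" the calculated feature amount is taken to be the one whose stored feature vector has the highest cosine similarity (one of the similarity measures the description mentions later), and the OCR result of a form is modeled as a hypothetical mapping `ocr_fields` from (item name, relative position) to the recognized text. All function names are illustrative.

```python
import math


def cosine(a, b):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def extract_rule_list(feature, rule_dictionary, rule_lists):
    """Return the rule list whose stored feature vector is most similar to `feature`."""
    best = max(rule_dictionary, key=lambda stored: cosine(feature, stored))
    return rule_lists[rule_dictionary[best]]


def detect_item_value(ocr_fields, rule_list):
    """Check item names in priority order; use the first one present in the form."""
    names_in_form = {name for name, _ in ocr_fields}
    for name, position in rule_list:  # rule_list is already sorted by priority
        if name in names_in_form:
            return ocr_fields.get((name, position))
    return None
```

If none of the item names in the rule list appears in the form, the sketch returns `None`; the text does not say how the device handles that case.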
  • FIG. 1 shows an example of a schematic configuration of an information processing system 10 according to an embodiment.
  • FIG. 2 is a diagram showing a schematic configuration of the first storage device 110 and the first CPU 120.
  • FIG. 3 is a diagram showing a schematic configuration of the second storage device 210 and the second CPU 220.
  • FIG. 4 is a view showing an example of a plurality of sample form images 400, 401, 402.
  • FIG. 5 is a diagram showing an example of a plurality of rule lists 500, 501, 502.
  • FIG. 6 is a diagram showing an example of a rule dictionary 600.
  • FIG. 7 is a flowchart showing an example of the operation of the rule dictionary generation process.
  • FIG. 8 is a diagram for explaining the feature amount related to the shape of a ruled line.
  • FIG. 1 is a diagram showing an example of a schematic configuration of an information processing system 10 according to the embodiment.
  • the information processing system 10 includes an information processing device 100 and a generation device 200.
  • The information processing apparatus 100 and the generation apparatus 200 each have a wired or wireless communication function, are connected to the network 300, and communicate with each other via the network 300.
  • The information processing apparatus 100 is an information processing apparatus such as a personal computer (PC) or a notebook PC, and is used by a worker, who is its user, to process form images in which an item name and an item value are represented for each of a plurality of items.
  • the information processing apparatus 100 may be a portable apparatus such as a tablet PC, a multi-function mobile phone (so-called smart phone), or a mobile information terminal.
  • the information processing device 100 includes a first communication device 101, an input device 102, a display device 103, a first storage device 110, and a first central processing unit (CPU) 120.
  • The first communication apparatus 101 includes a communication interface circuit for wired communication using a protocol such as Transmission Control Protocol/Internet Protocol (TCP/IP).
  • the first communication device 101 communicates with the network 300 in accordance with a communication method such as Ethernet (registered trademark).
  • the first communication device 101 supplies the data received from the generation device 200 via the network 300 to the first CPU 120, and transmits the data supplied from the first CPU 120 to the generation device 200 via the network 300.
  • the first communication device 101 may be any device as long as it can communicate with an external device.
  • the first communication apparatus 101 may communicate with the generation apparatus 200 via an access point (not shown) according to a wireless local area network (LAN) communication scheme.
  • the first communication apparatus 101 may communicate with the generation apparatus 200 via a base station apparatus (not shown) according to the mobile phone communication scheme.
  • the input device 102 is an example of an input unit, and includes an input device such as a touch panel type input device, a keyboard, a mouse, and the like, and an interface circuit that acquires a signal from the input device.
  • the input device 102 receives input data input by the user, and outputs a signal corresponding to the received input data to the first CPU 120.
  • The display device 103 is an example of an output unit, and includes a display such as a liquid crystal or organic EL (electro-luminescence) display, and an interface circuit that outputs image data and various kinds of information to the display.
  • the display device 103 is connected to the first CPU 120 and displays the image data output from the first CPU 120 on the display.
  • the input device 102 and the display device 103 may be integrally configured using a touch panel display.
  • the first CPU 120 operates based on a program stored in advance in the first storage device 110.
  • The first CPU 120 may be a general-purpose processor. Instead of the first CPU 120, a digital signal processor (DSP), a large scale integration (LSI) circuit, an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or the like may be used.
  • the first CPU 120 is connected to the first communication device 101, the input device 102, the display device 103, and the first storage device 110, and controls these units.
  • the first CPU 120 performs data transmission / reception control via the first communication device 101, input control of the input device 102, output control of the display device 103, control of the first storage device 110, and the like. Further, the first CPU 120 processes a form image acquired from the first communication device 101 or the like.
  • the generation device 200 is a server that generates a rule dictionary used by the information processing device 100.
  • the rule dictionary is data used to process a form image, in particular to detect an item value of an item of interest from the form image.
  • the form image is image data representing a form
  • the form is a document in which an item name and an item value are represented for each of a plurality of items.
  • Forms with various layouts are handled.
  • Forms with various layouts are forms created individually by a plurality of different organizations, in which the positions of the item names differ from form to form and the same item may be labeled with any of several synonymous item names.
  • The item of interest is an item whose value needs to be processed as electronic data, for example for calculation or output.
  • the generation device 200 includes a second communication device 201, a second storage device 210, and a second CPU 220. Hereinafter, each part of the generating device 200 will be described in detail.
  • the second communication device 201 has the same communication interface circuit as the first communication device 101, and communicates with the network 300 in accordance with the same communication method as the first communication device 101.
  • the second communication device 201 supplies the data received from the information processing device 100 via the network 300 to the second CPU 220, and transmits the data supplied from the second CPU 220 to the information processing device 100 via the network 300.
  • the second storage device 210 is an example of a storage unit.
  • Like the first storage device 110, the second storage device 210 includes a memory device, a fixed disk device, other storage devices, and the like.
  • the second storage device 210 stores a computer program, a database, a table, and the like used for various processes of the generation device 200.
  • the computer program may be installed from a computer readable portable recording medium such as a CD-ROM, a DVD-ROM, etc.
  • the computer program is installed in the second storage device 210 using a known setup program or the like.
  • The second storage device 210 stores, as data, a plurality of sample form images, a rule dictionary, a plurality of rule lists, and the like. Details of the sample form images, the rule dictionary, and the rule lists are described later.
  • the second CPU 220 operates based on a program stored in advance in the second storage device 210.
  • the second CPU 220 may be a general purpose processor. Note that, in place of the second CPU 220, a DSP, an LSI, an ASIC, an FPGA, or the like may be used.
  • the second CPU 220 is connected to the second communication device 201 and the second storage device 210, and controls these units.
  • the second CPU 220 performs data transmission / reception control via the second communication device 201, control of the second storage device 210, and the like. Also, the second CPU 220 generates a rule dictionary.
  • FIG. 2 is a diagram showing a schematic configuration of the first storage device 110 and the first CPU 120 of the information processing apparatus 100.
  • As shown in FIG. 2, the first storage device 110 stores programs such as a reception program 111, an acquisition program 112, a feature amount calculation program 113, a rule list extraction program 114, a detection program 115, an output control program 116, a reception program 117, and an update program 118.
  • Each of these programs is a functional module implemented by software operating on the processor.
  • the first CPU 120 reads each program stored in the first storage device 110 and operates according to each read program.
  • the first CPU 120 functions as the reception unit 121, the acquisition unit 122, the feature amount calculation unit 123, the rule list extraction unit 124, the detection unit 125, the output control unit 126, the reception unit 127, and the update unit 128.
  • FIG. 3 is a diagram showing a schematic configuration of the second storage device 210 and the second CPU 220 of the generation device 200.
  • the second storage device 210 stores programs such as a rule list generation program 211, a rule dictionary generation program 212, and a transmission program 213. Each of these programs is a functional module implemented by software operating on the processor.
  • the second CPU 220 reads each program stored in the second storage device 210 and operates according to each read program. Thus, the second CPU 220 functions as a rule list generation unit 221, a rule dictionary generation unit 222, and a transmission unit 223.
  • FIG. 4 is a view showing an example of a plurality of sample form images 400, 401, 402 stored in the second storage device 210 of the generation device 200.
  • Each rule list 500, 501, 502 is generated by the generation device 200 and stored in the second storage device 210, and is also transmitted to the information processing device 100 and stored in the first storage device 110.
  • a rule list ID for identifying each rule list is assigned to each rule list 500, 501, 502.
  • In each rule list, a plurality of rules (each corresponding to a row of the table shown in FIG. 5) are defined, and for each rule, a rule number, a priority, an item name, position information, and the like are defined in association with each other. That is, one rule corresponds to one item name.
  • As the position information, for example, the position of the item value relative to the item name (above, below, right, left, upper right, lower right, upper left, or lower left) is defined.
  • the position information is used to specify an item value corresponding to an item name defined in each rule when detecting an item value of an item of interest from the form image using each rule list.
  • FIG. 6 shows an example of the rule dictionary 600.
  • the rule dictionary 600 is generated by the generation device 200 and stored in the second storage device 210, and is also transmitted to the information processing device 100 and stored in the first storage device 110. As shown in FIG. 6, in the rule dictionary 600, each of a plurality of feature amounts is defined in association with each of a plurality of rule list IDs.
  • Each feature amount is a feature amount of a sample form image. Details of the feature amount are described later.
  • the feature amount of the sample form image that conforms to each rule list is associated with each rule list.
  • FIG. 7 is a flowchart showing an example of the operation of the generation process of the rule dictionary by the generation device 200.
  • the flow of the operation described below is executed mainly by the second CPU 220 in cooperation with each element of the generation device 200 based on a program stored in advance in the second storage device 210.
  • the rule list generation unit 221 initializes the rule list, and the rule dictionary generation unit 222 initializes the rule dictionary (step S101).
  • the rule list generation unit 221 extracts all item names that can correspond to the item of interest from all the sample form images stored in the second storage device 210. An item name that can correspond to the item of interest may be registered in advance.
  • The rule list generation unit 221 generates rules by associating each extracted item name with position information for every possible position, and generates a rule list containing all of the generated rules as the initial rule list. In the initial rule list, rule numbers are assigned to the rules in an arbitrary order, and the priority is set to "undefined". The rule dictionary generation unit 222 generates, as the initial rule dictionary, a rule dictionary in which both the feature amount and the rule list ID are set to "undefined".
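  • A minimal sketch of the initialization in step S101, assuming a rule is represented as a small record. The field names are illustrative, the eight position labels are the relative positions listed for the position information, and "undefined" is modeled as `None`.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional

# Relative positions of an item value with respect to its item name.
POSITIONS = ("above", "below", "right", "left",
             "upper right", "lower right", "upper left", "lower left")


@dataclass
class Rule:
    rule_number: int
    item_name: str
    position: str
    priority: Optional[int] = None  # None stands for "undefined"


def initial_rule_list(item_names):
    """One rule per (item name, possible position) pair, numbered in arbitrary order."""
    return [
        Rule(number, name, pos)
        for number, (name, pos) in enumerate(product(item_names, POSITIONS), start=1)
    ]


# Initial rule dictionary: one entry of (feature amount, rule list ID), both undefined.
initial_rule_dictionary = [(None, None)]
```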
  • the rule list generation unit 221 selects one rule from the initial rule list (step S102).
  • The rule list generation unit 221 detects item values for the selected rule from each of the sample form images that do not conform to any of the generated rule lists (step S103).
  • In the first pass, no rule list has been generated yet, so the rule list generation unit 221 detects item values from all sample form images stored in the second storage device 210.
  • the rule list generation unit 221 detects an item name in the selected rule from each sample form image using a known optical character recognition (OCR) technology.
  • Then, using known OCR technology, the rule list generation unit 221 detects the item value from an area of a predetermined size located, relative to the detected item name, at the position defined by the position information of the selected rule.
  • the rule list generation unit 221 detects the item value corresponding to each item name based on the display position relationship defined in the rule list.
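  • The "area of a predetermined size" next to a detected item name can be computed as in the sketch below. The box representation, the direction table, and the default size of 200 by 40 pixels are hypothetical choices for illustration; the description does not specify the size or how the area is anchored to the item name.

```python
from typing import NamedTuple


class Box(NamedTuple):
    left: int
    top: int
    width: int
    height: int


# (horizontal, vertical) direction of the item value relative to the item name;
# image coordinates, so a positive vertical step means "further down".
DIRECTIONS = {
    "right": (1, 0), "left": (-1, 0), "below": (0, 1), "above": (0, -1),
    "upper right": (1, -1), "lower right": (1, 1),
    "upper left": (-1, -1), "lower left": (-1, 1),
}


def value_region(name_box: Box, position: str, size=(200, 40)) -> Box:
    """Search area of a fixed (hypothetical) size next to the item-name box."""
    dx, dy = DIRECTIONS[position]
    width, height = size
    if dx > 0:
        left = name_box.left + name_box.width
    elif dx < 0:
        left = name_box.left - width
    else:
        left = name_box.left
    if dy > 0:
        top = name_box.top + name_box.height
    elif dy < 0:
        top = name_box.top - height
    else:
        top = name_box.top
    # A caller would clip this area to the image, crop it, and run OCR on it.
    return Box(left, top, width, height)
```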
  • The rule list generation unit 221 calculates a match rate indicating the proportion of the item values detected from the sample form images that match the correct values associated with the sample form images from which they were detected (step S104). Specifically, it determines, for each detected item value, whether the value matches the correct value associated with its sample form image, and calculates the ratio of the number of matching item values to the number of detected item values as the match rate.
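  • Stated as a formula (the symbols are introduced here only for illustration), with S the set of sample form images that do not yet conform to any rule list, v_r(s) the item value detected from s with rule r, and c(s) the correct value associated with s:

```latex
\mathrm{MatchRate}(r) =
  \frac{\bigl|\{\, s \in S : v_r(s)\ \text{is detected and}\ v_r(s) = c(s) \,\}\bigr|}
       {\bigl|\{\, s \in S : v_r(s)\ \text{is detected} \,\}\bigr|}
```

Step S106 then ranks the rules by this value in descending order.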
  • the rule list generation unit 221 determines whether the process is completed for all the rules included in the initial rule list (step S105). If there is a rule whose processing has not been completed yet, the rule list generation unit 221 returns the processing to step S102, and executes the processing of steps S102 to S104 on the unselected rule.
  • After all rules have been processed, the rule list generation unit 221 generates a rule list by assigning priorities to the rules in descending order of the match rate calculated in step S104 (step S106).
  • In other words, the rule list generation unit 221 generates the rule list so that, among the sample form images that do not conform to any of the plurality of rule lists, an item name whose item value matches the correct value in a higher proportion of cases receives a higher priority.
  • the rule list generation unit 221 assigns a rule list ID to the generated rule list.
  • The rule dictionary generation unit 222 selects one sample form image from among the sample form images that do not conform to any of the generated rule lists (step S107).
  • The rule dictionary generation unit 222 extracts an item value from the selected sample form image based on the rule list generated in step S106 (step S108).
  • Using known OCR technology, the rule dictionary generation unit 222 determines, one item name at a time in descending order of the priority defined in the rule list, whether each item name defined in the rule list is included in the sample form image.
  • For the item name first determined to be included, the rule dictionary generation unit 222 detects an item value from an area of a predetermined size located at the position defined by the position information associated with that item name, and extracts the detected item value as the item value corresponding to the item name.
  • the rule dictionary generation unit 222 determines whether or not the extracted item value matches the correct value associated with the selected sample form image (step S109).
  • If the extracted item value does not match the correct value, the rule dictionary generation unit 222 determines that the rule list generated in step S106 and the sample form image selected in step S107 do not conform to each other, and the process proceeds to step S113.
  • If the extracted item value matches the correct value, the rule dictionary generation unit 222 determines that the rule list generated in step S106 and the sample form image selected in step S107 conform to each other (step S110).
  • The rule dictionary generation unit 222 calculates a feature amount from the sample form image determined to conform to the rule list (step S111).
  • As the feature amount of the sample form image, the rule dictionary generation unit 222 calculates a feature amount related to the shape of ruled lines, a feature amount related to logos, or a feature amount related to character recognition results.
  • FIG. 8 is a diagram for explaining the feature amount related to the shape of the ruled line.
  • the form image 800 includes the table 801
  • The ruled lines 802 constituting the table 801 form a plurality of vertices 803 and intersections 804 of different shapes.
  • the rule dictionary generation unit 222 counts the number of vertices 803 and intersections 804 having each shape for each of the vertices 803 and intersections 804 having different shapes.
  • The rule dictionary generation unit 222 calculates, as the feature amount of the form image 800, a histogram (feature vector) 810 whose elements are the numbers of vertices 803 and intersections 804 of each shape.
  • By using the feature amount related to the shape of ruled lines, the rule dictionary generation unit 222 can accurately distinguish forms of various layouts, which are likely to contain tables.
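  • A minimal Python sketch of the ruled-line histogram, assuming the ruled lines have already been extracted and thinned to one-pixel-wide horizontal and vertical lines on a grid where `#` marks a ruled-line pixel. The nine junction classes (four corner types, four T-junction types, and the cross) are an illustrative choice; the description does not enumerate the shapes it counts.

```python
from collections import Counter

# Junction shape keyed by which neighbours (up, down, left, right) are ruled-line pixels.
SHAPES = {
    (False, True, False, True): "top_left",      # corner, opens down and right
    (False, True, True, False): "top_right",     # corner, opens down and left
    (True, False, False, True): "bottom_left",   # corner, opens up and right
    (True, False, True, False): "bottom_right",  # corner, opens up and left
    (False, True, True, True): "t_down",         # T-junction with a stem going down
    (True, False, True, True): "t_up",           # T-junction with a stem going up
    (True, True, False, True): "t_right",        # T-junction with a stem going right
    (True, True, True, False): "t_left",         # T-junction with a stem going left
    (True, True, True, True): "cross",           # four-way intersection
}
BINS = list(SHAPES.values())


def ruled_line_histogram(grid):
    """Count vertices and intersections of each shape; returns one count per BINS entry."""
    h, w = len(grid), len(grid[0])

    def on(y, x):
        return 0 <= y < h and 0 <= x < w and grid[y][x] == "#"

    counts = Counter()
    for y in range(h):
        for x in range(w):
            if not on(y, x):
                continue
            key = (on(y - 1, x), on(y + 1, x), on(y, x - 1), on(y, x + 1))
            shape = SHAPES.get(key)  # straight segments and line ends are ignored
            if shape is not None:
                counts[shape] += 1
    return [counts[b] for b in BINS]
```

For a 2 by 2 table drawn as `#####`, `#.#.#`, `#####`, `#.#.#`, `#####`, each of the nine shapes occurs exactly once, so the function returns `[1] * 9`.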
  • FIG. 9 is a diagram for explaining the feature amount related to the logo.
  • the form image 900 includes various logo images, such as a pattern logo 901 or a character logo 902, representing an organization that issues the form.
  • the rule dictionary generation unit 222 detects logos 901 and 902 having each shape and counts the number of logos 901 and 902 having each shape using a known pattern matching technology.
  • The generation device 200 stores in advance correct images obtained by binarizing the images of the logos 901 and 902.
  • The rule dictionary generation unit 222 cuts out images one after another while moving the cut-out range across the form image 900, binarizes each cut-out image, and, when the pixel match rate between the binarized image and a correct image is equal to or greater than a threshold, detects the cut-out range as a logo.
  • the rule dictionary generation unit 222 calculates a histogram (feature vector) 910 having the number of logos 901 and 902 having each shape as elements as the feature amount of the form image 900.
  • the rule dictionary generation unit 222 can identify each form individually created by each organization with high accuracy by using the feature amount related to the logo of each organization.
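  • The sliding-window logo detection described above can be sketched as follows. The binarization threshold (128), the match threshold (0.9), and the rule that a window overlapping an earlier hit is skipped (so that one logo is not counted at several neighboring offsets) are assumptions for the sketch. The brute-force loop is written for clarity; an image library routine such as OpenCV's `cv2.matchTemplate` would normally perform the sliding comparison.

```python
def binarize(gray, threshold=128):
    """Grayscale rows of 0-255 values -> 0/1 rows, with dark pixels mapped to 1."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def match_rate(image, template, top, left):
    """Fraction of pixels in the window that equal the correct (template) image."""
    th, tw = len(template), len(template[0])
    same = sum(
        image[top + y][left + x] == template[y][x]
        for y in range(th) for x in range(tw)
    )
    return same / (th * tw)


def count_logo(image, template, rate_threshold=0.9):
    """Slide the window over a binarized image and count non-overlapping hits."""
    h, w = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    hits = []
    for top in range(h - th + 1):
        for left in range(w - tw + 1):
            overlaps = any(abs(top - t) < th and abs(left - l) < tw for t, l in hits)
            if not overlaps and match_rate(image, template, top, left) >= rate_threshold:
                hits.append((top, left))
    return len(hits)


def logo_histogram(gray, templates, rate_threshold=0.9):
    """Feature vector: number of detected logos for each binarized correct image."""
    image = binarize(gray)
    return [count_logo(image, t, rate_threshold) for t in templates]
```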
  • FIG. 10 is a diagram for explaining the feature amount related to the character recognition result.
  • the form image 1000 includes various character strings 1001 to 1005.
  • The rule dictionary generation unit 222 detects the character strings 1001 to 1005 in the form image 1000 using known OCR technology and counts the occurrences of each character string.
  • The rule dictionary generation unit 222 calculates, as the feature amount of the form image 1000, a histogram (feature vector) 1010 whose elements are the counts of the character strings 1001 to 1005.
  • the rule dictionary generation unit 222 can identify each form of various layouts with higher accuracy by using the feature amount related to the character recognition result.
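  • Counting recognized strings can be sketched as a bag-of-strings vector. The sketch assumes that OCR has already produced a list of strings and that a fixed vocabulary of strings to count (hypothetical here) is shared by all forms, so that every form yields a vector of the same length; the description does not say how the strings to be counted are chosen.

```python
from collections import Counter


def text_histogram(ocr_strings, vocabulary):
    """Feature vector: how many times each vocabulary string was recognized."""
    counts = Counter(s.strip() for s in ocr_strings)
    return [counts[term] for term in vocabulary]


vocabulary = ["Patient burden", "Partial contribution", "Total"]
print(text_histogram(["Total", "Patient burden", " Total ", "Date"], vocabulary))  # [1, 0, 2]
```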
  • The rule dictionary generation unit 222 may combine two or more of the feature amount related to the shape of ruled lines, the feature amount related to logos, and the feature amount related to character recognition results, and use the combination as the feature amount of the sample form image. This allows the rule dictionary generation unit 222 to distinguish forms with even higher accuracy.
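  • One simple way to combine the feature amounts is to scale each vector to unit length and concatenate the results, so that a part with large raw counts (for example, many recognized strings) does not dominate the others. The normalization step is an assumption; the description only states that two or more feature amounts may be combined.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length; an all-zero vector stays all zeros."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else [0.0] * len(vector)


def combine_features(*parts):
    """Concatenate several feature vectors after scaling each to unit length."""
    combined = []
    for part in parts:
        combined.extend(l2_normalize(part))
    return combined
```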
  • The generation device 200 may calculate the feature amount of each sample form image in advance and store it in the second storage device 210. In that case, the rule dictionary generation unit 222 acquires the feature amount of each sample form image from the second storage device 210, which reduces the processing load of the generation process and shortens the time needed to generate the rule dictionary.
  • the rule dictionary generation unit 222 stores (adds) the calculated feature amount in the rule dictionary in association with the rule list ID of the rule list generated in step S106 (step S112).
  • the rule dictionary generation unit 222 omits the update of the rule dictionary when the calculated feature amount matches the feature amount calculated in the past.
  • The rule dictionary generation unit 222 may also omit updating the rule dictionary when the calculated feature amount is similar to a previously calculated feature amount. In that case, the rule dictionary generation unit 222 determines that the two feature amounts are similar when the similarity between the calculated feature amount and the previously calculated feature amount is equal to or greater than a threshold.
  • The similarity is, for example, the reciprocal of the Euclidean distance between the two feature amounts (feature vectors), or their cosine similarity.
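  • The two similarity measures named above, and the threshold test for skipping an update, can be written as follows. The threshold of 0.95 is a hypothetical value. The reciprocal of the Euclidean distance is undefined for identical vectors; the sketch returns infinity in that case, which the description handles separately by not updating the dictionary for an identical feature amount.

```python
import math


def inverse_euclidean(a, b):
    """Reciprocal of the Euclidean distance; identical vectors give infinity."""
    distance = math.dist(a, b)
    return math.inf if distance == 0 else 1.0 / distance


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.hypot(*a) * math.hypot(*b)
    return dot / norms if norms else 0.0


def is_similar(a, b, threshold=0.95, measure=cosine_similarity):
    """Hypothetical test deciding whether a rule dictionary update can be skipped."""
    return measure(a, b) >= threshold
```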
  • Alternatively, the rule dictionary generation unit 222 may update (overwrite) the rule list ID already stored in the rule dictionary in association with that feature amount with the rule list ID of the newly generated rule list.
  • The rule dictionary generation unit 222 may assign a feature amount ID to the calculated feature amount and store the assigned feature amount ID in the rule dictionary in association with the rule list ID. For example, the rule dictionary generation unit 222 uses, as the feature amount ID, a value (vector) obtained by rounding each element of the calculated feature amount (feature vector) to a predetermined number of significant digits. The rule dictionary generation unit 222 may instead use a value (vector) obtained by normalizing the calculated feature amount (feature vector) as the feature amount ID.
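  • A sketch of feature amount IDs and the store step, assuming that the rule dictionary is a Python `dict` from feature amount ID to rule list ID, that three significant digits is the predetermined precision (a hypothetical setting), and that "normalizing" means scaling to unit Euclidean length. A feature amount whose ID is already stored is skipped unless `overwrite=True`, mirroring the two options described above.

```python
import math


def round_significant(value, digits=3):
    """Round to a number of significant digits (3 is an illustrative choice)."""
    return float(f"{value:.{digits}g}")


def feature_id(feature, digits=3, normalize=False):
    """Feature amount ID: optionally unit-normalized, then rounded element-wise."""
    if normalize:
        norm = math.sqrt(sum(v * v for v in feature))
        feature = [v / norm for v in feature] if norm else list(feature)
    return tuple(round_significant(v, digits) for v in feature)


def store(rule_dictionary, feature, rule_list_id, overwrite=False):
    """Add (feature ID -> rule list ID); skip, or overwrite, if the ID is already stored."""
    key = feature_id(feature)
    if key in rule_dictionary and not overwrite:
        return False
    rule_dictionary[key] = rule_list_id
    return True
```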
  • the rule dictionary generation unit 222 determines whether or not the processing in steps S107 to S112 is completed for all sample form images that do not conform to any of the generated rule lists (step S113). If there is a sample form image that has not been processed yet, the rule dictionary generation unit 222 returns the process to step S107, and executes the processes of steps S107 to S112 on the unselected sample form image.
  • The rule dictionary generation unit 222 then determines whether every sample form image conforms to one of the generated rule lists (step S114). If any sample form image does not conform to any of the generated rule lists, the rule dictionary generation unit 222 returns the process to step S102. In this case, the rule list generation unit 221 generates a new rule list based on the sample form images that do not conform to any of the generated rule lists, through the processing of steps S102 to S106, and the rule dictionary generation unit 222 updates the rule dictionary using the newly generated rule list, through the processing of steps S107 to S112.
  • If every sample form image conforms to one of the generated rule lists, the rule dictionary generation unit 222 ends the series of steps.
  • the rule dictionary stored in the second storage device 210 at the end of the generation process becomes the rule dictionary generated by the rule dictionary generation unit 222.
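  • A compact Python sketch of the loop in steps S101 to S114, under simplifying assumptions: OCR is replaced by a hypothetical per-sample mapping `fields` from (item name, position) to the text found there, each sample carries its correct value and a precomputed hashable feature amount, and a rule is an (item name, position) pair. The early `break` is a safeguard added for the sketch; the flowchart does not describe what happens if no sample conforms to a new rule list.

```python
def detect(sample, rule_list):
    """Value for the first item name (in priority order) that appears in the form."""
    names = {name for name, _ in sample["fields"]}
    for name, position in rule_list:
        if name in names:
            return sample["fields"].get((name, position))
    return None


def match_rate(rule, samples):
    """Step S104: share of detected item values that equal the correct value."""
    detected = [(s["fields"][rule], s["correct"]) for s in samples if rule in s["fields"]]
    if not detected:
        return 0.0
    return sum(value == correct for value, correct in detected) / len(detected)


def generate(samples, rules):
    rule_lists, rule_dictionary = {}, {}
    remaining = list(samples)
    while remaining:  # step S114: repeat until every sample conforms to some rule list
        rates = {rule: match_rate(rule, remaining) for rule in rules}
        # Steps S102 to S106: priority in descending order of match rate.
        rule_list = sorted(rules, key=rates.get, reverse=True)
        # Steps S107 to S110: conformity check for each remaining sample.
        results = [(s, detect(s, rule_list) == s["correct"]) for s in remaining]
        if not any(ok for _, ok in results):
            break  # sketch-only guard against looping forever
        list_id = len(rule_lists) + 1
        rule_lists[list_id] = rule_list
        for sample, ok in results:
            if ok:  # steps S111 and S112; an already stored feature is not updated
                rule_dictionary.setdefault(sample["feature"], list_id)
        remaining = [s for s, ok in results if not ok]
    return rule_lists, rule_dictionary, remaining
```

With three samples in which the patient-burden field holds the correct value in two forms and the partial-contribution field holds it in the third, the sketch produces two rule lists with these two items in swapped priority order, much like the rule lists 1101 and 1201 in the example of FIGS. 11 to 13.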
  • In this way, the rule list generation unit 221 determines the order of priority in the rule list used for checking conformity with each sample form image, based on the match rate between the item values and the correct values in the sample form images that do not conform to any of the generated rule lists.
  • This allows the rule dictionary generation unit 222 to check efficiently whether each sample form image conforms to each rule list, which keeps the rule dictionary simple and shortens the time needed to generate it.
  • When a sample form image is newly added after the rule dictionary has been generated, the rule list generation unit 221 generates a new rule list based on the newly added sample form image. In that case, the rule list generation unit 221 treats the newly added sample form image as a sample form image that does not conform to any of the generated rule lists and executes steps S102 to S106 to generate a new rule list, and the rule dictionary generation unit 222 updates the rule dictionary using the newly generated rule list through steps S107 to S112. However, if the newly generated rule list is identical to one of the already generated rule lists, the rule dictionary generation unit 222 updates the rule dictionary using that existing rule list.
  • the generating device 200 can update the rule dictionary by newly adding a sample form image to the rule dictionary generated once.
  • In this case as well, the order of priority in the rule list used for checking the conformity of each newly added sample form image is determined based on the match rate between the item values and the correct values in the newly added sample form images. As a result, the rule dictionary generation unit 222 can check efficiently whether each newly added sample form image conforms to each rule list, which keeps the rule dictionary simple and shortens the time needed to generate it.
  • 11 to 13 are diagrams showing an example of the rule dictionary generated by the generation process shown in FIG.
  • each item value corresponding to the item name "patient's burden” is detected from each sample document image 1 to 9 (step S103), and each detected item value and each sample document image
  • the matching rate with the correct value associated with 1 to 9 is calculated (step S104).
  • A match rate is likewise calculated for the item values corresponding to the item name "partial contribution" and for those corresponding to the item name "total". A rule list 1101, in which priorities are assigned in descending order of match rate, is then generated (step S106).
  • In the rule list 1101, priorities are assigned in the order patient burden, partial contribution, total.
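Steps S103 to S106 amount to ranking item names by match rate. The sketch below assumes that the item values have already been detected for each item name on each sample form image, with an empty string standing for "not found". The names `match_rate` and `build_rule_list` are illustrative and do not come from the source.

```python
from typing import Dict, List

def match_rate(detected: List[str], correct: List[str]) -> float:
    """Fraction of sample form images whose detected item value equals the correct value."""
    if not correct:
        return 0.0
    hits = sum(1 for d, c in zip(detected, correct) if d == c)
    return hits / len(correct)

def build_rule_list(detected_by_name: Dict[str, List[str]], correct: List[str]) -> List[str]:
    """Return item names ordered by descending match rate (highest priority first).

    detected_by_name maps each candidate item name to the values detected for it
    on each sample form image, in the same order as `correct`.
    """
    rates = {name: match_rate(vals, correct) for name, vals in detected_by_name.items()}
    # sorted() is stable, so ties keep the order in which item names were given.
    return sorted(rates, key=lambda name: rates[name], reverse=True)
```

Suppose, using hypothetical data, that the patient burden value is correct on six of nine samples, the partial contribution value on five, and the total on one. The resulting order then matches that of the rule list 1101.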
  • Next, for sample form image 1, each item name is checked one by one, in descending order of the priorities defined in the rule list 1101 (patient burden, partial contribution, total), to determine whether it is included. The item value corresponding to the item name first determined to be included is then extracted (step S108). When the extracted item value matches the correct value associated with sample form image 1, the feature amount A of sample form image 1 and the rule list ID of the rule list 1101 are associated with each other and stored in the rule dictionary 1102 (step S109).
  • The item values extracted from sample form images 1 to 6 match the correct values associated with those images. The feature amounts A to F of sample form images 1 to 6 are therefore stored in the rule dictionary 1102 in association with the rule list ID of the rule list 1101.
  • The item values extracted from sample form images 7 to 9 do not match the correct values associated with those images, so the feature amounts G to I of sample form images 7 to 9 are not yet stored in the rule dictionary 1102.
  • the rule list 1201 is generated from the sample form images 7 to 9 that did not match the rule list 1101.
  • In the rule list 1201, priorities are assigned in the order partial contribution, patient burden, total.
  • Using the rule list 1201, sample form images 7 to 9 are each checked to determine whether the extracted item value matches the correct value associated with that image. The rule dictionary 1202 is updated accordingly.
  • The item values extracted from sample form images 7 and 8 match the correct values associated with those images. The feature amounts G and H of sample form images 7 and 8 are therefore stored in the rule dictionary 1202 in association with the rule list ID of the rule list 1201.
  • The item value extracted from sample form image 9 does not match the correct value associated with it, so the feature amount I of sample form image 9 is not yet stored in the rule dictionary 1202.
  • a rule list 1301 is generated from the sample form image 9 which does not match the rule list 1201.
  • In the rule list 1301, priorities are assigned in the order total, patient burden, partial contribution.
  • Using the rule list 1301, sample form image 9 is checked to determine whether the extracted item value matches the correct value associated with it. The rule dictionary 1302 is updated accordingly.
  • The item value extracted from sample form image 9 matches the correct value associated with it. The feature amount I of sample form image 9 is therefore stored in the rule dictionary 1302 in association with the rule list ID of the rule list 1301.
  • Processing of all sample form images 1 to 9 is then complete, and the rule dictionary 1302 is finalized.
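The walk-through of FIGS. 11 to 13 follows a loop. It builds a rule list from the samples not yet registered, registers every sample that the list handles correctly, and repeats with the rest. The sketch below illustrates that loop under several assumptions and is not the patent's code. Feature amounts are reduced to labels such as "A", and each sample's item values are given as a dict from item name to value. Rule list IDs are list indices, and the loop stops if a new rule list registers no sample.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class Sample:
    feature: str             # stand-in for the feature amount of the sample form image
    values: Dict[str, str]   # item name on the form -> item value read next to it
    correct: str             # correct value of the item of interest

def extract(sample: Sample, rule_list: List[str]) -> Optional[str]:
    """Value of the first item name present on the form, in priority order."""
    for name in rule_list:
        if name in sample.values:
            return sample.values[name]
    return None

def make_rule_list(samples: List[Sample], item_names: List[str]) -> List[str]:
    """Item names sorted by descending match rate over the given samples."""
    def rate(name: str) -> float:
        return sum(s.values.get(name) == s.correct for s in samples) / len(samples)
    return sorted(item_names, key=rate, reverse=True)

def generate(samples: List[Sample], item_names: List[str]
             ) -> Tuple[Dict[str, int], List[List[str]], List[Sample]]:
    rule_lists: List[List[str]] = []
    dictionary: Dict[str, int] = {}      # feature -> rule list ID (index into rule_lists)
    remaining = list(samples)
    while remaining:
        rule_list = make_rule_list(remaining, item_names)
        matched = [s for s in remaining if extract(s, rule_list) == s.correct]
        if not matched:
            break                         # no progress possible; leave the rest unregistered
        rule_lists.append(rule_list)
        for s in matched:
            dictionary[s.feature] = len(rule_lists) - 1
        remaining = [s for s in remaining if extract(s, rule_list) != s.correct]
    return dictionary, rule_lists, remaining
```

Hypothetical data shaped like the example can be used to exercise the loop: samples A to F are readable via patient burden, G and H via partial contribution, and I via total. With that data the loop yields three rule lists and the same grouping of feature amounts as the rule dictionary 1302. The exact priority order of the intermediate rule list depends on the invented data.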
  • FIG. 14 is a flowchart showing an example of the detection process, in which the information processing apparatus 100 detects an item value from a form image.
  • The receiving unit 121 transmits an acquisition request for the rule dictionary and the rule list to the generation device 200 via the first communication device 101. It then receives the rule dictionary and the rule list from the generation device 200 via the first communication device 101 (step S201).
  • The transmitting unit 223 of the generation device 200 reads the previously generated rule dictionary and rule list from the second storage device 210. It then transmits them to the information processing apparatus 100 via the second communication device 201.
  • the receiving unit 121 stores the received rule dictionary and rule list in the first storage device 110.
  • Alternatively, the information processing apparatus 100 may acquire the rule dictionary and the rule list in advance and store them in the first storage device 110.
  • The acquisition unit 122 acquires the form image from which an item value is to be detected. It obtains the image from an image reading apparatus (not shown) via the first communication device 101 (step S202).
  • the acquisition unit 122 may acquire a form image from the image reading apparatus via an interface circuit (not shown) such as USB (Universal Serial Bus).
  • Alternatively, the information processing apparatus 100 may include an imaging device, and the acquisition unit 122 may acquire a form image by imaging a form with that device.
  • the feature quantity calculation unit 123 calculates a feature quantity from the form image acquired by the acquisition unit 122, as in the process of step S111 in FIG. 7 (step S203).
  • the feature amount calculation unit 123 calculates the same feature amount as the feature amount calculated by the rule dictionary generation unit 222 of the generation device 200.
  • The feature amount calculation unit 123 may also assign a feature amount ID to the calculated feature amount, using the same method as the rule dictionary generation unit 222. In that case, the information processing apparatus 100 executes the subsequent processing using the feature amount ID instead of the feature amount.
  • The rule list extraction unit 124 determines whether the feature amount calculated by the feature amount calculation unit 123 is stored in the rule dictionary held in the first storage device 110 (step S204). It may also treat the calculated feature amount as stored when the rule dictionary contains a similar feature amount, not only an identical one. In that case, two feature amounts are judged similar if the similarity between the calculated feature amount and the stored feature amount is equal to or greater than a threshold.
  • The similarity is, for example, the reciprocal of the Euclidean distance between the feature amounts (feature vectors), or their cosine similarity.
  • When the feature amount is stored in the rule dictionary, the rule list extraction unit 124 extracts the rule list whose rule list ID is associated with that feature amount in the rule dictionary (step S205).
  • When the feature amount is not stored in the rule dictionary, the rule list extraction unit 124 extracts an alternative rule list instead. This is, for example, the rule list associated with the largest number of feature amounts in the rule dictionary.
  • This allows the item value of the item of interest to be detected accurately even from a form image whose feature amount is not registered in the rule dictionary.
  • the rule list extraction unit 124 may extract the alternative rule list according to other conditions.
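The sketch below covers steps S204 and S205, together with the fallback to an alternative rule list. It assumes that feature amounts are numeric vectors stored as (vector, rule list ID) pairs. The threshold value 0.95 and the function names are hypothetical. Both similarity measures named above are included: cosine similarity and the reciprocal of the Euclidean distance.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]

def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def inverse_euclidean(a: Vector, b: Vector) -> float:
    d = math.dist(a, b)
    return math.inf if d == 0 else 1.0 / d

def extract_rule_list_id(feature: Vector,
                         dictionary: List[Tuple[Vector, int]],
                         threshold: float = 0.95,
                         similarity: Callable[[Vector, Vector], float] = cosine_similarity) -> int:
    """Return the rule list ID for a form image's feature vector.

    dictionary holds (stored feature vector, rule list ID) pairs and must be non-empty.
    """
    best_id, best_sim = None, -math.inf
    for stored, list_id in dictionary:
        sim = similarity(feature, stored)
        # Keep the most similar stored vector that clears the threshold.
        if sim >= threshold and sim > best_sim:
            best_id, best_sim = list_id, sim
    if best_id is not None:
        return best_id
    # No stored feature is similar enough: fall back to the rule list that is
    # associated with the largest number of feature vectors.
    return Counter(list_id for _, list_id in dictionary).most_common(1)[0][0]
```

The similarity function is a parameter because the text names two candidates. With `inverse_euclidean` the threshold is a distance bound in disguise: a threshold of 5.0 accepts stored vectors within a distance of 0.2.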
  • the detection unit 125 detects an item value from the form image based on the rule list extracted by the rule list extraction unit 124 (step S207).
  • Using known OCR technology, the detection unit 125 checks each item name defined in the rule list one by one, in descending order of the priorities defined in the rule list, to determine whether it is included in the form image.
  • For the item name first determined to be included, the detection unit 125 detects the item value as the value corresponding to that item name. The value is taken from an area of a predetermined size located in the display positional relationship defined by the position information associated with the item name.
  • the detection unit 125 detects the item value corresponding to each item name based on the display position relationship defined in the rule list.
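The sketch below illustrates step S207. It assumes that an OCR engine, which the source does not specify, has already produced words with bounding boxes. The position information of a rule list entry is modelled as a hypothetical relation ("right" or "below") plus a search-area size. As in the text, only the first item name found on the form is consulted.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Word:
    """One OCR result: recognized text and its bounding box in pixels."""
    text: str
    x: int  # left edge
    y: int  # top edge
    w: int  # width
    h: int  # height

# Hypothetical rule list entry: (item name, relation of the value to the name,
# search-area width, search-area height), listed from highest to lowest priority.
RuleEntry = Tuple[str, str, int, int]

def search_area(label: Word, relation: str, width: int, height: int) -> Tuple[int, int, int, int]:
    """Area of predetermined size placed to the right of, or below, the item name."""
    if relation == "right":
        return (label.x + label.w, label.y, width, max(height, label.h))
    if relation == "below":
        return (label.x, label.y + label.h, max(width, label.w), height)
    raise ValueError(f"unknown relation: {relation}")

def centre_inside(word: Word, area: Tuple[int, int, int, int]) -> bool:
    ax, ay, aw, ah = area
    cx, cy = word.x + word.w / 2, word.y + word.h / 2
    return ax <= cx <= ax + aw and ay <= cy <= ay + ah

def detect_item_value(words: List[Word], rule_list: List[RuleEntry]) -> Optional[str]:
    for name, relation, width, height in rule_list:      # highest priority first
        label = next((wd for wd in words if wd.text == name), None)
        if label is None:
            continue                                      # item name not on this form
        area = search_area(label, relation, width, height)
        candidates = [wd for wd in words if wd is not label and centre_inside(wd, area)]
        if not candidates:
            return None  # only the first item name found is consulted
        nearest = min(candidates, key=lambda wd: abs(wd.x - label.x) + abs(wd.y - label.y))
        return nearest.text
    return None
```

Picking the candidate nearest to the item name is one simple way to resolve several words falling into the search area. The source does not say how such ties are handled.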
  • the output control unit 126 outputs the item value by displaying a recognition result screen including the item value detected by the detection unit 125 on the display device 103 (step S208).
  • FIG. 15 is a view showing an example of the recognition result screen.
  • On the recognition result screen 1500, an item 1501, a recognition result 1502, a correction button 1503, a confirmation button 1504 and the like are displayed for each item of interest.
  • the item value of each item detected from the form image is displayed as a recognition result 1502 in association with each item 1501.
  • When the user presses the correction button 1503 using the input device 102, the recognition result 1502 becomes editable so that the user can correct it.
  • When the user then presses the confirmation button 1504 using the input device 102, the item value of each item detected from the form image is replaced with the correction value entered in the recognition result 1502.
  • Alternatively, a single correction button and a single confirmation button shared by all items may be displayed. In that case, pressing the correction button makes all recognition results 1502 editable. Pressing the confirmation button then applies all correction values entered in the recognition results 1502 to the item values.
  • Instead of displaying the item value on the display device 103, the output control unit 126 may output it by transmitting it, via the first communication device 101, to a server or the like that aggregates information related to forms.
  • The reception unit 127 determines whether the user has issued an instruction to correct the item value displayed on the recognition result screen 1500, by pressing the confirmation button 1504 using the input device 102 (step S209). If no correction instruction has been received, the information processing apparatus 100 ends the series of steps.
  • If a correction instruction has been received, the updating unit 128 executes the update process (step S210) and ends the series of steps.
  • the updating unit 128 updates the rule list associated with the feature amount calculated by the feature amount calculating unit 123 in the rule dictionary in the updating process. Details of the update process will be described later.
  • FIG. 16 is a flowchart illustrating an example of the operation of the update process.
  • The update process shown in FIG. 16 is executed in step S210 of FIG. 14.
  • the updating unit 128 selects one rule list from among the rule lists stored in the first storage device 110 (step S301).
  • The updating unit 128 extracts the item value from the form image based on the selected rule list (step S302). Using known OCR technology, it checks each item name defined in the rule list one by one, in descending order of the priorities defined in the rule list, to determine whether it is included in the form image.
  • For the item name first determined to be included, the updating unit 128 extracts the item value as the value corresponding to that item name. The value is taken from an area of a predetermined size located in the display positional relationship defined by the position information associated with the item name.
  • the updating unit 128 determines whether the extracted item value matches the correction value designated by the user using the input device 102 (step S303).
  • When the extracted item value matches the correction value, the updating unit 128 determines that the form image conforms to the rule list selected in step S301. In that case, it associates the feature amount calculated by the feature amount calculation unit 123 in step S203 with the rule list ID of the selected rule list. It stores this association in the rule dictionary (step S304) and ends the series of steps. The updating unit 128 thus updates the rule dictionary stored in the first storage device 110. It may also transmit the updated rule dictionary to the generation device 200 via the first communication device 101, so that the rule dictionary stored in the second storage device 210 is updated as well. In that way, the rule dictionary used by other information processing apparatuses can also be kept up to date.
  • If the feature amount is already stored in the rule dictionary, the updating unit 128 overwrites the rule list ID associated with it. The updating unit 128 can thereby improve the quality of the rule dictionary by correcting rule lists that were improperly registered in it.
  • If the feature amount is not yet stored in the rule dictionary, the updating unit 128 adds it to the rule dictionary in association with the rule list ID. The updating unit 128 can thereby add information not yet registered in the rule dictionary and efficiently improve its quality.
  • When the extracted item value does not match the correction value, the updating unit 128 determines whether all the rule lists stored in the first storage device 110 have been processed (step S305). If any rule list remains unprocessed, the updating unit 128 returns to step S301 and executes steps S301 to S305 on an unselected rule list. When all rule lists have been processed, the updating unit 128 ends the series of steps.
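The update process of FIG. 16 (steps S301 to S305) reduces to trying each stored rule list until one reproduces the user's correction. The sketch below assumes that the form image's item values have already been read into a dict keyed by item name and that rule list IDs are list indices. The function name is illustrative.

```python
from typing import Dict, List, Optional

def update_rule_dictionary(dictionary: Dict[str, int],
                           rule_lists: List[List[str]],
                           feature: str,
                           form_values: Dict[str, str],
                           corrected: str) -> Optional[int]:
    """Associate a form image's feature with the first rule list that reproduces the correction.

    form_values maps each item name found on the form image to the item value read
    next to it. Returns the rule list ID stored, or None if no rule list conforms.
    """
    for list_id, rule_list in enumerate(rule_lists):          # S301 / S305: try each rule list
        # S302: value of the first item name present, in priority order
        value = next((form_values[n] for n in rule_list if n in form_values), None)
        if value == corrected:                                 # S303
            dictionary[feature] = list_id                      # S304: overwrite or add
            return list_id
    return None
```

Assigning to `dictionary[feature]` covers both cases described above. It overwrites the rule list ID of a feature amount that is already registered and adds an entry for one that is not.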
  • the information processing apparatus 100 detects the item value of the desired item from the form image in accordance with the priority defined in the rule list associated with the feature amount of the form image.
  • The position of an item name may differ from form to form, and a single form may contain several synonyms of an item name.
  • In form images with similar feature amounts, the same item name is likely to be used for the same item. The information processing apparatus 100 examines the item names in descending order of their likelihood of corresponding to the desired item. It can therefore detect the item value of the desired item more accurately from a form image in which an item name and an item value are represented for each of a plurality of items.
  • The information processing apparatus 100 detects the item value of a desired item from the form image using a rule list that defines the display positional relationship of the item value corresponding to each item name. As a result, it can detect the item value of the desired item with high accuracy even from images of multi-layout forms, in which the position of the item value relative to the item name differs from form to form.
  • The generation device 200 examines each item name in the priority order defined in the rule list. If the item value corresponding to the item name first determined to be included in a sample form image matches that image's correct value, it associates the rule list with the sample form image. By examining each item name in order of priority, the generation device 200 can thus generate rule lists that correctly detect the item value of a desired item.
  • The generation device 200 generates the rule dictionary by learning in advance from a plurality of sample form images.
  • This keeps the number of rule lists in the rule dictionary from growing excessively. At the same time, the generation device 200 produces a rule dictionary that allows the item value of a desired item to be detected with high accuracy, even from form images whose layout is unknown.
  • FIG. 17 is a block diagram showing a schematic configuration of the first processing circuit 320 in the information processing apparatus according to another embodiment.
  • The first processing circuit 320 is used instead of the first CPU 120 and executes the detection process in its place.
  • The first processing circuit 320 includes a receiving circuit 321, an acquisition circuit 322, a feature amount calculation circuit 323, a rule list extraction circuit 324, a detection circuit 325, an output control circuit 326, a reception circuit 327, an update circuit 328 and the like.
  • the receiving circuit 321 is an example of a receiving unit, and has the same function as the receiving unit 121.
  • the receiving circuit 321 receives the rule dictionary and the rule list from the generating device via the first communication device 101, and stores the received rule dictionary and the rule list in the first storage device 110.
  • the feature amount calculation circuit 323 is an example of a feature amount calculation unit, and has the same function as the feature amount calculation unit 123.
  • the feature amount calculation circuit 323 receives the form image from the acquisition circuit 322, calculates the feature amount from the received form image, and outputs the feature amount to the rule list extraction circuit 324.
  • the rule list extraction circuit 324 is an example of a rule list extraction unit, and has the same function as the rule list extraction unit 124.
  • the rule list extraction circuit 324 reads the rule dictionary from the first storage device 110, receives the feature amount from the feature amount calculation circuit 323, extracts the rule list from the rule dictionary and the feature amount, and outputs the rule list to the detection circuit 325.
  • the detection circuit 325 is an example of a detection unit, and has the same function as the detection unit 125.
  • The detection circuit 325 receives a form image from the acquisition circuit 322 and a rule list from the rule list extraction circuit 324. It detects the item value of the item of interest from the form image and the rule list, and outputs the value to the output control circuit 326.
  • the output control circuit 326 is an example of an output control unit, and has the same function as the output control unit 126.
  • the output control circuit 326 receives an item value from the detection circuit 325, and outputs the received item value to the display device 103.
  • the reception circuit 327 is an example of a reception unit, and has the same function as the reception unit 127.
  • the reception circuit 327 receives the correction value from the input device 102 and outputs the received correction value to the update circuit 328.
  • the update circuit 328 is an example of the update unit, and has the same function as the update unit 128.
  • the update circuit 328 reads the rule list from the first storage device 110, receives the correction value from the reception circuit 327, and updates the rule dictionary from the rule list and the correction value.
  • FIG. 18 is a block diagram showing a schematic configuration of the second processing circuit 420 in the generation device according to another embodiment.
  • the second processing circuit 420 is used instead of the second CPU 220, and executes generation processing instead of the second CPU 220.
  • the second processing circuit 420 includes a rule list generation circuit 421, a rule dictionary generation circuit 422, a transmission circuit 423, and the like.
  • the rule list generation circuit 421 is an example of a rule list generation unit, and has the same function as the rule list generation unit 221.
  • the rule list generation circuit 421 reads out a sample form image from the second storage device 210, generates a rule list based on the read out sample form image, and stores the rule list in the second storage device 210.
  • the rule dictionary generation circuit 422 is an example of a rule dictionary generation unit, and has the same function as the rule dictionary generation unit 222.
  • the rule dictionary generation circuit 422 reads out the sample form image and the rule list from the second storage device 210, generates a rule dictionary based on the read out sample form image and the rule list, and stores the rule dictionary in the second storage device 210.
  • the transmission circuit 423 is an example of a transmission unit, and has the same function as the transmission unit 223.
  • the transmission circuit 423 reads the rule dictionary from the second storage device 210, and transmits the read rule dictionary to the information processing device via the second communication device 201.
  • The information processing apparatus can thus detect the item value of the desired item with higher accuracy.
  • The generation process may be performed by the information processing apparatus 100 instead of the generation device 200.
  • In that case, the first storage device 110 stores the information otherwise held in the second storage device 210. In addition to the units shown in FIG. 2, the first CPU 120 has units with the same functions as those of the second CPU 220.
  • the generation device 200 may generate a rule dictionary and a rule list for each of two or more items, and the information processing device 100 may detect item values of two or more items from each form image.

Abstract

The present invention makes it possible to more accurately detect an item value for a desired item from a form image that indicates an item name and an item value for each of a plurality of items. This information processing device processes a form image indicating an item name and an item value for each of a plurality of items, and comprises: a storage unit which stores a rule dictionary associating a feature quantity of each of a plurality of sample form images with one of a plurality of rule lists that each specify a plurality of item names and a priority assigned to each item name; an acquisition unit which acquires a form image; a feature quantity calculation unit which calculates a feature quantity from the acquired form image; a rule list extraction unit which extracts the rule list associated by the rule dictionary with the calculated feature quantity; a detection unit which sequentially determines whether or not each item name specified in the extracted rule list is included in the form image, in the priority order specified in the rule list, and detects the item value associated with the first item name that was determined to be included in the form image; and an output unit which outputs the detected item value.

Description

INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING SYSTEM, CONTROL METHOD, AND CONTROL PROGRAM
 The present disclosure relates to an information processing apparatus, an information processing system, a control method, and a control program. More particularly, it relates to an information processing apparatus, information processing system, control method, and control program for processing a form image in which an item name and an item value are represented for each of a plurality of items.
 At life insurance companies, staff manually convert forms such as medical receipts sent by customers into data when paying insurance claims. Some life insurance companies receive a huge number of forms from customers, which places a heavy workload on the staff. There is therefore a growing demand for tools that assist with form digitization.
 A medium processing apparatus has been disclosed that recognizes information based on an image read from a medium on which the information is entered in an arbitrary format (see Patent Document 1). The apparatus refers to an analysis dictionary that can be updated by learning and extracts layout features of the image from the read image data. From those features it identifies the position of the information to be recognized, and it updates the layout features through learning.
 A form recognition apparatus has also been disclosed that detects frames and character strings in the image information of an input form (see Patent Document 2). It detects frames whose character strings are likely to be item names as item name area candidates. It then determines whether the character string in each item name area candidate is an item name.
 A form processing apparatus that performs form processing on image data acquired from form documents has also been disclosed (see Patent Document 3). The apparatus classifies the image data into a plurality of image groups based on the type of ruled-line data extracted from each piece of image data. It then processes the image data of each group based on the form template data assigned to that group.
 A form recognition apparatus that recognizes character strings arranged in an arbitrary table structure on a form has also been disclosed (see Patent Document 4). The apparatus recognizes the character strings in a form image obtained by digitizing the form and extracts heading words, which are predetermined character strings, from the recognized strings. It then determines the table structure in the form image from the arrangement of the extracted heading words. Using that result, it identifies the correspondence between the heading words and the data.
Patent Document 1: WO 97/05561
Patent Document 2: JP 2009-93305 A
Patent Document 3: JP 2014-89575 A
Patent Document 4: JP 2010-3155 A
 It is desirable for an information processing apparatus that processes form images, in which an item name and an item value are represented for each of a plurality of items, to detect the item value of a desired item more accurately.
 An object of the information processing apparatus, information processing system, control method, and control program is to enable more accurate detection of the item value of a desired item from a form image in which an item name and an item value are represented for each of a plurality of items.
 An information processing apparatus according to an aspect of the present invention processes a form image in which an item name and an item value are represented for each of a plurality of items. It includes: a storage unit that stores a rule dictionary in which each of the feature amounts of a plurality of sample form images is associated with one of a plurality of rule lists, each rule list defining a plurality of item names and the priority of each item name; an acquisition unit that acquires a form image; a feature amount calculation unit that calculates a feature amount from the acquired form image; a rule list extraction unit that extracts, from the rule dictionary, the rule list associated with the calculated feature amount; a detection unit that determines sequentially, in the priority order defined in the extracted rule list, whether each item name defined in that rule list is included in the form image, and detects the item value corresponding to the item name first determined to be included; and an output unit that outputs the detected item value.
 An information processing system according to an aspect of the present invention includes a generation device and an information processing apparatus that processes a form image in which an item name and an item value are represented for each of a plurality of items. The generation device includes: a storage unit that stores a plurality of sample form images and a plurality of rule lists, each defining a plurality of item names and the priority of each item name; a rule dictionary generation unit that generates a rule dictionary associating each rule list with the feature amounts of the sample form images that conform to it; and a transmission unit that transmits the rule dictionary to the information processing apparatus. The information processing apparatus includes: a receiving unit that receives the rule dictionary from the generation device; an acquisition unit that acquires a form image; a feature amount calculation unit that calculates a feature amount from the acquired form image; a rule list extraction unit that extracts, from the received rule dictionary, the rule list associated with the calculated feature amount; a detection unit that determines sequentially, in the priority order defined in the extracted rule list, whether each item name defined in that rule list is included in the form image, and detects the item value corresponding to the item name first determined to be included; and an output unit that outputs the detected item value.
 また、本発明の一側面に係る制御方法は、記憶部と、出力部とを有し、複数の項目毎に項目名及び項目値が表された帳票画像を処理する情報処理装置の制御方法であって、複数のサンプル帳票画像毎の特徴量のそれぞれと、複数の項目名及び各項目名の優先順位が規定された複数のルールリストのそれぞれとが関連付けられたルール辞書を記憶部に記憶し、帳票画像を取得し、取得された帳票画像から特徴量を算出し、ルール辞書において、算出された特徴量に関連付けられたルールリストを抽出し、抽出されたルールリストに規定された優先順位の順に、そのルールリストに規定された各項目名が帳票画像に含まれるか否かを順次判定し、最初に含まれると判定された項目名に対応する項目値を検出し、検出された項目値を出力部に出力することを含む。 A control method according to an aspect of the present invention is a control method of an information processing apparatus that has a storage unit and an output unit and processes a form image in which an item name and an item value are represented for each of a plurality of items. Storing, in the storage unit, a rule dictionary in which each of the feature amounts for each of a plurality of sample document images is associated with each of a plurality of item names and a plurality of rule lists in which the priorities of the item names are defined; , Form image is acquired, feature amount is calculated from the acquired form image, and a rule dictionary is used to extract a rule list associated with the calculated feature amount, and the priority order specified in the extracted rule list is extracted In order, it is sequentially determined whether each item name specified in the rule list is included in the form image, and the item value corresponding to the item name determined to be included first is detected, and the detected item value Output to the output unit Including the Rukoto.
A control program according to an aspect of the present invention is a control program for an information processing device that has a storage unit and an output unit and that processes a form image in which an item name and an item value are shown for each of a plurality of items. The program causes the information processing device to execute: storing, in the storage unit, a rule dictionary in which the feature amount of each of a plurality of sample form images is associated with one of a plurality of rule lists, each rule list defining a plurality of item names and a priority for each item name; acquiring a form image; calculating a feature amount from the acquired form image; extracting, from the rule dictionary, the rule list associated with the calculated feature amount; determining, one by one in the priority order defined in the extracted rule list, whether each item name defined in that rule list is included in the form image, and detecting the item value corresponding to the first item name determined to be included; and outputting the detected item value to the output unit.
According to the present embodiment, the information processing device, the information processing system, the control method, and the control program can detect the item value of a desired item more accurately from a form image in which an item name and an item value are shown for each of a plurality of items.
The objects and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims. Both the foregoing general description and the following detailed description are exemplary and explanatory, and do not restrict the invention as claimed.
FIG. 1 is a diagram showing an example of the schematic configuration of an information processing system 10 according to an embodiment.
FIG. 2 is a diagram showing the schematic configuration of a first storage device 110 and a first CPU 120.
FIG. 3 is a diagram showing the schematic configuration of a second storage device 210 and a second CPU 220.
FIG. 4 is a diagram showing an example of a plurality of sample form images 400, 401, and 402.
FIG. 5 is a diagram showing an example of a plurality of rule lists 500, 501, and 502.
FIG. 6 is a diagram showing an example of a rule dictionary 600.
FIG. 7 is a flowchart showing an example of the operation of the rule dictionary generation process.
FIG. 8 is a diagram for explaining the feature amount relating to the shape of ruled lines.
FIG. 9 is a diagram for explaining the feature amount relating to a logo.
FIG. 10 is a diagram for explaining the feature amount relating to a character recognition result.
FIG. 11 is a diagram showing an example of a rule dictionary generated by the generation process.
FIG. 12 is a diagram showing an example of a rule dictionary generated by the generation process.
FIG. 13 is a diagram showing an example of a rule dictionary generated by the generation process.
FIG. 14 is a flowchart showing an example of the operation of the item value detection process.
FIG. 15 is a diagram showing an example of a recognition result screen.
FIG. 16 is a flowchart showing an example of the operation of the update process.
FIG. 17 is a diagram showing the schematic configuration of a first processing circuit 320 of another information processing device.
FIG. 18 is a diagram showing the schematic configuration of a second processing circuit 420 of another generation device.
An information processing device according to an aspect of the present disclosure is described below with reference to the drawings. Note, however, that the technical scope of the present disclosure is not limited to these embodiments and extends to the inventions described in the claims and their equivalents.
FIG. 1 is a diagram showing an example of the schematic configuration of an information processing system 10 according to the embodiment.
The information processing system 10 includes an information processing device 100 and a generation device 200.
The information processing device 100 and the generation device 200 each have a wired or wireless communication function, are connected to a network 300, and communicate with each other via the network 300.
The information processing device 100 is an information processing device such as a personal computer (PC) or a notebook PC. It is used by an operator (its user) and processes form images in which an item name and an item value are shown for each of a plurality of items. The information processing device 100 may instead be a portable device such as a tablet PC, a multifunction mobile phone (a so-called smartphone), or a personal digital assistant. The information processing device 100 includes a first communication device 101, an input device 102, a display device 103, a first storage device 110, and a first CPU (Central Processing Unit) 120. Each part of the information processing device 100 is described in detail below.
The first communication device 101 has a wired communication interface circuit supporting TCP/IP (Transmission Control Protocol/Internet Protocol) or the like, and connects to the network 300 according to a communication standard such as Ethernet (registered trademark). The first communication device 101 supplies data received from the generation device 200 via the network 300 to the first CPU 120, and transmits data supplied from the first CPU 120 to the generation device 200 via the network 300. The first communication device 101 may be any device capable of communicating with an external device. For example, it may communicate with the generation device 200 via an access point (not shown) according to a wireless LAN (Local Area Network) communication scheme, or via a base station device (not shown) according to a mobile phone communication scheme.
The input device 102 is an example of an input unit. It includes an input device such as a touch-panel input device, a keyboard, or a mouse, together with an interface circuit that acquires signals from that input device. The input device 102 accepts data entered by the user and outputs a signal corresponding to the accepted data to the first CPU 120.
The display device 103 is an example of an output unit. It includes a display made of liquid crystal, organic EL (Electro-Luminescence), or the like, together with an interface circuit that outputs image data or other information to the display. The display device 103 is connected to the first CPU 120 and shows the image data output from the first CPU 120 on the display. The input device 102 and the display device 103 may be integrated as a touch-panel display.
The first storage device 110 is an example of a storage unit. It includes a memory device such as RAM (Random Access Memory) or ROM (Read Only Memory), a fixed disk device such as a hard disk, or a portable storage device such as a flexible disk or an optical disc. The first storage device 110 stores the computer programs, databases, tables, and the like used in the various processes of the information processing device 100. The computer programs may be installed from a computer-readable portable recording medium such as a CD-ROM (Compact Disc Read-Only Memory) or a DVD-ROM (Digital Versatile Disc Read-Only Memory), using a known setup program or the like. The first storage device 110 also stores data such as a rule dictionary and a plurality of rule lists, which are described in detail later.
The first CPU 120 operates based on a program stored in advance in the first storage device 110. The first CPU 120 may be a general-purpose processor. Instead of the first CPU 120, a DSP (Digital Signal Processor), an LSI (Large Scale Integration) circuit, an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), or the like may be used.
The first CPU 120 is connected to, and controls, the first communication device 101, the input device 102, the display device 103, and the first storage device 110. It performs data transmission and reception control via the first communication device 101, input control of the input device 102, output control of the display device 103, control of the first storage device 110, and so on. The first CPU 120 also processes form images acquired via the first communication device 101 or the like.
The generation device 200 is a server that generates the rule dictionary used by the information processing device 100. The rule dictionary is data used to process form images, in particular to detect the item value of an item of interest in a form image. A form image is image data representing a form, and a form is a document in which an item name and an item value are shown for each of a plurality of items. The present embodiment handles multi-layout forms: forms created independently by a plurality of different organizations, in which the position of each item name differs from form to form and the same item may be labeled with any of several synonymous item names. An item of interest is an item whose value must be processed as electronic data, for example by calculation or output. The generation device 200 includes a second communication device 201, a second storage device 210, and a second CPU 220. Each part of the generation device 200 is described in detail below.
The second communication device 201 has a communication interface circuit similar to that of the first communication device 101 and connects to the network 300 according to the same communication scheme. It supplies data received from the information processing device 100 via the network 300 to the second CPU 220, and transmits data supplied from the second CPU 220 to the information processing device 100 via the network 300.
The second storage device 210 is an example of a storage unit. It includes a memory device, a fixed disk device, a portable storage device, or the like, similar to those of the first storage device 110. The second storage device 210 stores the computer programs, databases, tables, and the like used in the various processes of the generation device 200. The computer programs may be installed from a computer-readable portable recording medium such as a CD-ROM or a DVD-ROM, using a known setup program or the like. The second storage device 210 also stores data such as a plurality of sample form images, a rule dictionary, and a plurality of rule lists, which are described in detail later.
The second CPU 220 operates based on a program stored in advance in the second storage device 210. The second CPU 220 may be a general-purpose processor. Instead of the second CPU 220, a DSP, an LSI circuit, an ASIC, an FPGA, or the like may be used.
The second CPU 220 is connected to, and controls, the second communication device 201 and the second storage device 210. It performs data transmission and reception control via the second communication device 201, control of the second storage device 210, and so on. The second CPU 220 also generates the rule dictionary.
FIG. 2 is a diagram showing the schematic configuration of the first storage device 110 and the first CPU 120 of the information processing device 100.
As shown in FIG. 2, the first storage device 110 stores programs including a receiving program 111, an acquisition program 112, a feature amount calculation program 113, a rule list extraction program 114, a detection program 115, an output control program 116, an acceptance program 117, and an update program 118. Each of these programs is a functional module implemented by software running on the processor. The first CPU 120 reads each program stored in the first storage device 110 and operates according to it, thereby functioning as a receiving unit 121, an acquisition unit 122, a feature amount calculation unit 123, a rule list extraction unit 124, a detection unit 125, an output control unit 126, an acceptance unit 127, and an update unit 128.
FIG. 3 is a diagram showing the schematic configuration of the second storage device 210 and the second CPU 220 of the generation device 200.
As shown in FIG. 3, the second storage device 210 stores programs including a rule list generation program 211, a rule dictionary generation program 212, and a transmission program 213. Each of these programs is a functional module implemented by software running on the processor. The second CPU 220 reads each program stored in the second storage device 210 and operates according to it, thereby functioning as a rule list generation unit 221, a rule dictionary generation unit 222, and a transmission unit 223.
FIG. 4 is a diagram showing an example of the plurality of sample form images 400, 401, and 402 stored in the second storage device 210 of the generation device 200.
Each of the sample form images 400, 401, and 402 is a form image registered in advance for generating the rule dictionary. As shown in FIG. 4, each sample form image shows, for each of a plurality of items, the item name of that item together with the item value corresponding to that item name. In addition, the item value of the item of interest is stored in association with each sample form image as its correct value.
In the example shown in FIG. 4, the forms are medical receipts. Medical expenses are divided into a plurality of items; the name of each expense is written as the item name, and its amount is written as the item value. In this example, "patient's share" is designated as the item of interest. In multi-layout forms such as medical receipts, however, the item name used for the same item may differ from form to form. In the example shown in FIG. 4, the item "patient's share" is labeled "患者負担額" ("patient copayment amount") 403 in sample form image 400, "一部負担金" ("partial payment") 404 in sample form image 401, and "合計" ("total") 405 in sample form image 402. When the sample form images 400, 401, and 402 are registered, the registrant registers the correct value "15000" for the item "patient's share" in each of them.
FIG. 5 is a diagram showing an example of the plurality of rule lists 500, 501, and 502.
Each of the rule lists 500, 501, and 502 is generated by the generation device 200 and stored in the second storage device 210, and is also transmitted to the information processing device 100 and stored in the first storage device 110. As shown in FIG. 5, each rule list is assigned a rule list ID that identifies it. Each rule list also defines a plurality of rules (each corresponding to one row of the table in FIG. 5), and each rule associates a rule number, a priority, an item name, position information, and so on. That is, one rule corresponds to one item name.
The rule number identifies each rule. The priority defines the order in which the item names specified in the rules are checked when the rule list is used to detect the item value of the item of interest in a form image. In this example, 1 denotes the highest priority, and larger numbers denote lower priorities. The item name is the name that each rule checks for when the rule list is used to detect the item value of the item of interest in a form image. The position information defines the display positional relationship between each item name and its corresponding item value in the sample form images; for example, it specifies the position of the item value relative to the item name (above, below, right, left, upper right, lower right, upper left, or lower left). When a rule list is used to detect the item value of the item of interest in a form image, the position information is used to locate the item value corresponding to the item name specified in each rule.
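As an illustration of the rule list structure described above, the following minimal Python sketch models one rule per row of FIG. 5. The class names `Rule`, `RuleList`, and `RelPos`, their fields, and the example item names are hypothetical and not taken from the source; `None` stands in for an as-yet undefined priority.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class RelPos(Enum):
    """Position of the item value relative to its item name (hypothetical naming)."""
    UP = "up"
    DOWN = "down"
    RIGHT = "right"
    LEFT = "left"
    UPPER_RIGHT = "upper_right"
    LOWER_RIGHT = "lower_right"
    UPPER_LEFT = "upper_left"
    LOWER_LEFT = "lower_left"


@dataclass(frozen=True)
class Rule:
    rule_no: int             # identifies the rule
    priority: Optional[int]  # 1 = highest; None = not yet assigned ("undefined")
    item_name: str           # item name to look for in the form image
    position: RelPos         # where the item value lies relative to the item name


@dataclass
class RuleList:
    rule_list_id: str
    rules: List[Rule] = field(default_factory=list)

    def in_priority_order(self) -> List[Rule]:
        """Return rules with an assigned priority, highest priority (smallest number) first."""
        ranked = [r for r in self.rules if r.priority is not None]
        return sorted(ranked, key=lambda r: r.priority)
```

For example, a list holding `Rule(1, 2, "Total", RelPos.RIGHT)` and `Rule(2, 1, "Copay", RelPos.DOWN)` yields the "Copay" rule first from `in_priority_order()`, matching the convention that 1 is the highest priority.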
FIG. 6 is a diagram showing an example of the rule dictionary 600.
The rule dictionary 600 is generated by the generation device 200 and stored in the second storage device 210, and is also transmitted to the information processing device 100 and stored in the first storage device 110. As shown in FIG. 6, the rule dictionary 600 associates each of a plurality of feature amounts with one of a plurality of rule list IDs.
Each feature amount is the feature amount of a sample form image; feature amounts are described in detail later. In the rule dictionary 600, each rule list is associated with the feature amounts of the sample form images that conform to it.
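This passage does not say how the feature amount of a new form image is matched against the stored entries. Purely as an assumption for illustration, the sketch below stores the rule dictionary as (feature vector, rule list ID) pairs and returns the ID whose stored vector has the smallest Euclidean distance to the query. The function name and data layout are hypothetical, and nearest-neighbour matching is a swapped-in technique rather than the method described in the source.

```python
import math
from typing import Optional, Sequence, Tuple

# Each entry pairs a sample form image's feature vector with a rule list ID.
RuleDictionary = Sequence[Tuple[Sequence[float], str]]


def nearest_rule_list_id(rule_dictionary: RuleDictionary,
                         feature: Sequence[float]) -> Optional[str]:
    """Return the rule list ID of the stored feature closest to `feature`.

    Nearest-neighbour matching by Euclidean distance is an illustrative
    assumption; returns None for an empty dictionary.
    """
    best_id, best_dist = None, math.inf
    for stored_feature, rule_list_id in rule_dictionary:
        dist = math.dist(stored_feature, feature)  # Python 3.8+
        if dist < best_dist:
            best_id, best_dist = rule_list_id, dist
    return best_id
```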
FIG. 7 is a flowchart showing an example of the operation of the rule dictionary generation process performed by the generation device 200.
An example of the operation of the generation process performed by the generation device 200 is described below with reference to the flowchart in FIG. 7. The operation flow described below is executed mainly by the second CPU 220, in cooperation with the elements of the generation device 200, based on a program stored in advance in the second storage device 210.
First, the rule list generation unit 221 initializes the rule list, and the rule dictionary generation unit 222 initializes the rule dictionary (step S101). The rule list generation unit 221 extracts, from all the sample form images stored in the second storage device 210, every item name that could correspond to the item of interest. Alternatively, the item names that could correspond to the item of interest may be registered in advance. For each extracted item name, the rule list generation unit 221 generates one rule per possible position, associating the item name with the corresponding position information, and generates a rule list containing all of the generated rules as an initial rule list. In the initial rule list, rule numbers are assigned to the rules in arbitrary order, and the priority of every rule is set to "undefined". The rule dictionary generation unit 222 generates, as an initial rule dictionary, a rule dictionary in which the feature amount and the rule list ID are set to "undefined".
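The initial rule list of step S101 is the Cartesian product of the candidate item names and the possible positions. A minimal sketch, assuming rules are plain dicts, the eight positions listed for FIG. 5 as hypothetical string labels, and sequential numbering as one valid choice of the "arbitrary order" the source allows:

```python
from itertools import product
from typing import Iterable, List

# Hypothetical labels for the eight relative positions listed for FIG. 5.
POSITIONS = ("up", "down", "right", "left",
             "upper_right", "lower_right", "upper_left", "lower_left")


def initial_rule_list(item_names: Iterable[str]) -> List[dict]:
    """Build one rule per (item name, position) pair, with priority left undefined."""
    unique_names = list(dict.fromkeys(item_names))  # drop duplicates, keep first-seen order
    return [
        {"rule_no": no, "priority": None, "item_name": name, "position": pos}
        for no, (name, pos) in enumerate(product(unique_names, POSITIONS), start=1)
    ]
```

With two distinct item names, this yields 16 rules, each with `priority` set to `None` in place of "undefined".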
Next, the rule list generation unit 221 selects one rule from the initial rule list (step S102).
Next, based on the selected rule, the rule list generation unit 221 detects an item value in each sample form image that does not yet conform to any of the rule lists generated so far (step S103). The first time step S103 is executed, the rule list generation unit 221 detects item values in all the sample form images stored in the second storage device 210. Using a known OCR (Optical Character Recognition) technique, the rule list generation unit 221 detects the item name of the selected rule in each sample form image. Then, again using known OCR, it detects the item value in a region of predetermined size that lies, relative to the detected item name, in the display positional relationship defined by the position information of the selected rule. In this way, the rule list generation unit 221 detects the item value corresponding to each item name based on the display positional relationship defined in the rule list.
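The search for an item value in "a region of predetermined size" next to the item name can be sketched geometrically. The sketch assumes OCR output is already available as words with bounding boxes `(x, y, width, height)` in image coordinates (y growing downward), a hypothetical region size of 200 × 40 pixels, and a word counted as inside the region when its box centre falls within it. None of these specifics comes from the source.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height), y grows downward

# (column, row) offsets of the search region relative to the item name box.
OFFSETS = {
    "up": (0, -1), "down": (0, 1), "right": (1, 0), "left": (-1, 0),
    "upper_right": (1, -1), "lower_right": (1, 1),
    "upper_left": (-1, -1), "lower_left": (-1, 1),
}


def value_region(name_box: Box, position: str, size=(200.0, 40.0)) -> Box:
    """Return a fixed-size search region placed next to the item name box."""
    x, y, w, h = name_box
    rw, rh = size
    col, row = OFFSETS[position]
    rx = {-1: x - rw, 0: x, 1: x + w}[col]
    ry = {-1: y - rh, 0: y, 1: y + h}[row]
    return (rx, ry, rw, rh)


def words_in_region(words: List[Tuple[str, Box]], region: Box) -> List[str]:
    """Return OCR words whose box centre lies inside the region."""
    rx, ry, rw, rh = region
    hits = []
    for text, (x, y, w, h) in words:
        cx, cy = x + w / 2, y + h / 2
        if rx <= cx < rx + rw and ry <= cy < ry + rh:
            hits.append(text)
    return hits
```

For a name box at `(100, 50, 80, 20)` and position `"right"`, the region starts at x = 180, immediately to the right of the name, so the name itself is never returned as its own value.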
Next, the rule list generation unit 221 calculates a match rate, which indicates the proportion of the item values detected in the sample form images that match the correct values associated with those images (step S104). Specifically, the rule list generation unit 221 determines whether each item value detected in a sample form image matches the correct value associated with that image, and calculates the match rate as the number of detected item values that match their correct values divided by the total number of item values detected.
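The match rate of step S104 is a simple ratio. This sketch follows the definition above, dividing by the number of item values actually detected, so images where the rule found nothing do not count against it. The dict-based inputs and the choice to return 0.0 when nothing was detected are illustrative assumptions.

```python
from typing import Dict, Optional


def match_rate(detected: Dict[str, Optional[str]], correct: Dict[str, str]) -> float:
    """Share of detected item values that equal their image's correct value.

    `detected` maps image ID -> value found with one rule (None if nothing was
    found); `correct` maps image ID -> registered correct value.
    """
    found = {img: v for img, v in detected.items() if v is not None}
    if not found:
        return 0.0  # assumption: a rule that detects nothing scores zero
    hits = sum(1 for img, v in found.items() if v == correct[img])
    return hits / len(found)
```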
Next, the rule list generation unit 221 determines whether processing has been completed for all the rules in the initial rule list (step S105). If any rule has not yet been processed, the rule list generation unit 221 returns to step S102 and executes steps S102 to S104 for an unselected rule.
When processing has been completed for all the rules in the initial rule list, the rule list generation unit 221 generates a rule list in which the rules are assigned priorities in descending order of the match rate calculated in step S104, so that the rule with the highest match rate receives the highest priority (step S106). In this way, the rule list generation unit 221 generates the rule list so that, among the sample form images that do not conform to any existing rule list, item names that more often correspond to the correct value receive higher priority. The rule list generation unit 221 also assigns a rule list ID to the generated rule list.
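Step S106 amounts to ranking the rules by match rate. A minimal sketch, assuming rules are dicts keyed by a hypothetical `rule_no` field and match rates are supplied per rule number; how ties are broken is not stated in the source, and here they simply keep their original order:

```python
from typing import Dict, List


def assign_priorities(rules: List[dict], rates: Dict[int, float]) -> List[dict]:
    """Give priority 1 to the rule with the highest match rate, 2 to the next, and so on.

    `rates` maps rule number -> match rate. Ties keep their original order
    because sorted() is stable even with reverse=True.
    """
    ordered = sorted(rules, key=lambda r: rates[r["rule_no"]], reverse=True)
    # dict(r, priority=...) copies each rule, so the input list is not mutated.
    return [dict(r, priority=rank) for rank, r in enumerate(ordered, start=1)]
```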
Next, the rule dictionary generation unit 222 selects one sample form image from among the sample form images that do not conform to any of the rule lists generated so far (step S107).
Next, the rule dictionary generation unit 222 extracts an item value from the selected sample form image based on the rule list generated in step S106 (step S108). Using known OCR, the rule dictionary generation unit 222 determines, one by one in the priority order defined in the rule list (highest priority first), whether each item name defined in the rule list is included in the sample form image. For the first item name determined to be included, it detects an item value in a region of predetermined size lying in the display positional relationship defined by the position information associated with that item name, and extracts the detected value as the item value corresponding to that item name.
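The priority-ordered lookup of step S108 (the same procedure the detection unit applies to new form images) can be sketched as follows. This assumes OCR words with bounding boxes, exact string equality in place of real OCR matching, a hypothetical 200 × 40 search region, and dict-based rules; all names are illustrative. Note that once the first item name is found, only that rule's position is used, even if no value is found there.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height), y grows downward
Word = Tuple[str, Box]                   # one OCR word and its bounding box

# (column, row) offsets of the value region relative to the item name box.
OFFSETS = {
    "up": (0, -1), "down": (0, 1), "right": (1, 0), "left": (-1, 0),
    "upper_right": (1, -1), "lower_right": (1, 1),
    "upper_left": (-1, -1), "lower_left": (-1, 1),
}


def extract_item_value(rules: List[dict], words: List[Word],
                       size: Tuple[float, float] = (200.0, 40.0)) -> Optional[str]:
    """Check item names in priority order; read the value next to the first one found."""
    for rule in sorted(rules, key=lambda r: r["priority"]):
        name_box = next((box for text, box in words if text == rule["item_name"]), None)
        if name_box is None:
            continue  # item name not on this form: try the next-priority rule
        # First item name found: only this rule's position information is used.
        x, y, w, h = name_box
        rw, rh = size
        col, row = OFFSETS[rule["position"]]
        rx = {-1: x - rw, 0: x, 1: x + w}[col]
        ry = {-1: y - rh, 0: y, 1: y + h}[row]
        for text, (wx, wy, ww, wh) in words:
            cx, cy = wx + ww / 2, wy + wh / 2
            if rx <= cx < rx + rw and ry <= cy < ry + rh:
                return text
        return None  # name found but no value in the expected region
    return None  # none of the rule list's item names appear on the form
```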
Next, the rule dictionary generation unit 222 determines whether the extracted item value matches the correct value associated with the selected sample form image (step S109).
If the item value does not match the correct value, the rule dictionary generation unit 222 determines that the sample form image selected in step S107 does not conform to the rule list generated in step S106, and advances the process to step S113.
If the item value matches the correct value, the rule dictionary generation unit 222 determines that the sample form image selected in step S107 conforms to the rule list generated in step S106 (step S110).
Next, the rule dictionary generation unit 222 calculates a feature of the sample form image determined to conform to the rule list (step S111). As the feature of the sample form image, it calculates a feature related to the shape of the ruled lines, a feature related to logos, or a feature related to the character recognition result.
FIG. 8 is a diagram for explaining the feature related to the shape of the ruled lines.
As shown in FIG. 8, when a form image 800 contains a table 801, the ruled lines 802 that make up the table 801 form a plurality of vertices 803 and intersections 804 of different shapes. For each shape, the rule dictionary generation unit 222 counts the number of vertices 803 and intersections 804 having that shape. It then calculates, as the feature of the form image 800, a histogram (feature vector) 810 whose elements are these per-shape counts. By using the feature related to the shape of the ruled lines, the rule dictionary generation unit 222 can accurately distinguish forms of many different layouts, which are likely to contain tables.
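The source does not say how ruled lines are extracted or how vertex and intersection shapes are classified. A minimal sketch, assuming the ruled lines have already been extracted as axis-aligned segments on integer coordinates, classifies each end point or crossing by the directions in which lines leave it and counts each shape. The nine shape names and their order in the vector are assumptions made for illustration.

```python
# Hypothetical input: ruled lines as axis-aligned segments on integer
# coordinates (y grows downward). Horizontal: (y, x0, x1), vertical: (x, y0, y1).
SHAPES = {
    frozenset({"right", "down"}): "top_left_corner",
    frozenset({"left", "down"}): "top_right_corner",
    frozenset({"right", "up"}): "bottom_left_corner",
    frozenset({"left", "up"}): "bottom_right_corner",
    frozenset({"left", "right", "down"}): "tee_down",
    frozenset({"left", "right", "up"}): "tee_up",
    frozenset({"up", "down", "right"}): "tee_right",
    frozenset({"up", "down", "left"}): "tee_left",
    frozenset({"up", "down", "left", "right"}): "cross",
}
SHAPE_ORDER = list(SHAPES.values())


def arms_at(x, y, h_lines, v_lines):
    """Directions in which a ruled line leaves the point (x, y)."""
    arms = set()
    for hy, x0, x1 in h_lines:
        if hy == y and x0 <= x < x1:
            arms.add("right")
        if hy == y and x0 < x <= x1:
            arms.add("left")
    for vx, y0, y1 in v_lines:
        if vx == x and y0 <= y < y1:
            arms.add("down")
        if vx == x and y0 < y <= y1:
            arms.add("up")
    return frozenset(arms)


def ruled_line_histogram(h_lines, v_lines):
    """Count vertices (corners) and intersections (tees, crosses) per shape."""
    points = {(x0, y) for y, x0, _ in h_lines} | {(x1, y) for y, _, x1 in h_lines}
    points |= {(x, y0) for x, y0, _ in v_lines} | {(x, y1) for x, _, y1 in v_lines}
    points |= {(x, y) for y, x0, x1 in h_lines for x, y0, y1 in v_lines
               if x0 <= x <= x1 and y0 <= y <= y1}
    counts = dict.fromkeys(SHAPE_ORDER, 0)
    for x, y in points:
        shape = SHAPES.get(arms_at(x, y, h_lines, v_lines))
        if shape is not None:  # straight pass-throughs and loose ends are ignored
            counts[shape] += 1
    return [counts[name] for name in SHAPE_ORDER]
```

For a table of two cells side by side, the four outer corners and the two T-junctions where the middle line meets the top and bottom edges each contribute one count, giving `[1, 1, 1, 1, 1, 1, 0, 0, 0]`.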
FIG. 9 is a diagram for explaining the feature related to logos.
As shown in FIG. 9, a form image 900 generally contains various logo images, such as a graphic logo 901 or a text logo 902, that represent the organization issuing the form. Using known pattern matching technology, the rule dictionary generation unit 222 detects the logos 901 and 902 of each shape and counts the number of logos of each shape. For example, the generation device 200 stores in advance reference images obtained by binarizing the images of the logos 901 and 902. The rule dictionary generation unit 222 crops images one after another while moving the cropping window across the form image 900, binarizes each cropped image, and detects the cropped range as a logo when the pixel match rate between the binarized image and a reference image is equal to or greater than a threshold. It then calculates, as the feature of the form image 900, a histogram (feature vector) 910 whose elements are the numbers of logos 901 and 902 of each shape. By using the feature related to each organization's logo, the rule dictionary generation unit 222 can accurately distinguish forms that each organization creates individually.
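The sliding-window matching described for FIG. 9 can be sketched in pure Python on small images stored as lists of rows. The binarization threshold of 128, the default `min_rate` of 0.9, and the rule of skipping windows that overlap an earlier detection (so that one logo is not counted at several neighboring offsets) are assumptions, not details from the source.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 1 for dark, 0 for light."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def find_logo(page, template, min_rate=0.9):
    """Slide a binary template over a binary page and return top-left hits."""
    th, tw = len(template), len(template[0])
    hits = []
    for y in range(len(page) - th + 1):
        for x in range(len(page[0]) - tw + 1):
            if any(abs(x - hx) < tw and abs(y - hy) < th for hx, hy in hits):
                continue  # window overlaps a logo that was already counted
            same = sum(page[y + j][x + i] == template[j][i]
                       for j in range(th) for i in range(tw))
            if same / (th * tw) >= min_rate:  # pixel match rate vs. threshold
                hits.append((x, y))
    return hits


def logo_histogram(page, templates, min_rate=0.9):
    """Feature vector: number of detections for each stored logo template."""
    return [len(find_logo(page, t, min_rate)) for t in templates]
```

A production system would more likely use an optimized template matcher, but the pixel-agreement ratio here mirrors the match-rate test in the description.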
FIG. 10 is a diagram for explaining the feature related to the character recognition result.
As shown in FIG. 10, a form image 1000 generally contains various character strings 1001 to 1005. Using known OCR technology, the rule dictionary generation unit 222 detects the character strings 1001 to 1005 in the form image 1000 and counts the occurrences of each. It then calculates, as the feature of the form image 1000, a histogram (feature vector) 1010 whose elements are the counts of the character strings 1001 to 1005. By using the feature related to the character recognition result, the rule dictionary generation unit 222 can distinguish forms of various layouts with even higher accuracy.
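A minimal sketch of the character-recognition histogram, assuming OCR yields a list of recognized strings and that a fixed vocabulary of reference strings (hypothetical here) defines the histogram bins, so that vectors from different forms line up element by element:

```python
from collections import Counter


def text_histogram(ocr_strings, vocabulary):
    """Count how often each reference string occurs among the OCR results."""
    counts = Counter(ocr_strings)
    return [counts[s] for s in vocabulary]  # Counter returns 0 for missing keys
```

Strings outside the vocabulary are simply ignored in this sketch.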
The rule dictionary generation unit 222 may also combine two or more of the ruled-line shape feature, the logo feature, and the character recognition feature and use the combination as the feature of the sample form image, which allows it to distinguish forms with even higher accuracy. In addition, the generation device 200 may calculate the feature of each sample form image in advance and store it in the second storage device 210. In that case, the rule dictionary generation unit 222 obtains the feature of each sample form image from the second storage device 210, which reduces the processing load of the generation process and allows the rule dictionary to be generated in a shorter time.
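One possible way to combine several feature types is to concatenate the histograms into a single vector. The optional per-part normalization below is an assumption added for illustration, not something the source specifies: it keeps a feature type with large counts (for example, many OCR strings) from dominating similarity comparisons.

```python
def combine_features(*parts, normalize=True):
    """Concatenate several feature histograms into one feature vector.

    With normalize=True each part is first scaled to sum to 1 (an assumption
    of this sketch); all-zero parts are left as zeros.
    """
    combined = []
    for part in parts:
        total = sum(part)
        if normalize and total > 0:
            combined.extend(v / total for v in part)
        else:
            combined.extend(float(v) for v in part)
    return combined
```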
Next, the rule dictionary generation unit 222 stores (adds) the calculated feature in the rule dictionary in association with the rule list ID of the rule list generated in step S106 (step S112).
If the calculated feature matches a previously calculated feature, the rule dictionary generation unit 222 skips updating the rule dictionary. It may also skip the update when the calculated feature is merely similar to a previously calculated feature. In that case, it determines that the two features are similar when the similarity between them is equal to or greater than a threshold. The similarity is, for example, the reciprocal of the Euclidean distance between the features (feature vectors), or their cosine similarity. Alternatively, when the calculated feature matches or is similar to a previously calculated feature, the rule dictionary generation unit 222 may update (overwrite) the rule list ID already associated with that feature in the rule dictionary with the rule list ID of the newly generated rule list.
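The two similarity measures named above, and the skip-or-overwrite behavior of step S112, can be sketched as follows. Representing the rule dictionary as a list of `(feature, rule_list_id)` pairs and the `add_feature` name are assumptions made for illustration.

```python
import math


def inverse_euclidean(a, b):
    """Similarity as the reciprocal of the Euclidean distance (inf if identical)."""
    d = math.dist(a, b)
    return math.inf if d == 0 else 1.0 / d


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    na, nb = math.hypot(*a), math.hypot(*b)
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def add_feature(dictionary, feature, rule_list_id, threshold,
                similarity=cosine, overwrite=False):
    """Add (feature, rule_list_id) unless an identical or similar feature exists.

    dictionary is a list of (feature_tuple, rule_list_id) pairs. Returns True
    if a new entry was added.
    """
    feature = tuple(feature)
    for i, (stored, _) in enumerate(dictionary):
        if stored == feature or similarity(stored, feature) >= threshold:
            if overwrite:
                dictionary[i] = (stored, rule_list_id)  # replace the old rule list ID
            return False
    dictionary.append((feature, rule_list_id))
    return True
```

Note that the threshold has a different scale for each measure: cosine similarity lies in [-1, 1], while the inverse Euclidean distance is unbounded.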
The rule dictionary generation unit 222 may also assign a feature ID to the calculated feature and store that feature ID in the rule dictionary in association with the rule list ID. For example, it uses as the feature ID a value (vector) obtained by rounding each element of the calculated feature (feature vector) to a predetermined number of significant digits. Alternatively, it may use a normalized version of the calculated feature (feature vector) as the feature ID.
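A feature ID of this kind can be sketched as a hashable tuple. Two significant digits and unit-length (Euclidean) normalization are assumptions; the source leaves both the number of digits and the normalization method open.

```python
import math


def feature_id(feature, digits=2, normalize=False):
    """Hashable ID made by rounding each element to `digits` significant digits.

    If normalize is True the vector is first scaled to unit Euclidean length.
    """
    values = [float(v) for v in feature]
    if normalize:
        norm = math.hypot(*values)
        if norm > 0:
            values = [v / norm for v in values]
    return tuple(float(f"{v:.{digits}g}") for v in values)
```

Because the ID is a tuple, it can be used directly as a dict key. One trade-off of rounding is that two nearly identical features lying on opposite sides of a rounding boundary receive different IDs.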
Next, the rule dictionary generation unit 222 determines whether steps S107 to S112 have been completed for every sample form image that does not conform to any of the generated rule lists (step S113). If any such sample form image remains unprocessed, the rule dictionary generation unit 222 returns the process to step S107 and executes steps S107 to S112 for an unselected sample form image.
When all sample form images have been processed, the rule dictionary generation unit 222 determines whether every sample form image conforms to one of the generated rule lists (step S114). If a sample form image remains that does not conform to any generated rule list, the rule dictionary generation unit 222 returns the process to step S102. The rule list generation unit 221 then executes steps S102 to S106 to generate a new rule list based on the sample form images that do not conform to any generated rule list, and the rule dictionary generation unit 222 executes steps S107 to S112 to update the rule dictionary using the newly generated rule list.
If every sample form image conforms to one of the generated rule lists, the rule dictionary generation unit 222 ends the series of steps. The rule dictionary stored in the second storage device 210 when the generation process ends is the rule dictionary generated by the rule dictionary generation unit 222.
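The overall loop of steps S102 to S114 can be condensed into a simplified sketch. The callbacks `extract(rule_list, sample_id)` and `feature(sample_id)` are hypothetical stand-ins for OCR-based extraction and feature calculation. The match rate of a single item name is approximated by calling `extract` with a one-element rule list. Two details are additions of this sketch and are not stated in the source: identical rule lists reuse their existing ID (as described for newly added samples), and the loop stops if a round matches no new sample, since otherwise a sample that no ordering can read correctly would keep the loop running forever.

```python
def generate_rule_dictionary(item_names, samples, extract, feature):
    """Simplified sketch of steps S102-S114.

    samples: list of (sample_id, correct_value) pairs.
    extract(rule_list, sample_id): value read with an ordered list of item names.
    feature(sample_id): hashable feature (or feature ID) of the sample.
    Returns (rule_lists, dictionary, unmatched_samples).
    """
    rule_lists = {}   # rule list ID -> ordered item names
    dictionary = {}   # feature -> rule list ID
    unmatched = list(samples)
    while unmatched:
        # S102-S106: rank item names by match rate on the still-unmatched samples.
        def rate(name):
            return sum(extract([name], sid) == ok for sid, ok in unmatched) / len(unmatched)

        rule_list = sorted(item_names, key=rate, reverse=True)
        # Reuse the ID of an identical rule list if one already exists.
        list_id = next((i for i, r in rule_lists.items() if r == rule_list), None)
        if list_id is None:
            list_id = f"R{len(rule_lists) + 1}"
            rule_lists[list_id] = rule_list
        # S107-S113: keep the samples this rule list reads correctly.
        remaining = []
        for sid, ok in unmatched:
            if extract(rule_list, sid) == ok:
                dictionary.setdefault(feature(sid), list_id)  # skip known features
            else:
                remaining.append((sid, ok))
        if len(remaining) == len(unmatched):
            break  # assumption: stop when a round makes no progress
        unmatched = remaining
    return rule_lists, dictionary, unmatched
```

With nine synthetic samples set up like the example in FIGS. 11 to 13 (six read correctly via the first item name, two via the second, one only via the third), this loop produces three rule lists and maps the features of samples 1 to 6, 7 and 8, and 9 to them, respectively.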
As described for step S106, the rule list generation unit 221 determines the priorities in the next rule list to be checked against the sample form images based on the match rate between item values and correct values in the sample form images that do not conform to any generated rule list. This allows the rule dictionary generation unit 222 to check efficiently whether each sample form image conforms to each rule list, which keeps the rule dictionary simpler and shortens the time needed to generate it.
If a sample form image is newly added after the rule dictionary has been generated, the rule list generation unit 221 generates a new rule list based on the newly added sample form image. In that case, the rule list generation unit 221 treats the newly added sample form image as one that does not conform to any generated rule list and executes steps S102 to S106 to generate a new rule list. The rule dictionary generation unit 222 then executes steps S107 to S112 to update the rule dictionary using the newly generated rule list. However, if the newly generated rule list is identical to one of the already generated rule lists, the rule dictionary generation unit 222 updates the rule dictionary using that existing rule list instead.
In this way, the generation device 200 can update a previously generated rule dictionary by adding new sample form images. In particular, when a newly added sample form image does not conform to any generated rule list, the priorities in the rule list to be checked against the sample form images are determined from the match rate between item values and correct values in the newly added sample form images. This allows the rule dictionary generation unit 222 to check efficiently whether each newly added sample form image conforms to each rule list, which keeps the rule dictionary simpler and shortens the time needed to generate it.
FIGS. 11 to 13 are diagrams showing an example of the rule dictionary generated by the generation process shown in FIG. 7.
In FIGS. 11 to 13, the generation process is executed on the nine sample form images 1 to 9 shown in FIG. 11. First, as shown in FIG. 11, the item value corresponding to the item name "patient copayment" (患者負担額) is detected in each of sample form images 1 to 9 (step S103), and the match rate between the detected item values and the correct values associated with sample form images 1 to 9 is calculated (step S104). Match rates are likewise calculated for the item values corresponding to the item names "partial payment" (一部負担金) and "total" (合計), and a rule list 1101 is generated in which priorities are assigned in descending order of match rate (step S106). In this example, the rule list 1101 assigns priorities in the order patient copayment, partial payment, total.
Next, for sample form image 1, the item names are checked in descending order of the priorities defined in the rule list 1101 (patient copayment, partial payment, total) to determine whether each appears in the image, and the item value corresponding to the first item name found is extracted (step S108). If the extracted item value matches the correct value associated with sample form image 1, feature A of sample form image 1 is stored in the rule dictionary 1102 in association with the rule list ID of the rule list 1101 (steps S109 to S112). Likewise, whenever the item value extracted from one of sample form images 2 to 9 matches the correct value associated with that image, the corresponding feature among B to I is stored in the rule dictionary 1102 in association with the rule list ID of the rule list 1101.
In this example, the item values extracted from sample form images 1 to 6 match their associated correct values, so features A to F of sample form images 1 to 6 are stored in the rule dictionary 1102 in association with the rule list ID of the rule list 1101. The item values extracted from sample form images 7 to 9, however, do not match their associated correct values, so features G to I of sample form images 7 to 9 are not yet stored in the rule dictionary 1102.
Next, as shown in FIG. 12, a rule list 1201 is generated from sample form images 7 to 9, which did not conform to the rule list 1101. In this example, the rule list 1201 assigns priorities in the order partial payment, patient copayment, total.
Next, the rule list 1201 is used to determine, for each of sample form images 7 to 9, whether the item value extracted from it matches its associated correct value, and the rule dictionary is updated to the rule dictionary 1202. In this example, the item values extracted from sample form images 7 and 8 match their associated correct values, so features G and H of sample form images 7 and 8 are stored in the rule dictionary 1202 in association with the rule list ID of the rule list 1201. The item value extracted from sample form image 9, however, does not match its associated correct value, so feature I of sample form image 9 is not yet stored in the rule dictionary 1202.
Next, as shown in FIG. 13, a rule list 1301 is generated from sample form image 9, which did not conform to the rule list 1201. In this example, the rule list 1301 assigns priorities in the order total, patient copayment, partial payment.
Next, the rule list 1301 is used to determine whether the item value extracted from sample form image 9 matches its associated correct value, and the rule dictionary is updated to the rule dictionary 1302. In this example, the item value extracted from sample form image 9 matches its associated correct value, so feature I of sample form image 9 is stored in the rule dictionary 1302 in association with the rule list ID of the rule list 1301. This completes the processing of all sample form images 1 to 9, and the rule dictionary 1302 is complete.
FIG. 14 is a flowchart showing an example of the operation of the process in which the information processing apparatus 100 detects item values in a form image.
An example of the operation of the detection process performed by the information processing apparatus 100 is described below with reference to the flowchart shown in FIG. 14. The operation flow described below is executed mainly by the first CPU 120, in cooperation with the components of the information processing apparatus 100, based on a program stored in advance in the first storage device 110.
First, the receiving unit 121 sends the generation device 200, via the first communication device 101, an acquisition request for the rule dictionary and the rule lists, and receives the rule dictionary and the rule lists from the generation device 200 via the first communication device 101 (step S201). When the transmitting unit 223 of the generation device 200 receives the acquisition request from the information processing apparatus 100 via the second communication device 201, it reads the previously generated rule dictionary and rule lists from the second storage device 210 and sends them to the information processing apparatus 100 via the second communication device 201. The receiving unit 121 stores the received rule dictionary and rule lists in the first storage device 110. Alternatively, the information processing apparatus 100 may acquire the rule dictionary and rule lists in advance and store them in the first storage device 110.
Next, the acquisition unit 122 acquires, via the first communication device 101, the form image from which item values are to be detected from an image reading apparatus (not shown) (step S202). The acquisition unit 122 may instead acquire the form image from the image reading apparatus via an interface circuit (not shown) such as USB (Universal Serial Bus). Alternatively, the information processing apparatus 100 may include an imaging device, and the acquisition unit 122 may acquire the form image by capturing an image of the form with that imaging device.
Next, the feature calculation unit 123 calculates a feature of the form image acquired by the acquisition unit 122, in the same manner as in step S111 of FIG. 7 (step S203). The feature calculation unit 123 calculates the same kind of feature as the one calculated by the rule dictionary generation unit 222 of the generation device 200. If the rule dictionary generation unit 222 stores feature IDs in the rule dictionary in association with rule list IDs, the feature calculation unit 123 assigns a feature ID to the calculated feature using the same method as the rule dictionary generation unit 222. In that case, the information processing apparatus 100 performs the following processing using the feature ID instead of the feature.
Next, the rule list extraction unit 124 determines whether the feature calculated by the feature calculation unit 123 is stored in the rule dictionary held in the first storage device 110 (step S204). The rule list extraction unit 124 may treat the calculated feature as stored not only when the rule dictionary contains an identical feature but also when it contains a similar one. In that case, the rule list extraction unit 124 determines that two features are similar when the similarity between the feature calculated by the feature calculation unit 123 and a feature stored in the rule dictionary is equal to or greater than a threshold. The similarity is, for example, the reciprocal of the Euclidean distance between the features (feature vectors), or their cosine similarity.
If the feature calculated by the feature calculation unit 123 is stored in the rule dictionary, the rule list extraction unit 124 extracts the rule list corresponding to the rule list ID associated with that feature in the rule dictionary (step S205).
If, on the other hand, the feature calculated by the feature calculation unit 123 is not stored in the rule dictionary, that is, if no rule list is associated with that feature, the rule list extraction unit 124 extracts a substitute rule list from the rule dictionary (step S206). For example, the rule list extraction unit 124 extracts, as the substitute rule list, the rule list associated with the largest number of features in the rule dictionary. This makes it possible to detect the item values of the items of interest accurately even from form images whose features are not registered in the rule dictionary. The rule list extraction unit 124 may also select the substitute rule list according to other criteria.
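Steps S204 to S206 can be sketched as a single lookup function. The sketch assumes the rule dictionary is a dict from feature tuples to rule list IDs, uses cosine similarity (one of the two measures named in step S204) with an assumed threshold of 0.95, and, when several stored features exceed the threshold, picks the most similar one. That tie-breaking rule is a choice of this sketch, not of the source.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    na, nb = math.hypot(*a), math.hypot(*b)
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def find_rule_list_id(dictionary, feature, threshold=0.95):
    """Steps S204-S206: dictionary maps feature tuples to rule list IDs."""
    feature = tuple(feature)
    if feature in dictionary:  # identical feature stored (S204 -> S205)
        return dictionary[feature]
    best_id, best_sim = None, threshold
    for stored, list_id in dictionary.items():  # similar feature stored
        sim = cosine(stored, feature)
        if sim >= best_sim:
            best_id, best_sim = list_id, sim
    if best_id is not None:
        return best_id
    if not dictionary:
        return None
    # S206: substitute rule list = the one associated with the most features.
    return Counter(dictionary.values()).most_common(1)[0][0]
```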
Next, the detection unit 125 detects item values in the form image based on the rule list extracted by the rule list extraction unit 124 (step S207). Using known OCR technology, the detection unit 125 checks, in descending order of the priorities defined in the rule list, whether each item name defined in the rule list appears in the form image. For the first item name found, it detects an item value in a region of predetermined size located at the display position relationship defined by the position information associated with that item name, and takes it as the item value corresponding to that item name. In this way, the detection unit 125 detects the item value corresponding to each item name based on the display position relationships defined in the rule list.
Next, the output control unit 126 outputs the item values detected by the detection unit 125 by displaying a recognition result screen containing them on the display device 103 (step S208).
FIG. 15 is a diagram showing an example of the recognition result screen.
As shown in FIG. 15, the recognition result screen 1500 displays, for each item of interest, an item 1501, a recognition result 1502, a correction button 1503, a confirmation button 1504, and so on. The item value of each item detected in the form image is displayed as the recognition result 1502 next to the corresponding item 1501. When the user presses the correction button 1503 using the input device 102, the recognition result 1502 becomes editable so that the user can correct it. When the user then presses the confirmation button 1504 using the input device 102, the item value of each item detected in the form image is replaced with the corrected value entered in the recognition result 1502. Instead of a correction button 1503 and a confirmation button 1504 for each item, a single correction button and a single confirmation button shared by all items may be displayed. In that case, pressing the correction button makes all recognition results 1502 editable, and pressing the confirmation button applies all corrected values entered in the recognition results 1502 to the item values.
Instead of displaying the item values on the display device 103, the output control unit 126 may output them by sending them, via the first communication device 101, to a server or the like that aggregates information about forms.
 次に、受付部127は、ユーザにより入力装置102を用いて確定ボタン1504が押下されることにより、ユーザによる、認識結果画面1500に表示された項目値の修正指示を受け付けたか否かを判定する(ステップS209)。ユーザによる修正指示が受け付けられなかった場合、情報処理装置100は、一連のステップを終了する。 Next, the reception unit 127 determines whether or not the user has received an instruction to correct the item value displayed on the recognition result screen 1500 by pressing the enter button 1504 using the input device 102 by the user. (Step S209). If the user does not receive the correction instruction, the information processing apparatus 100 ends the series of steps.
 一方、ユーザによる修正指示が受け付けられた場合、更新部128は、更新処理を実行し(ステップS210)、一連のステップを終了する。更新部128は、更新処理で、ルール辞書において、特徴量算出部123により算出された特徴量に関連付けられたルールリストを更新する。更新処理の詳細については後述する。 On the other hand, when the correction instruction from the user is accepted, the updating unit 128 executes the updating process (step S210), and ends the series of steps. The updating unit 128 updates the rule list associated with the feature amount calculated by the feature amount calculating unit 123 in the rule dictionary in the updating process. Details of the update process will be described later.
 FIG. 16 is a flowchart showing an example of the operation of the update process, which is executed in step S210 of FIG. 14.
 First, the update unit 128 selects one rule list from the rule lists stored in the first storage device 110 (step S301).
 Next, the update unit 128 extracts an item value from the form image based on the selected rule list (step S302). Using known OCR technology, the update unit 128 determines, one item name at a time in the priority order defined in the rule list (highest priority first), whether each item name defined in the rule list appears in the form image. For the first item name found, the update unit 128 detects an item value in a region of a predetermined size located at the display positional relationship defined by the position information associated with that item name, and extracts it as the item value corresponding to that item name.
 Next, the update unit 128 determines whether the extracted item value matches the correction value specified by the user with the input device 102 (step S303).
 If the item value matches the correction value, the update unit 128 determines that the form image fits the rule list selected in step S301. In that case, the update unit 128 stores, in the rule dictionary, the feature amount calculated by the feature amount calculation unit 123 in step S203 in association with the rule list ID of the rule list selected in step S301 (step S304), and ends the series of steps. In this way, the update unit 128 updates the rule dictionary stored in the first storage device 110. The update unit 128 may also transmit the updated rule dictionary to the generation device 200 via the first communication device 101 so that the rule dictionary stored in the second storage device 210 is updated as well. This keeps the rule dictionary used by other information processing apparatuses appropriately updated too.
 If the feature amount calculated by the feature amount calculation unit 123 is already stored in the rule dictionary, the update unit 128 updates (overwrites) the rule list ID associated with that feature amount. This lets the update unit 128 correct rule lists that were registered inappropriately in the rule dictionary, efficiently improving the quality of the rule dictionary. If the feature amount is not yet stored in the rule dictionary, the update unit 128 adds it to the rule dictionary as a new entry associated with that rule list ID. This lets the update unit 128 add information that has not yet been registered, likewise improving the quality of the rule dictionary efficiently.
 If the item value and the correction value do not match, the update unit 128 determines whether all rule lists stored in the first storage device 110 have been processed (step S305). If any rule list remains unprocessed, the update unit 128 returns to step S301 and executes steps S301 to S305 for an unselected rule list. When all rule lists have been processed, the update unit 128 ends the series of steps.
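The update process of steps S301 to S305 can be sketched as follows. This is a minimal illustration, not the implementation described here: it assumes the rule dictionary is a plain mapping from a hashable feature amount to a rule list ID, that a rule list is an ordered list of item names, and it receives the step S302 extraction as a caller-supplied function. The names `update_rule_dictionary` and `first_match` and the toy OCR result are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Optional

# Hypothetical representation: a rule list is an ordered list of item names
# (highest priority first); a feature amount is any hashable value.
RuleList = List[str]


def update_rule_dictionary(
    rule_dictionary: Dict[Hashable, str],
    rule_lists: Dict[str, RuleList],
    feature: Hashable,
    correction_value: str,
    extract: Callable[[RuleList], Optional[str]],
) -> Optional[str]:
    """Sketch of steps S301-S305; returns the fitting rule list ID or None."""
    for rule_list_id, rule_list in rule_lists.items():  # S301: select a rule list
        item_value = extract(rule_list)                 # S302: extract with it
        if item_value == correction_value:              # S303: compare
            # S304: overwrites an existing entry or adds a new one
            rule_dictionary[feature] = rule_list_id
            return rule_list_id
    return None  # S305: all rule lists tried, none reproduces the correction


# Usage with a toy OCR result: item name -> value read beside it (hypothetical).
fields = {"Total": "1,000", "Billed amount": "900"}


def first_match(rule_list: RuleList) -> Optional[str]:
    # Value beside the first item name, in priority order, found on the form.
    return next((fields[name] for name in rule_list if name in fields), None)


rules = {"R1": ["Total", "Billed amount"], "R2": ["Billed amount", "Total"]}
dictionary = {(3, 7): "R1"}  # feature (3, 7) was wrongly mapped to R1
print(update_rule_dictionary(dictionary, rules, (3, 7), "900", first_match))  # R2
```

Because a dictionary assignment either overwrites an existing key or inserts a new one, the single assignment in step S304 covers both the overwrite case and the new-entry case described above.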
 As described in detail above, the information processing apparatus 100 detects the item value of a desired item from a form image according to the priority order defined in the rule list associated with the feature amount of that form image. In forms with diverse layouts, the position of an item name may differ from form to form, and a single form may contain several synonyms of an item name. Form images with similar feature amounts, however, are likely to use the same item name for the same item. The information processing apparatus 100 can therefore examine the item names in a form image, in which an item name and an item value are shown for each of a plurality of items, in descending order of their likelihood of corresponding to the desired item, and can detect the item value of the desired item more accurately.
 In addition, the information processing apparatus 100 detects the item value of a desired item from the form image using a rule list that defines the display positional relationship between each item name and its corresponding item value. This enables the information processing apparatus 100 to detect the item value of the desired item with high accuracy from form images of diverse layouts, in which the position of the item value relative to its item name differs from form to form.
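Priority-ordered detection with a display positional relationship can be illustrated with the sketch below. It assumes the form image has already been run through OCR into text boxes, encodes the positional relationship as a hypothetical pixel offset from the item-name box, and uses an assumed region size. Item names are matched exactly here, whereas a practical system would likely need tolerant matching of OCR output. `TextBox`, `Rule`, `REGION_W`/`REGION_H` and `detect_item_value` are illustrative names, not names from the source.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class TextBox:
    """One OCR result: recognized text and its bounding box (pixels)."""
    text: str
    x: int
    y: int
    w: int
    h: int


@dataclass(frozen=True)
class Rule:
    """An item name plus a hypothetical encoding of the display positional
    relationship: offset (dx, dy) from the item-name box to the value region."""
    item_name: str
    dx: int
    dy: int


# "Region of a predetermined size"; these numbers are assumptions.
REGION_W, REGION_H = 200, 40


def detect_item_value(rule_list: Sequence[Rule], boxes: List[TextBox]) -> Optional[str]:
    for rule in rule_list:  # highest priority first
        name_box = next((b for b in boxes if b.text == rule.item_name), None)
        if name_box is None:
            continue  # item name not on this form; try the next one
        # The first item name found decides the result, even if its region is empty.
        rx, ry = name_box.x + rule.dx, name_box.y + rule.dy
        hits = [
            b for b in boxes
            if b is not name_box
            and rx <= b.x < rx + REGION_W
            and ry <= b.y < ry + REGION_H
        ]
        hits.sort(key=lambda b: (b.y, b.x))  # reading order
        return " ".join(b.text for b in hits) or None
    return None


boxes = [
    TextBox("Total", 10, 10, 60, 20),
    TextBox("1,000", 110, 10, 60, 20),
    TextBox("Amount", 10, 50, 60, 20),
    TextBox("2,000", 10, 90, 60, 20),
]
# "Amount" has higher priority; its value is expected 40 px below it.
print(detect_item_value([Rule("Amount", 0, 40), Rule("Total", 100, 0)], boxes))  # 2,000
```

Swapping the two rules makes "Total" the first name found, so the value to its right ("1,000") is returned instead; the same form yields different results depending only on the rule list's priority order.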
 In addition, the generation device 200 examines the item names in the priority order defined in a rule list and, when the item value corresponding to the first item name found in a sample form image matches the correct value of that image, associates the rule list with the sample form image. This enables the generation device 200 to generate rule lists with which the item value of a desired item can be correctly detected by examining the item names in priority order.
 In addition, the generation device 200 generates the rule dictionary by learning in advance from a plurality of sample form images. This enables the generation device 200 to generate a rule dictionary that can detect the item value of a desired item with high accuracy even for form images whose layout is unknown, while keeping the number of rule lists in the rule dictionary from growing excessively large.
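The fit criterion and the advance learning of the rule dictionary described in the two preceding paragraphs can be sketched as follows. The sketch assumes each sample form image has already been reduced to a hashable feature amount, a mapping from item name to the value located beside it, and its correct value. When several rule lists fit the same sample, the source does not say which is chosen; here the first in iteration order wins, which is an assumption. All function names are hypothetical.

```python
from typing import Dict, Hashable, List, Optional, Tuple

RuleList = List[str]  # item names, highest priority first
Sample = Tuple[Hashable, Dict[str, str], str]  # (feature, name -> value, correct value)


def first_match_value(rule_list: RuleList, fields: Dict[str, str]) -> Optional[str]:
    """Value beside the first item name (in priority order) present on the form."""
    for name in rule_list:
        if name in fields:
            return fields[name]
    return None


def fits(rule_list: RuleList, fields: Dict[str, str], correct: str) -> bool:
    value = first_match_value(rule_list, fields)
    return value is not None and value == correct


def build_rule_dictionary(rule_lists: Dict[str, RuleList], samples: List[Sample]):
    dictionary: Dict[Hashable, str] = {}
    unmatched: List[Sample] = []
    for feature, fields, correct in samples:
        for rule_list_id, rule_list in rule_lists.items():
            if fits(rule_list, fields, correct):
                dictionary[feature] = rule_list_id  # first fitting rule list wins
                break
        else:  # no rule list fits: candidate input for a new rule list
            unmatched.append((feature, fields, correct))
    return dictionary, unmatched


rule_lists = {"R1": ["Billed amount", "Total"], "R2": ["Total", "Billed amount"]}
samples = [
    ((1,), {"Total": "1000", "Billed amount": "900"}, "900"),
    ((2,), {"Total": "1000", "Billed amount": "900"}, "1000"),
    ((3,), {"Subtotal": "50"}, "50"),
]
dictionary, unmatched = build_rule_dictionary(rule_lists, samples)
print(dictionary)      # {(1,): 'R1', (2,): 'R2'}
print(len(unmatched))  # 1
```

The two first samples carry identical fields but different correct values, so they are mapped to different rule lists; the third fits neither and is returned separately, which is the situation claim 7 addresses by generating a new rule list.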
 FIG. 17 is a block diagram showing the schematic configuration of a first processing circuit 320 in an information processing apparatus according to another embodiment.
 The first processing circuit 320 is used in place of the first CPU 120 and executes the detection process in its place. The first processing circuit 320 includes a receiving circuit 321, an acquisition circuit 322, a feature amount calculation circuit 323, a rule list extraction circuit 324, a detection circuit 325, an output control circuit 326, an acceptance circuit 327, an update circuit 328, and so on.
 The receiving circuit 321 is an example of the receiving unit and has the same function as the receiving unit 121. It receives the rule dictionary and rule lists from the generation device via the first communication device 101 and stores them in the first storage device 110.
 The acquisition circuit 322 is an example of the acquisition unit and has the same function as the acquisition unit 122. It receives a form image from an image reading device (not shown) via the first communication device 101 and outputs it to the feature amount calculation circuit 323 and the detection circuit 325.
 The feature amount calculation circuit 323 is an example of the feature amount calculation unit and has the same function as the feature amount calculation unit 123. It receives the form image from the acquisition circuit 322, calculates a feature amount from it, and outputs the feature amount to the rule list extraction circuit 324.
 The rule list extraction circuit 324 is an example of the rule list extraction unit and has the same function as the rule list extraction unit 124. It reads the rule dictionary from the first storage device 110, receives the feature amount from the feature amount calculation circuit 323, extracts a rule list based on the rule dictionary and the feature amount, and outputs it to the detection circuit 325.
 The detection circuit 325 is an example of the detection unit and has the same function as the detection unit 125. It receives the form image from the acquisition circuit 322 and the rule list from the rule list extraction circuit 324, detects the item value of the item of interest from the form image using the rule list, and outputs it to the output control circuit 326.
 The output control circuit 326 is an example of the output control unit and has the same function as the output control unit 126. It receives the item value from the detection circuit 325 and outputs it to the display device 103.
 The acceptance circuit 327 is an example of the acceptance unit and has the same function as the acceptance unit 127. It receives the correction value from the input device 102 and outputs it to the update circuit 328.
 The update circuit 328 is an example of the update unit and has the same function as the update unit 128. It reads the rule lists from the first storage device 110, receives the correction value from the acceptance circuit 327, and updates the rule dictionary based on the rule lists and the correction value.
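Claim 4 below specifies a fallback for the rule list extraction performed by the rule list extraction unit 124 (or circuit 324): when no rule list is associated with the calculated feature amount, the rule list associated with the largest number of feature amounts is extracted. The sketch below illustrates that fallback. How a calculated feature amount is judged to be associated with a registered one is not specified in this section, so a Euclidean distance threshold (`max_distance`) is assumed; the vector form of the feature amounts and the function name are likewise hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Optional, Sequence, Tuple

Feature = Tuple[float, ...]  # hypothetical numeric feature vector


def extract_rule_list_id(
    rule_dictionary: Dict[Feature, str],
    feature: Sequence[float],
    max_distance: float = 0.0,
) -> Optional[str]:
    # Assumed matching criterion: nearest registered feature within max_distance.
    best_id: Optional[str] = None
    best_distance = math.inf
    for registered, rule_list_id in rule_dictionary.items():
        d = math.dist(registered, feature)
        if d <= max_distance and d < best_distance:
            best_id, best_distance = rule_list_id, d
    if best_id is not None:
        return best_id
    # Fallback from claim 4: the rule list associated with the most features.
    if not rule_dictionary:
        return None
    return Counter(rule_dictionary.values()).most_common(1)[0][0]


dictionary = {(0.0, 0.0): "R1", (1.0, 1.0): "R1", (5.0, 5.0): "R2"}
print(extract_rule_list_id(dictionary, (5.1, 5.0), max_distance=0.5))  # R2
print(extract_rule_list_id(dictionary, (9.0, 9.0)))  # R1 (fallback)
```

The fallback favors the rule list that has proven valid for the most registered forms, which seems a reasonable default for a form whose layout has not been seen before.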
 FIG. 18 is a block diagram showing the schematic configuration of a second processing circuit 420 in a generation device according to another embodiment.
 The second processing circuit 420 is used in place of the second CPU 220 and executes the generation process in its place. The second processing circuit 420 includes a rule list generation circuit 421, a rule dictionary generation circuit 422, a transmission circuit 423, and so on.
 The rule list generation circuit 421 is an example of the rule list generation unit and has the same function as the rule list generation unit 221. It reads sample form images from the second storage device 210, generates rule lists based on them, and stores the rule lists in the second storage device 210.
 The rule dictionary generation circuit 422 is an example of the rule dictionary generation unit and has the same function as the rule dictionary generation unit 222. It reads the sample form images and rule lists from the second storage device 210, generates a rule dictionary based on them, and stores it in the second storage device 210.
 The transmission circuit 423 is an example of the transmission unit and has the same function as the transmission unit 223. It reads the rule dictionary from the second storage device 210 and transmits it to the information processing apparatus via the second communication device 201.
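Claims 7 and 8 below describe how the rule list generation unit 221 (or circuit 421) can derive a new rule list from sample form images that fit none of the existing rule lists: item names that correspond to the correct value in a higher proportion of those images receive a higher priority. A minimal sketch follows. It assumes each unmatched sample is reduced to a mapping from item name to value plus its correct value, uses the number of unmatched samples as the denominator of the proportion, omits names that never match, and breaks ties alphabetically; all of these are assumptions, and `generate_rule_list` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, List, Tuple


def generate_rule_list(unmatched: List[Tuple[Dict[str, str], str]]) -> List[str]:
    """unmatched: (item name -> value, correct value) for each sample form image
    that fits no existing rule list. Returns item names, highest priority first."""
    hits: Counter = Counter()
    for fields, correct in unmatched:
        for name, value in fields.items():
            if value == correct:
                hits[name] += 1
    total = len(unmatched)
    # Proportion of unmatched samples in which the name carries the correct value;
    # names that never do are left out, ties broken alphabetically (assumptions).
    return sorted(hits, key=lambda name: (-hits[name] / total, name))


unmatched = [
    ({"Subtotal": "50", "Tax": "5"}, "50"),
    ({"Subtotal": "70", "Grand total": "80"}, "80"),
    ({"Grand total": "90", "Subtotal": "90"}, "90"),
    ({"Subtotal": "10"}, "10"),
]
print(generate_rule_list(unmatched))  # ['Subtotal', 'Grand total']
```

"Subtotal" carries the correct value in three of four samples and "Grand total" in two, so "Subtotal" is checked first; "Tax" never does and is not included.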
 As described in detail above, even when the information processing apparatus uses the first processing circuit 320 and the generation device uses the second processing circuit 420, the information processing apparatus can detect the item value of a desired item more accurately.
 Preferred embodiments of the present invention have been described above, but the present invention is not limited to these embodiments. For example, the generation process may be executed by the information processing apparatus 100 instead of the generation device 200. In that case, the first storage device 110 stores each piece of information stored in the second storage device 210, and the first CPU 120 includes, in addition to the units shown in FIG. 2, units having the same functions as the units of the second CPU 220.
 The number of items of interest whose item values are detected from each form image is not limited to one. The generation device 200 may generate a rule dictionary and rule lists for each of two or more items, and the information processing apparatus 100 may detect the item values of two or more items from each form image.
100  Information processing apparatus
102  Input device
103  Display device
110  First storage device
121  Receiving unit
122  Acquisition unit
123  Feature amount calculation unit
124  Rule list extraction unit
125  Detection unit
127  Acceptance unit
128  Update unit
200  Generation device
210  Second storage device
221  Rule list generation unit
222  Rule dictionary generation unit
223  Transmission unit

Claims (11)

  1.  An information processing apparatus that processes a form image in which an item name and an item value are shown for each of a plurality of items, the apparatus comprising:
     a storage unit that stores a rule dictionary in which each of the feature amounts of a plurality of sample form images is associated with one of a plurality of rule lists, each rule list defining a plurality of item names and a priority order of the item names;
     an acquisition unit that acquires a form image;
     a feature amount calculation unit that calculates a feature amount from the acquired form image;
     a rule list extraction unit that extracts, from the rule dictionary, the rule list associated with the calculated feature amount;
     a detection unit that sequentially determines, in the priority order defined in the extracted rule list, whether each item name defined in the rule list is included in the form image, and detects the item value corresponding to the item name first determined to be included; and
     an output unit that outputs the detected item value.
  2.  The information processing apparatus according to claim 1, wherein
     the rule list defines the plurality of item names, the priority order of the item names, and the display positional relationship, in the sample form images, between each item name and the item value corresponding to that item name, and
     the detection unit detects, based on the display positional relationship, the item value corresponding to the item name first determined to be included.
  3.  The information processing apparatus according to claim 1 or 2, wherein the feature amount of each form image is a feature amount related to the shape of ruled lines, a feature amount related to a logo, or a feature amount related to a character recognition result.
  4.  The information processing apparatus according to any one of claims 1 to 3, wherein, when the rule dictionary contains no rule list associated with the calculated feature amount, the rule list extraction unit extracts the rule list associated with the largest number of feature amounts.
  5.  The information processing apparatus according to any one of claims 1 to 4, further comprising:
     an acceptance unit that accepts an instruction from a user to correct the output item value; and
     an update unit that, when the correction instruction is accepted, updates the rule list associated with the calculated feature amount in the rule dictionary.
  6.  An information processing system comprising a generation device and an information processing apparatus that processes a form image in which an item name and an item value are shown for each of a plurality of items, wherein
     the generation device comprises:
      a storage unit that stores a plurality of sample form images and a plurality of rule lists, each rule list defining a plurality of item names and a priority order of the item names;
      a rule dictionary generation unit that generates a rule dictionary in which each rule list is associated with the feature amounts of the sample form images that fit that rule list; and
      a transmission unit that transmits the rule dictionary to the information processing apparatus, and
     the information processing apparatus comprises:
      a receiving unit that receives the rule dictionary from the generation device;
      an acquisition unit that acquires a form image;
      a feature amount calculation unit that calculates a feature amount from the acquired form image;
      a rule list extraction unit that extracts, from the received rule dictionary, the rule list associated with the calculated feature amount;
      a detection unit that sequentially determines, in the priority order defined in the extracted rule list, whether each item name defined in the rule list is included in the form image, and detects the item value corresponding to the item name first determined to be included; and
      an output unit that outputs the detected item value.
  7.  The information processing system according to claim 6, further comprising a rule list generation unit that, when there is a sample form image that fits none of the plurality of rule lists, generates a new rule list based on the sample form images that fit none of the plurality of rule lists.
  8.  The information processing system according to claim 7, wherein
     the storage unit stores each of the plurality of sample form images in association with a correct value, and
     the rule list generation unit generates the new rule list such that, among the sample form images that fit none of the plurality of rule lists, an item name that corresponds to the correct value in a higher proportion of those images is given a higher priority.
  9.  The information processing system according to claim 6, wherein
     the storage unit stores each of the plurality of sample form images in association with a correct value, and
     the rule dictionary generation unit sequentially determines, in the priority order defined in the rule list, whether each item name defined in the rule list is included in the sample form image, and determines that the rule list and the sample form image fit when the item value corresponding to the item name first determined to be included matches the correct value associated with the sample form image.
  10.  A control method for an information processing apparatus that has a storage unit and an output unit and processes a form image in which an item name and an item value are shown for each of a plurality of items, the method comprising:
     storing, in the storage unit, a rule dictionary in which each of the feature amounts of a plurality of sample form images is associated with one of a plurality of rule lists, each rule list defining a plurality of item names and a priority order of the item names;
     acquiring a form image;
     calculating a feature amount from the acquired form image;
     extracting, from the rule dictionary, the rule list associated with the calculated feature amount;
     sequentially determining, in the priority order defined in the extracted rule list, whether each item name defined in the rule list is included in the form image, and detecting the item value corresponding to the item name first determined to be included; and
     outputting the detected item value to the output unit.
  11.  A control program for an information processing apparatus that has a storage unit and an output unit and processes a form image in which an item name and an item value are shown for each of a plurality of items, the control program causing the information processing apparatus to execute:
     storing, in the storage unit, a rule dictionary in which each of the feature amounts of a plurality of sample form images is associated with one of a plurality of rule lists, each rule list defining a plurality of item names and a priority order of the item names;
     acquiring a form image;
     calculating a feature amount from the acquired form image;
     extracting, from the rule dictionary, the rule list associated with the calculated feature amount;
     sequentially determining, in the priority order defined in the extracted rule list, whether each item name defined in the rule list is included in the form image, and detecting the item value corresponding to the item name first determined to be included; and
     outputting the detected item value to the output unit.
PCT/JP2017/027758 2017-07-31 2017-07-31 Information processing device, information processing system, control method, and control program WO2019026147A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2017/027758 WO2019026147A1 (en) 2017-07-31 2017-07-31 Information processing device, information processing system, control method, and control program

Publications (1)

Publication Number Publication Date
WO2019026147A1 2019-02-07

Family

ID=65232380

Country Status (1)

Country Link
WO (1) WO2019026147A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH11296676A (en) * 1998-04-08 1999-10-29 Oki Electric Ind Co Ltd Image data classification method and image data registration method
JP2006134106A (en) * 2004-11-05 2006-05-25 Hammock:Kk Business form recognition system, business form recognition method and computer program
JP2016051339A (en) * 2014-08-29 2016-04-11 日立オムロンターミナルソリューションズ株式会社 Document recognition device and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JUN'ICHI HIRAYAMA ET AL.: "Kasetsu Kenshogata Approach o Mochiita Teigi Less Hiteikei Chohyo Ninshiki Gijutsu", THE TRANSACTIONS OF THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS D, vol. J97-D, no. 12, 1 December 2014 (2014-12-01), pages 1797 - 1808 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020175163A1 (en) * 2019-02-27 2020-09-03 日本電信電話株式会社 Information processing device, associating method, and associating program
JP2020140410A (en) * 2019-02-27 2020-09-03 日本電信電話株式会社 Information processing device, association method and association program
CN113508393A (en) * 2019-02-27 2021-10-15 日本电信电话株式会社 Information processing apparatus, correlation method, and correlation program
EP3910546A4 (en) * 2019-02-27 2022-10-05 Nippon Telegraph And Telephone Corporation Information processing device, associating method, and associating program
JP7211157B2 (en) 2019-02-27 2023-01-24 日本電信電話株式会社 Information processing device, association method and association program

Legal Events

Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 17920240; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 17920240; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: JP)