WO2022097189A1 - Data processing device, data processing method, and program - Google Patents

Data processing device, data processing method, and program

Info

Publication number
WO2022097189A1
Authority
WO
WIPO (PCT)
Prior art keywords
character string
character
master data
recognition
similar
Prior art date
Application number
PCT/JP2020/041162
Other languages
French (fr)
Japanese (ja)
Inventor
鴻鵬 葛
顕 松田
貴亮 佐藤
智 小俣
啓太郎 森
Original Assignee
ファーストアカウンティング株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ファーストアカウンティング株式会社
Priority to PCT/JP2020/041162 (WO2022097189A1)
Priority to JP2020561940A (JP6870159B1)
Priority to JP2021068170A (JP2022075467A)
Publication of WO2022097189A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/12 Detection or correction of errors, e.g. by rescanning the pattern
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/26 Techniques for post-processing, e.g. correcting the recognition result

Definitions

  • the present invention relates to a data processing apparatus, a data processing method and a program.
  • In known techniques, errors in character recognition are corrected by comparing the extracted characters with the contents of a predetermined database.
  • However, in some cases the character string written on the voucher is not registered in the database, so it does not match the contents of the database even though the character recognition result is correct.
  • If such a character string is corrected so that it matches another character string contained in the database, a character string different from the one written on the voucher is output even though the characters were recognized correctly.
  • the present invention has been made in view of these points, and an object thereof is to improve the probability that the character string included in the image data of the voucher is correctly output.
  • the data processing device has a data acquisition unit for acquiring voucher image data and a character recognition unit for outputting a plurality of recognition character strings by recognizing character strings included in the voucher image data.
  • It also has a correction unit that, when a first character string among the plurality of recognition character strings is not included in master data in which a plurality of registered character strings are associated, and a second character string different from the first character string among the plurality of recognition character strings is included in the master data, corrects the first character string to a similar character string that is most similar to the first character string among one or more registered character strings associated with the second character string in the master data, and an output unit for outputting the corrected first character string and the second character string in association with each other.
  • the correction unit may correct the first character string to the similar character string on condition that two or more second character strings among the plurality of recognition character strings are included in the master data.
  • the correction unit may cause the output unit to output a plurality of candidates for the similar character string before correcting the first character string to the similar character string, and may correct the first character string to the similar character string corresponding to the candidate selected from the plurality of candidates.
  • the master data may include a company name and account information as the plurality of registered character strings.
  • When the recognized company name, which is the first character string recognized by the character recognition unit, is not included in the master data and the recognized account information, which is the second character string recognized by the character recognition unit, is included in the master data, the correction unit may correct the recognized company name to the company name associated with the recognized account information in the master data.
  • the master data may include a company name and an item name as the plurality of registered character strings.
  • When the recognized company name, which is the first character string recognized by the character recognition unit, is not included in the master data and the recognized item name, which is the second character string recognized by the character recognition unit, is included in the master data, the correction unit may correct the recognized company name to the company name most similar to it among the plurality of company names associated with the recognized item name in the master data.
  • the master data may include an item name and a product unit price as the plurality of registered character strings.
  • When the recognized item name, which is the first character string recognized by the character recognition unit, is not included in the master data and the recognized product unit price, which is the second character string recognized by the character recognition unit, is included in the master data, the correction unit may correct the recognized item name to the item name most similar to it among the plurality of item names associated with the recognized product unit price in the master data.
  • the correction unit may execute a search on the Internet using at least one of the first character string or the similar character string as a keyword and, if it determines that the similar character string is more likely to be correct than the first character string, may correct the first character string to the similar character string.
  • the correction unit may execute a search on the Internet using at least one of the first character string or the similar character string as a keyword and, if it determines that the first character string is more likely to be correct than the similar character string, may correct the similar character string in the master data to the first character string without correcting the first character string.
  • the output unit may output, in a distinguishable manner, the character strings among the plurality of recognition character strings that the correction unit determined to need correction and those that it determined not to need correction.
  • When the master data does not include a character string whose similarity to the first character string is equal to or higher than a threshold value, the output unit may output the first character string as a character string to be registered in the master data.
  • When the ratio of recognition character strings not included in the master data among the plurality of recognition character strings is equal to or more than a predetermined value, the output unit may output warning information indicating that there are many character strings not included in the master data.
  • the data processing method of the second aspect of the present invention is executed by a computer and has a step of acquiring voucher image data and a step of outputting a plurality of recognition character strings by recognizing character strings included in the voucher image data.
  • It also has a step of correcting the first character string, when a first character string among the plurality of recognition character strings is not included in master data in which a plurality of registered character strings are associated and a second character string different from the first character string among the plurality of recognition character strings is included in the master data, to a similar character string that is most similar to the first character string among one or more registered character strings associated with the second character string in the master data, and a step of outputting the corrected first character string and the second character string in association with each other.
  • the program of the third aspect of the present invention causes a computer to execute a step of acquiring voucher image data and a step of outputting a plurality of recognition character strings by recognizing character strings included in the voucher image data.
  • It also causes the computer to execute a step of correcting the first character string, when a first character string among the plurality of recognition character strings is not included in master data in which a plurality of registered character strings are associated and a second character string different from the first character string among the plurality of recognition character strings is included in the master data, to a similar character string that is most similar to the first character string among one or more registered character strings associated with the second character string in the master data, and a step of outputting the corrected first character string and the second character string in association with each other.
  • invoice image data which is a type of voucher image data.
  • invoice data This is an example of invoice data.
  • FIG. 1 is a diagram for explaining an outline of the data processing device 1.
  • the data processing device 1 is a device for specifying a character string included in the voucher image data by acquiring voucher image data and performing character recognition processing on the voucher image data, for example, a computer.
  • the data processing device 1 creates voucher data including the specified character string, and outputs the created voucher data to the external device 3.
  • Voucher image data is image data of voucher documents such as invoices, purchase orders, delivery notes, quotations or inspection slips. If the voucher image data is image data of an invoice, it is data obtained by converting into an image a voucher containing the company name, contact information, the name (item name) of the product or service to be billed, the billing amount, the tax amount, and so on.
  • the voucher image data is, for example, image data generated by the image reading device 2 (for example, a scanner or a digital camera) reading the voucher, but may be image data or text data created by a computer.
  • the external device 3 is, for example, a computer used by a user (for example, an accounting person) who uses the data processing device 1 in a company that has received a voucher, or a core system such as an enterprise resource planning (ERP) system.
  • the data processing device 1 displays, for example, the specified character string on the user's computer, or transmits the specified character string to the core system.
  • the core system is a system that stores various data used for accounting, for example.
  • FIG. 2 is an example of invoice image data, which is a kind of voucher image data.
  • FIG. 3 is an example of invoice data which is voucher data including a character string specified based on the invoice image data shown in FIG. In the invoice data shown in FIG. 3, the character string included in the invoice image data shown in FIG. 2 is correctly described.
  • However, it is not always possible for the data processing device 1 to correctly identify all the character strings included in the invoice image data by performing character recognition.
  • If the data processing device 1 erroneously recognizes a character string, invoice data including the erroneous character string is created, and the erroneous character string is registered in the external device 3.
  • Therefore, the data processing device 1 refers to master data in which a plurality of character strings among the company name, branch name, telephone number, account information, contact information, department in charge, person in charge name, item name, and product unit price, which are character strings written on the voucher, are associated with each other, and determines whether or not a recognized character string is correct based on the plurality of character strings associated in the master data.
  • When a recognized character string is not included in the master data but another character string recognized from the same voucher is included, the data processing device 1 corrects the misrecognized character string to the most similar character string among the plurality of character strings associated with that other character string in the master data. Since the data processing device 1 is configured in this way, the probability that it outputs a correct character string is increased.
  • The following description takes the case where the voucher is an invoice as an example, but the voucher may be a document other than an invoice, such as an order form, a delivery note, a quotation, or an acceptance slip.
  • FIG. 4 is a diagram showing the configuration of the data processing device 1.
  • the data processing device 1 includes a communication unit 11, a storage unit 12, and a control unit 13.
  • the control unit 13 includes a data acquisition unit 131, a character recognition unit 132, a correction unit 133, and an output unit 134.
  • the communication unit 11 includes a communication interface for transmitting and receiving various data to and from the image reading device 2 or the external device 3 via a network such as the Internet or an intranet.
  • the communication unit 11 inputs the invoice image data received from the image reading device 2 to the data acquisition unit 131. Further, the communication unit 11 outputs the invoice data output by the output unit 134 to the external device 3.
  • the storage unit 12 has a storage medium such as a ROM (Read Only Memory), a RAM (Random Access Memory), and an SSD (Solid State Drive).
  • the storage unit 12 stores a program executed by the control unit 13.
  • the storage unit 12 stores master data in which a plurality of character strings among the company name (registered company name), branch name, telephone number, account information (registered account information), contact information, department in charge, person in charge, item name, and product unit price, which are character strings written on the voucher, are associated as a plurality of registered character strings.
  • the storage unit 12 stores, for example, company master data 121 including data on the company and product master data 122 including data on the product. These master data are used when the user creates a voucher, but are also used by the control unit 13 for character recognition and correction of the specified character string.
  • the company master data 121 and the product master data 122 may be stored in a storage medium external to the storage unit 12.
  • FIG. 5 is a diagram showing an example of company master data 121.
  • In the company master data 121, a company name, a branch name, a company address, a telephone number, a department in charge, a person in charge, and a transfer account number are associated with each other.
  • FIG. 6 is a diagram showing an example of product master data 122.
  • In the product master data 122, the product name, the price, the manufacturer name, and the business partner name are associated with each other.
  • the product name includes the product name and the model name or model number, but may include only the product name or may include only the model name or model number.
  • the control unit 13 shown in FIG. 4 has, for example, a CPU (Central Processing Unit).
  • the control unit 13 functions as a data acquisition unit 131, a character recognition unit 132, a correction unit 133, and an output unit 134 by executing the program stored in the storage unit 12.
  • the data acquisition unit 131 acquires voucher image data via the communication unit 11.
  • the data acquisition unit 131 acquires, for example, the voucher image data output from the image reading device 2 that has read the voucher, and inputs the acquired voucher image data to the character recognition unit 132.
  • the character recognition unit 132 outputs a plurality of recognition character strings by recognizing the character strings included in the voucher image data.
  • the character recognition unit 132 recognizes the characters included in the voucher image data by executing OCR (Optical Character Recognition) processing performed by, for example, AI (Artificial Intelligence), and recognizes a plurality of consecutive characters as a character string.
  • the character recognition unit 132 inputs a plurality of recognized character strings to the correction unit 133.
  • When the first character string among the plurality of recognition character strings recognized by the character recognition unit 132 in one voucher image data is not included in the master data in which the plurality of registered character strings are associated, and a second character string different from the first character string among the plurality of recognition character strings is included in the master data, the correction unit 133 corrects the first character string to the similar character string that is most similar to the first character string among the one or more registered character strings associated with the second character string in the master data.
  • When neither the first character string nor the second character string is included in the master data, the correction unit 133 does not correct the first character string and notifies the output unit 134 that the first character string and the second character string are not included in the master data.
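The correction rule described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the record layout, the field names, the second record, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for the example.

```python
from difflib import SequenceMatcher

# Hypothetical master data: each record associates several registered strings.
MASTER = [
    {"company": "Tanaka Shoji", "branch": "Tokyo Branch",
     "account": "AAA Bank Tokyo Branch Ordinary 12233334"},
    {"company": "MM Electric", "branch": "Osaka Branch",
     "account": "BBB Bank Osaka Branch Ordinary 7654321"},
]


def similarity(a, b):
    """Ratio in [0, 1]; difflib stands in for the unspecified similarity measure."""
    return SequenceMatcher(None, a, b).ratio()


def in_master(s, master):
    """True if s is registered in any record of the master data."""
    return any(s in rec.values() for rec in master)


def correct(first, second, master):
    """Return (possibly corrected first string, status)."""
    if in_master(first, master):
        return first, "unchanged"
    # Registered strings associated with the second string in the master data.
    associated = [v for rec in master if second in rec.values()
                  for v in rec.values() if v != second]
    if not associated:
        # Neither string is registered: leave as is and report it.
        return first, "not_in_master"
    return max(associated, key=lambda c: similarity(first, c)), "corrected"


print(correct("Taguchi Shoji", "AAA Bank Tokyo Branch Ordinary 12233334", MASTER))
```

The status value is only a convenient way for a caller to tell a corrected string from one that should be reported as unregistered.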
  • the master data may be company master data 121 or product master data 122.
  • the correction unit 133 may use the company master data 121 and the product master data 122 in combination.
  • the correction unit 133 inputs the corrected first character string to the output unit 134.
  • the output unit 134 outputs the corrected first character string and the second character string in association with each other.
  • the output unit 134 outputs voucher data in a state in which a plurality of character strings corresponding to one voucher image data are associated, for example, like the invoice data shown in FIG.
  • the output unit 134 outputs the voucher data to the external device 3 via the communication unit 11, or outputs the voucher data to the display, for example.
  • Before the correction unit 133 corrects the first character string to the similar character string, the output unit 134 may output the first character string in association with the similar character string. In that case, the correction unit 133 may correct the first character string to the similar character string on condition that, after that output, an instruction permitting the correction is received. Further, when the output unit 134 receives a notification from the correction unit 133 that the first character string and the second character string are not included in the master data, it may output information indicating that they are not included in the master data.
  • the details of the correction process by the correction unit 133 will be described.
  • When the recognized company name, which is the first character string recognized by the character recognition unit 132, is not included in the master data and the recognized account information, which is the second character string recognized by the character recognition unit 132, is included in the master data, the correction unit 133 corrects the recognized company name to the company name associated with the recognized account information in the master data.
  • For example, suppose that the plurality of recognition character strings are "Taguchi Shoji" and "AAA Bank Tokyo Branch Ordinary 12233334".
  • the company master data 121 shown in FIG. 5 does not include the first character string "Taguchi Shoji".
  • On the other hand, the second character string "AAA Bank Tokyo Branch Ordinary 12233334" is included in the company master data 121 shown in FIG. 5.
  • In this case, the correction unit 133 determines that, among the plurality of character strings such as "Tanaka Shoji", "Tokyo Branch", and "001-12, Chiyoda-ku, Tokyo" associated with "AAA Bank Tokyo Branch Ordinary 12233334", "Tanaka Shoji" is the most similar to "Taguchi Shoji", and corrects "Taguchi Shoji" in the first character string to "Tanaka Shoji".
  • the correction unit 133 identifies the most similar similar character string, for example, based on the similarity calculated based on the number of matching characters among a plurality of characters included in each of the plurality of character strings.
  • the correction unit 133 specifies, for example, the character string having the largest number of characters that match the first character string as the similar character string.
  • the correction unit 133 may specify the character string registered as the company name in the master data as a similar character string.
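One way to read "similarity based on the number of matching characters" is to count the characters two strings share, with multiplicity and ignoring position. The sketch below assumes that reading; the function names are hypothetical, and the text does not say whether position or order is taken into account.

```python
from collections import Counter


def matching_chars(a, b):
    """Number of characters shared by a and b, counted with multiplicity."""
    return sum((Counter(a) & Counter(b)).values())


def most_similar(first, candidates):
    """Candidate sharing the largest number of characters with `first`."""
    return max(candidates, key=lambda c: matching_chars(first, c))


associated = ["Tanaka Shoji", "Tokyo Branch", "001-12, Chiyoda-ku, Tokyo"]
print(most_similar("Taguchi Shoji", associated))
```

A position-insensitive count is cheap but coarse, so a production system would probably combine it with an order-aware measure such as edit distance.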
  • As another example, suppose that the plurality of recognition character strings are "ink (AK-123)" and "¥1,800".
  • When the first character string is "¥1,800", the product master data 122 shown in FIG. 6 does not include the first character string.
  • On the other hand, the second character string "ink (AK-123)" is included in the product master data 122 shown in FIG. 6.
  • In this case, the correction unit 133 determines that, among the character strings "¥1,000", "ABC", "Tanaka Shoji Co., Ltd.", etc. associated with "ink (AK-123)", "¥1,000" is the most similar to "¥1,800", and corrects "¥1,800" in the first character string to "¥1,000".
  • When the recognized item name, which is the first character string recognized by the character recognition unit 132, is not included in the master data and the recognized product unit price, which is the second character string recognized by the character recognition unit 132, is included in the master data, the correction unit 133 may correct the recognized item name to the item name most similar to it among the plurality of item names associated with the recognized product unit price in the master data. For example, when the character recognition unit 132 recognizes "ink (AK-723)" as the first character string and "¥1,000" as the second character string, the correction unit 133 specifies "ink (AK-0123)" and "copy paper (A1)", which are associated with "¥1,000" in the product master data 122 shown in FIG. 6.
  • Of these, the correction unit 133 corrects the first character string to "ink (AK-0123)", which is more similar to the first character string "ink (AK-723)" than "copy paper (A1)" is.
  • When the recognized company name, which is the first character string recognized by the character recognition unit 132, is not included in the master data and the recognized item name, which is the second character string recognized by the character recognition unit 132, is included in the master data, the correction unit 133 may correct the recognized company name to the company name most similar to it among the plurality of company names associated with the recognized item name in the master data. For example, when the character recognition unit 132 recognizes "Taguchi Shoji Co., Ltd." as the first character string and "ink (AK-0123)" as the second character string, the correction unit 133 specifies "Tanaka Shoji Co., Ltd." and "MM Electric Co., Ltd.", which are associated with "ink (AK-0123)" in the master data.
  • Of these, the correction unit 133 corrects the first character string to "Tanaka Shoji Co., Ltd.", which is more similar to the first character string "Taguchi Shoji Co., Ltd." than "MM Electric Co., Ltd." is.
  • When the correction unit 133 corrects the first character string to a character string associated with the second character string on the premise that the second character string is correct, the first character string may be corrected to a wrong character string if the second character string was itself misrecognized. Therefore, the correction unit 133 may correct the first character string to the similar character string on condition that two or more second character strings among the plurality of recognition character strings are included in the master data.
  • For example, the correction unit 133 corrects "Taguchi Shoji" in the first character string to "Tanaka Shoji", which is associated with both "AAA Bank Tokyo Branch Ordinary 12233334" and "03-1234-5678", on condition that both of these are included in the master data. When the probability that two or more character strings are erroneously recognized is sufficiently low, operating in this way reduces the probability that the first character string is corrected to a wrong character string. This improves the probability that the character string written on the voucher is output correctly.
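The "two or more second character strings" condition can be sketched as a support check before correcting. The record layout, the `min_support` parameter name, and the use of `difflib.SequenceMatcher` are assumptions for illustration; the sketch also assumes that the supporting strings must belong to the same master record.

```python
from difflib import SequenceMatcher

# Hypothetical company master record based on the values quoted in the text.
COMPANY_MASTER = [
    {"company": "Tanaka Shoji", "branch": "Tokyo Branch",
     "phone": "03-1234-5678",
     "account": "AAA Bank Tokyo Branch Ordinary 12233334"},
]


def correct_with_support(first, others, master, min_support=2):
    """Correct `first` only if at least `min_support` of the other recognized
    strings are registered together in one master record."""
    for rec in master:
        values = set(rec.values())
        support = [s for s in others if s in values]
        if len(support) >= min_support:
            # Candidates are the remaining strings of the supported record.
            candidates = sorted(values - set(support))
            return max(candidates,
                       key=lambda c: SequenceMatcher(None, first, c).ratio())
    return first  # not enough support: leave the string uncorrected


others = ["AAA Bank Tokyo Branch Ordinary 12233334", "03-1234-5678"]
print(correct_with_support("Taguchi Shoji", others, COMPANY_MASTER))
```

With only one supporting string the function returns the first character string unchanged, which is the more conservative behaviour the passage argues for.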
  • When there are a plurality of candidates for the similar character string, the correction unit 133 may cause the output unit 134 to output the plurality of candidates before correcting the first character string, and may correct the first character string to the similar character string corresponding to the candidate selected from them. For example, if the company master data 121 includes the telephone number "03-1234-5678" and the fax number "03-1234-5679" and the first character string is "03-1234-5670", both "03-1234-5678" and "03-1234-5679" are similar to the first character string.
  • In such a case, the correction unit 133 displays "03-1234-5678" and "03-1234-5679" as correction candidates on the display of the computer used by the user.
  • the correction unit 133 corrects the first character string to the character string selected by the user. By operating in this way, the probability that the first character string is corrected to a wrong character string can be reduced even when a plurality of similar character strings are registered in the master data.
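The candidate-selection step can be sketched as follows: collect every registered string whose similarity ties with the best score, and let a caller-supplied function stand in for the user's choice. The function names, the `tolerance` parameter, and the `difflib`-based score are assumptions, not part of the source.

```python
from difflib import SequenceMatcher


def correction_candidates(first, registered, tolerance=1e-9):
    """Registered strings whose similarity to `first` ties with the best score."""
    scored = [(SequenceMatcher(None, first, r).ratio(), r) for r in registered]
    best = max(score for score, _ in scored)
    return [r for score, r in scored if best - score <= tolerance]


def resolve(first, registered, choose):
    """Correct `first`; when several candidates tie, defer to `choose`,
    a hypothetical callback standing in for the user's selection."""
    cands = correction_candidates(first, registered)
    return cands[0] if len(cands) == 1 else choose(cands)


registered = ["03-1234-5678", "03-1234-5679", "Tanaka Shoji"]
cands = correction_candidates("03-1234-5670", registered)
print(cands)  # both the telephone and the fax number tie as candidates
```

In an interactive system `choose` would display the candidates and return the one the user clicks.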
  • the correction unit 133 executes a search on the Internet using at least one of the first character string or the similar character string as a keyword before correcting the first character string to the similar character string, and the similar character string is the first. If it is determined that the probability of being correct is higher than that of one character string, the first character string may be corrected to a similar character string.
  • For example, suppose that the first character string recognized by the character recognition unit 132 is "⁇ 2-15, Chiyoda-ku, Tokyo" and that the correction unit 133 identifies "001, Chiyoda-ku, Tokyo" in the company master data 121 shown in FIG. 5 as the similar character string.
  • In this case, the correction unit 133 searches on the Internet using the company name or telephone number associated with the similar character string as a keyword. The correction unit 133 does not correct the first character string when the address described in the website found by the search matches the first character string, and corrects the first character string to the similar character string when the address described in the website matches the similar character string.
  • By operating the correction unit 133 in this way, it is possible to prevent erroneous correction when the character string registered in the master data is not the latest.
  • the correction unit 133 may execute a search on the Internet using at least one of the first character string or the similar character string as a keyword before correcting the first character string to the similar character string and, if it determines that the first character string is more likely to be correct than the similar character string, may correct the similar character string in the master data to the first character string without correcting the first character string.
  • For example, the correction unit 133 determines that the first character string is more likely to be correct than the similar character string when the address described in the website found by the search matches the first character string.
  • By operating the correction unit 133 in this way, the master data is updated. As a result, corrections made by the correction unit 133 when the character recognition unit 132 misrecognizes a character string become more accurate in the future.
  • the output unit 134 may allow the user to confirm the corrected character string before outputting it to the core system, in order to improve the probability that an appropriate character string is registered in the core system based on the character string written on the voucher. As an example, the output unit 134 outputs, in a distinguishable manner, the character strings among the plurality of recognition character strings that the correction unit 133 determined to need correction and those that it determined not to need correction.
  • FIG. 7 is a diagram showing an example of a voucher data display screen output by the output unit 134.
  • "Taguchi Shoji" and "ink (AK-723)" are not included in the company master data or the product master data, so they are displayed in a manner different from the other character strings (italic characters in a thick frame).
  • When the output unit 134 outputs such data, the user can easily grasp which character strings need to be corrected.
  • the output unit 134 may display a character string that is a candidate for correction when a predetermined operation is performed on the screen of FIG. 7A.
  • the predetermined operation is, for example, an operation (for example, a click operation or a touch operation) in which the user selects a character string that needs to be corrected.
  • FIG. 7B is a diagram showing an example in which a character string that is a candidate for correction is displayed.
  • the correction unit 133 corrects the character string displayed in FIG. 7A to the displayed candidate character string. As a result, the voucher data is corrected to the state shown in FIG.
  • When the output unit 134 determines that the ratio of recognition character strings not included in the master data among the plurality of recognition character strings is equal to or more than a predetermined value, it may output warning information indicating that there are many character strings not included in the master data.
  • the output unit 134 may output warning information together with a plurality of recognition character strings. As shown in FIG. 7, the output unit 134 may output warning information together with a plurality of recognition character strings in a state where the character string requiring correction can be identified.
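The ratio check behind the warning can be sketched as below. The threshold of 0.5, the function names, and the warning text are assumptions; the source only says "a predetermined value".

```python
def unmatched_ratio(recognized, registered):
    """Fraction of recognized strings that are not registered in the master data."""
    if not recognized:
        return 0.0
    missing = [s for s in recognized if s not in registered]
    return len(missing) / len(recognized)


def warning_for(recognized, registered, threshold=0.5):
    """Return a warning message when the unmatched ratio reaches the threshold."""
    if unmatched_ratio(recognized, registered) >= threshold:
        return "Many character strings are not included in the master data."
    return None


registered = {"Tanaka Shoji", "Tokyo Branch", "03-1234-5678", "ink (AK-0123)"}
recognized = ["Taguchi Shoji", "Tokyo Branch", "ink (AK-723)", "03-1234-5678"]
print(unmatched_ratio(recognized, registered), warning_for(recognized, registered))
```

A high ratio suggests the voucher comes from an unregistered business partner rather than from OCR errors, which is why a warning is more useful here than automatic correction.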
  • the output unit 134 may display a screen for inputting a process to be executed by the user together with the warning information or as the warning information.
  • the output unit 134 displays, for example, a screen for performing an operation for associating a plurality of recognition character strings and registering them in master data.
  • the output unit 134 registers a plurality of recognition character strings in the master data when the operation for registration is performed.
  • When the master data does not include a character string whose similarity to the first character string is equal to or higher than a threshold value, the output unit 134 may output the first character string as a character string to be registered in the master data.
  • For example, when the first character string is "Sato Shoji" and the master data does not include "Sato Shoji", the output unit 134 displays on the user's computer a message containing the first character string, such as "Do you want to register Sato Shoji in the master data?".
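The registration proposal can be sketched as a threshold test on the best similarity found in the master data. The threshold value of 0.8, the function name, and the `difflib`-based similarity are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def registration_prompt(first, registered, threshold=0.8):
    """Return a prompt when no registered string is similar enough to `first`."""
    best = max((SequenceMatcher(None, first, r).ratio() for r in registered),
               default=0.0)
    if best >= threshold:
        return None  # a sufficiently similar string exists; correction applies instead
    return f"Do you want to register {first} in the master data?"


print(registration_prompt("Sato Shoji", ["Tanaka Shoji", "MM Electric"]))
```

Choosing the threshold is a trade-off: too low and new business partners are silently "corrected" to existing ones, too high and ordinary OCR slips trigger registration prompts.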
  • the output unit 134 registers in the master data among a plurality of other character strings included in the voucher image data including the character string determined to be registered (for example, the above-mentioned "Sato Shoji").
  • a plurality of character strings corresponding to the items to be registered may be displayed as character strings of candidates for registration.
  • the output unit 134 displays the operation image that accepts the operation for registration together with the character string of the registration target candidate, and when the operation image is operated, the first character string and the character string of the registration target candidate are displayed. It may be registered in the master data.
  • FIG. 8 is a flowchart showing a processing flow of the data processing device 1. The flowchart shown in FIG. 8 starts from the time when the image reading device 2 outputs the voucher image data.
  • The character recognition unit 132 executes OCR processing for recognizing the characters included in the voucher image data (S12).
  • The character recognition unit 132 recognizes a plurality of character strings based on the recognized characters.
  • To determine whether the plurality of character strings recognized by the character recognition unit 132 have been recognized correctly, the correction unit 133 first selects one recognized character string from the plurality of recognized character strings (S13). The correction unit 133 determines whether the selected recognized character string matches any of the plurality of character strings included in the master data (S14). When the correction unit 133 determines that the selected recognized character string matches one of the character strings included in the master data (YES in S14), it selects another recognized character string (S15) and executes the process of S14 again.
  • When the correction unit 133 determines that the first character string, which is the selected recognized character string, matches none of the plurality of character strings included in the master data (NO in S14), it determines whether another recognized character string among the plurality of recognized character strings recognized by the character recognition unit 132 matches any of the plurality of character strings included in the master data (S16).
  • When the correction unit 133 determines that the other recognized character string matches one of the character strings included in the master data (YES in S16), it corrects the first character string to the character string most similar to the first character string among the plurality of character strings associated, in the master data, with the character string that matches the other recognized character string (S17). When the correction unit 133 determines that the other recognized character string matches none of the character strings included in the master data (NO in S16), it executes the process of S16 for yet another recognized character string.
  • The correction unit 133 determines whether the processing from S14 to S17 has been completed for all the recognized character strings (S18) and, if not (NO in S18), returns to S14. When the processing for all the recognized character strings is completed (YES in S18), the correction unit 133 creates voucher data composed of the corrected character strings, and the output unit 134 outputs the voucher data (S19).
  • The data processing device 1 refers to master data in which a plurality of character strings among the company name, branch name, telephone number, account information, contact information, department in charge, name of the person in charge, item name, and product unit price, which are character strings written on the voucher, are associated with one another.
  • Among the plurality of recognized character strings specified by recognizing the character strings included in the voucher image data, when the first character string is not included in the master data and the master data contains a second character string different from the first character string, the correction unit 133 corrects the first character string to the similar character string that is most similar to the first character string among the one or more registered character strings associated with the second character string in the master data.
  • As a result, the output unit 134 can output a correct character string even if an error occurs in character recognition, which improves the probability that the character strings included in the image data of the voucher are output correctly. This in turn improves the work efficiency and work quality of a user who works with the data described in the voucher.
  • The data processing device 1 may use only one of the company master data 121 and the product master data 122. Further, the data processing device 1 need not be configured as a single computer; a plurality of computers may operate in cooperation with each other, or the computer and the storage medium in which the master data is stored may be physically separated.
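The flow of FIG. 8 from S13 to S19 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names `similarity` and `correct_voucher`, the representation of master data as tuples of associated strings, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions, since the source does not specify how similarity is computed.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    # Assumed similarity measure; the source does not name one.
    return SequenceMatcher(None, a, b).ratio()


def correct_voucher(recognized: list[str],
                    master_records: list[tuple[str, ...]]) -> list[str]:
    """Hypothetical sketch of S13 to S19: each tuple in master_records
    holds registered character strings that are associated with each other."""
    registered = {s for record in master_records for s in record}
    corrected = []
    for text in recognized:                  # S13 / S15: take each string in turn
        if text in registered:               # S14: YES, no correction needed
            corrected.append(text)
            continue
        candidates: list[str] = []
        for other in recognized:             # S16: find another string that is registered
            if other == text or other not in registered:
                continue
            for record in master_records:
                if other in record:
                    candidates += [s for s in record
                                   if s != other and s not in candidates]
            break                            # use the first registered string found
        if candidates:                       # S17: correct to the most similar string
            corrected.append(max(candidates, key=lambda s: similarity(text, s)))
        else:                                # nothing registered: leave unchanged
            corrected.append(text)
    return corrected                         # S19: strings for the voucher data
```

With a hypothetical record `("ACME Corporation", "Osaka Branch", "Bank X Osaka 7654321")`, a misread company name that appears next to the registered account string would be replaced by the registered company name, while a voucher with no registered strings would pass through unchanged.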

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Character Discrimination (AREA)

Abstract

A data processing device 1 comprises: a storage unit 12 which stores master data in which a plurality of character strings to be written on vouchers are associated as a plurality of registered character strings; a character recognition unit 132 which recognizes character strings included in voucher image data and thereby outputs a plurality of recognized character strings; and a correction unit 133 which, if a first character string of the plurality of recognized character strings is not included in the master data, but a second character string of the plurality of recognized character strings that is different from the first character string is included in the master data, corrects the first character string to a similar character string that is most similar to the first character string among the one or more registered character strings that are associated, in the master data, with the second character string.

Description

Data processing device, data processing method, and program
The present invention relates to a data processing device, a data processing method, and a program.
Conventionally, a technique is known in which characters are recognized based on an image obtained by reading an invoice, and the recognized characters are corrected when the character recognition result contains an error (see, for example, Patent Document 1).
Japanese Unexamined Patent Publication No. 2012-517637
In the conventional technique, errors are corrected by comparing the extracted characters with the contents of a predetermined database. However, a character string written on a voucher may not be registered in the database, so that the recognition result does not match the contents of the database even though it contains no error. If, in such a case, the result is corrected so as to match another character string contained in the database, a character string different from the one written on the voucher is output even though the characters were recognized correctly.
The present invention has been made in view of these points, and an object thereof is to improve the probability that character strings included in the image data of a voucher are output correctly.
A data processing device according to a first aspect of the present invention includes: a data acquisition unit that acquires voucher image data; a character recognition unit that outputs a plurality of recognized character strings by recognizing character strings included in the voucher image data; a correction unit that, when a first character string among the plurality of recognized character strings is not included in master data in which a plurality of registered character strings are associated, and a second character string among the plurality of recognized character strings that is different from the first character string is included in the master data, corrects the first character string to a similar character string that is most similar to the first character string among the one or more registered character strings associated with the second character string in the master data; and an output unit that outputs the corrected first character string, obtained by correcting the first character string, in association with the second character string.
The correction unit may correct the first character string to the similar character string on condition that two or more second character strings among the plurality of recognized character strings are included in the master data.
When the correction unit identifies a plurality of candidates for the similar character string most similar to the first character string, it may, before correcting the first character string to the similar character string, cause the output unit to output the plurality of candidates and then correct the first character string to the similar character string corresponding to the candidate selected from among them.
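The handling of several equally similar candidates could be sketched as below. The names `top_candidates` and `resolve`, the `choose` callback standing in for the user's selection, and the use of `difflib.SequenceMatcher` as the similarity measure are assumptions made for illustration only.

```python
from difflib import SequenceMatcher
from typing import Callable


def top_candidates(first: str, registered: list[str]) -> list[str]:
    """Return every registered string that ties for the highest similarity."""
    scores = [SequenceMatcher(None, first, s).ratio() for s in registered]
    best = max(scores)
    return [s for s, score in zip(registered, scores) if score == best]


def resolve(first: str, registered: list[str],
            choose: Callable[[list[str]], str]) -> str:
    """Ask for a selection (via the hypothetical `choose`) only when there is a tie."""
    candidates = top_candidates(first, registered)
    if len(candidates) > 1:
        return choose(candidates)  # candidates are output and one is selected
    return candidates[0]
```

In this sketch a tie means exactly equal scores; a real system might instead treat scores within a small margin as tied.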
The master data may include company names and account information as the plurality of registered character strings, and when a recognized company name, which is the first character string recognized by the character recognition unit, is not included in the master data and recognized account information, which is the second character string recognized by the character recognition unit, is included in the master data, the correction unit may correct the recognized company name to the company name associated with the recognized account information in the master data.
The master data may include company names and item names as the plurality of registered character strings, and when a recognized company name, which is the first character string recognized by the character recognition unit, is not included in the master data and a recognized item name, which is the second character string recognized by the character recognition unit, is included in the master data, the correction unit may correct the recognized company name to the company name most similar to the recognized company name among the plurality of company names associated with the recognized item name in the master data.
The master data may include item names and product unit prices as the plurality of registered character strings, and when a recognized item name, which is the first character string recognized by the character recognition unit, is not included in the master data and a recognized product unit price, which is the second character string recognized by the character recognition unit, is included in the master data, the correction unit may correct the recognized item name to the item name most similar to the recognized item name among the plurality of item names associated with the recognized product unit price in the master data.
Before correcting the first character string to the similar character string, the correction unit may execute a search on the Internet using at least one of the first character string and the similar character string as a keyword, and may correct the first character string to the similar character string when it determines that the similar character string is more likely to be correct than the first character string.
Before correcting the first character string to the similar character string, the correction unit may execute a search on the Internet using at least one of the first character string and the similar character string as a keyword, and when it determines that the first character string is more likely to be correct than the similar character string, may correct the similar character string in the master data to the first character string without correcting the first character string.
The output unit may output, among the plurality of recognized character strings, the character strings that the correction unit has determined to require correction and the character strings that the correction unit has determined not to require correction in a manner that allows them to be distinguished.
When the master data does not include a character string whose similarity to the first character string is equal to or greater than a threshold, the output unit may output the first character string as a character string to be registered in the master data.
The output unit may output information indicating that many character strings are not included in the master data when the ratio of recognized character strings not included in the master data among the plurality of recognized character strings is equal to or greater than a predetermined value.
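The ratio check described above can be illustrated with a short sketch. The function names, the warning text, and the default threshold of 0.5 are hypothetical; the source only says "a predetermined value".

```python
from typing import Optional


def unregistered_ratio(recognized: list[str], master_strings: set[str]) -> float:
    """Fraction of recognized strings that are absent from the master data."""
    if not recognized:
        return 0.0
    missing = [s for s in recognized if s not in master_strings]
    return len(missing) / len(recognized)


def warning_for(recognized: list[str], master_strings: set[str],
                threshold: float = 0.5) -> Optional[str]:
    """Return a warning message when the ratio reaches the (assumed) threshold."""
    if unregistered_ratio(recognized, master_strings) >= threshold:
        return "Many recognized character strings are not included in the master data."
    return None
```

A high ratio may indicate a voucher from an unregistered business partner rather than recognition errors, which is presumably why the output is a warning and not an automatic correction.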
A data processing method according to a second aspect of the present invention is executed by a computer and includes the steps of: acquiring voucher image data; outputting a plurality of recognized character strings by recognizing character strings included in the voucher image data; when a first character string among the plurality of recognized character strings is not included in master data in which a plurality of registered character strings are associated, and a second character string among the plurality of recognized character strings that is different from the first character string is included in the master data, correcting the first character string to a similar character string that is most similar to the first character string among the one or more registered character strings associated with the second character string in the master data; and outputting the corrected first character string, obtained by correcting the first character string, in association with the second character string.
A program according to a third aspect of the present invention causes a computer to execute the steps of: acquiring voucher image data; outputting a plurality of recognized character strings by recognizing character strings included in the voucher image data; when a first character string among the plurality of recognized character strings is not included in master data in which a plurality of registered character strings are associated, and a second character string among the plurality of recognized character strings that is different from the first character string is included in the master data, correcting the first character string to a similar character string that is most similar to the first character string among the one or more registered character strings associated with the second character string in the master data; and outputting the corrected first character string, obtained by correcting the first character string, in association with the second character string.
According to the present invention, the probability that character strings included in the image data of a voucher are output correctly can be improved.
FIG. 1 is a diagram for explaining an outline of the data processing device.
FIG. 2 is an example of invoice image data, which is a type of voucher image data.
FIG. 3 is an example of invoice data.
FIG. 4 is a diagram showing the configuration of the data processing device.
FIG. 5 is a diagram showing an example of company master data.
FIG. 6 is a diagram showing an example of product master data.
FIG. 7 is a diagram showing an example of a display screen of the voucher data output by the output unit.
FIG. 8 is a flowchart showing the processing flow of the data processing device 1.
[Overview of data processing device 1]
FIG. 1 is a diagram for explaining an outline of the data processing device 1. The data processing device 1 is a device, for example a computer, that acquires voucher image data and identifies the character strings included in the voucher image data by performing character recognition processing on it. The data processing device 1 creates voucher data including the identified character strings and outputs the created voucher data to the external device 3.
The voucher image data is image data of a voucher document such as an invoice, a purchase order, a delivery note, a quotation, or an acceptance slip. When the voucher image data is image data of an invoice, it is data obtained by imaging a voucher that contains a company name, contact information, the name of the product or service being billed (item name), the billed amount, the tax amount, and the like. The voucher image data is, for example, image data generated by the image reading device 2 (for example, a scanner or a digital camera) reading the voucher, but may be image data or text data created by a computer.
The external device 3 is, for example, a computer used by a user (for example, a person in charge of accounting) who uses the data processing device 1 at the company that received the voucher, or a core accounting system (ERP: Enterprise Resource Planning). The data processing device 1, for example, displays the identified character strings on the user's computer or transmits them to the core system. The core system is, for example, a system that stores various data used for accounting processing.
FIG. 2 is an example of invoice image data, which is a type of voucher image data. FIG. 3 is an example of invoice data, which is voucher data including the character strings identified based on the invoice image data shown in FIG. 2. In the invoice data shown in FIG. 3, the character strings included in the invoice image data shown in FIG. 2 are recorded correctly.
However, character recognition by the data processing device 1 does not always correctly identify every character string included in the invoice image data. When the data processing device 1 misrecognizes a character string, invoice data containing the erroneous character string is created, and the erroneous character string is registered in the external device 3.
Therefore, the data processing device 1 according to the present embodiment refers to master data in which a plurality of character strings among the company name, branch name, telephone number, account information, contact information, department in charge, name of the person in charge, item name, and product unit price, which are character strings written on vouchers, are associated with one another, and determines whether a recognized character string is correct based on the plurality of character strings associated in the master data. When the data processing device 1 determines, using another character string associated in the master data, that the character string under determination contains an error, it corrects the misrecognized character string to the most similar character string among the plurality of character strings associated with that other character string in the master data. This configuration increases the probability that the data processing device 1 outputs correct character strings.
The configuration and operation of the data processing device 1 are described in detail below. The following description focuses on the case where the voucher is an invoice, but the voucher may be a purchase order, a delivery note, a quotation, an acceptance slip, or the like.
[Configuration of data processing device 1]
FIG. 4 is a diagram showing the configuration of the data processing device 1. The data processing device 1 includes a communication unit 11, a storage unit 12, and a control unit 13. The control unit 13 includes a data acquisition unit 131, a character recognition unit 132, a correction unit 133, and an output unit 134.
The communication unit 11 includes a communication interface for transmitting and receiving various data to and from the image reading device 2 or the external device 3 via a network such as the Internet or an intranet. The communication unit 11 inputs the invoice image data received from the image reading device 2 to the data acquisition unit 131. The communication unit 11 also outputs the invoice data output by the output unit 134 to the external device 3.
The storage unit 12 has storage media such as a ROM (Read Only Memory), a RAM (Random Access Memory), and an SSD (Solid State Drive). The storage unit 12 stores the program executed by the control unit 13.
The storage unit 12 also stores master data in which a plurality of character strings among the company name (registered company name), branch name, telephone number, account information (registered account information), contact information, department in charge, name of the person in charge, item name, and product unit price, which are character strings written on vouchers, are associated as a plurality of registered character strings. The storage unit 12 stores, for example, company master data 121 including data on companies and product master data 122 including data on products. These master data are used when the user creates a voucher, and are also used to correct the character strings that the control unit 13 identifies by character recognition. The company master data 121 and the product master data 122 may be stored in a storage medium external to the storage unit 12.
FIG. 5 is a diagram showing an example of the company master data 121. In the company master data 121 shown in FIG. 5, a company name, a branch name, a company address, a telephone number, a department in charge, a name of the person in charge, and a transfer destination account number are associated with each other.
FIG. 6 is a diagram showing an example of the product master data 122. In the product master data 122 shown in FIG. 6, a product name, a price, a manufacturer name, and a business partner name are associated with each other. The product name includes the name of the product and a model name or model number, but may include only the name of the product, or only the model name or model number.
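One possible in-memory shape for the two master data sets is sketched below. The class names, field names, and helper functions are hypothetical; only the set of associated fields is taken from the descriptions of FIG. 5 and FIG. 6. All fields are kept as strings so that they can be compared directly with recognized character strings.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class CompanyRecord:
    # Fields associated with each other in the company master data (FIG. 5).
    company_name: str
    branch_name: str
    address: str
    phone: str
    department: str
    person_in_charge: str
    transfer_account: str


@dataclass(frozen=True)
class ProductRecord:
    # Fields associated with each other in the product master data (FIG. 6).
    product_name: str
    price: str
    manufacturer: str
    business_partner: str


def records_containing(records: list, text: str) -> list:
    """Records in which `text` appears as one of the registered strings."""
    return [r for r in records if text in astuple(r)]


def associated_strings(record, text: str) -> list[str]:
    """The other registered strings associated with `text` in one record."""
    return [value for value in astuple(record) if value != text]
```

Treating each record as a flat tuple of strings keeps the lookup independent of which field a recognized string belongs to, at the cost of ignoring field types; a real system would likely index by field.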
 図4に示す制御部13は、例えばCPU(Central Processing Unit)を有する。制御部13は、記憶部12に記憶されたプログラムを実行することにより、データ取得部131、文字認識部132、補正部133、及び出力部134として機能する。 The control unit 13 shown in FIG. 4 has, for example, a CPU (Central Processing Unit). The control unit 13 functions as a data acquisition unit 131, a character recognition unit 132, a correction unit 133, and an output unit 134 by executing the program stored in the storage unit 12.
 データ取得部131は、通信部11を介して証憑画像データを取得する。データ取得部131は、例えば、証憑を読み取った画像読取装置2から出力された証憑画像データを取得し、取得した証憑画像データを文字認識部132に入力する。 The data acquisition unit 131 acquires voucher image data via the communication unit 11. The data acquisition unit 131 acquires, for example, the voucher image data output from the image reading device 2 that has read the voucher, and inputs the acquired voucher image data to the character recognition unit 132.
 文字認識部132は、証憑画像データに含まれる文字列を認識することにより複数の認識文字列を出力する。文字認識部132は、例えばAI(Artificial Intelligence)により実行されるOCR(Optical Character Recognition)処理を実行することにより証憑画像データに含まれている文字を認識し、連続する複数の文字を文字列として認識する。文字認識部132は、認識した複数の文字列を補正部133に入力する。 The character recognition unit 132 outputs a plurality of recognition character strings by recognizing the character strings included in the voucher image data. The character recognition unit 132 recognizes the characters included in the voucher image data by executing OCR (Optical Character Recognition) processing executed by, for example, AI (Artificial Intelligence), and sets a plurality of consecutive characters as character strings. recognize. The character recognition unit 132 inputs a plurality of recognized character strings to the correction unit 133.
 補正部133は、文字認識部132が一つの証憑画像データにおいて認識した複数の認識文字列のうち第1文字列が、複数の登録文字列が関連付けられたマスターデータに含まれておらず、複数の認識文字列のうち第1文字列と異なる第2文字列がマスターデータに含まれている場合に、マスターデータにおいて第2文字列に関連付けられている一以上の登録文字列のうち第1文字列に最も類似する類似文字列に第1文字列を補正する。補正部133は、第1文字列及び第2文字列がマスターデータに含まれていない場合には、第1文字列を補正しないで、第1文字列及び第2文字列がマスターデータに含まれていないということを出力部134に通知する。マスターデータは、会社マスターデータ121であってもよく、商品マスターデータ122であってもよい。補正部133は、会社マスターデータ121と商品マスターデータ122とを組み合わせて使用してもよい。補正部133は、補正した後の補正第1文字列を出力部134に入力する。 In the correction unit 133, the first character string among the plurality of recognition character strings recognized by the character recognition unit 132 in one voucher image data is not included in the master data associated with the plurality of registered character strings, and a plurality of them. When the master data contains a second character string different from the first character string among the recognition character strings of, the first character of one or more registered character strings associated with the second character string in the master data. Correct the first string to a similar string that most closely resembles the column. When the first character string and the second character string are not included in the master data, the correction unit 133 does not correct the first character string, and the first character string and the second character string are included in the master data. Notify the output unit 134 that it has not been done. The master data may be company master data 121 or product master data 122. The correction unit 133 may use the company master data 121 and the product master data 122 in combination. The correction unit 133 inputs the corrected first character string to the output unit 134.
 出力部134は、補正第1文字列と第2文字列とを関連付けて出力する。出力部134は、例えば、図3に示した請求書データのように、一つの証憑画像データに対応する複数の文字列が関連付けられた状態の証憑データを出力する。出力部134は、例えば、通信部11を介して外部装置3に証憑データを出力したり、ディスプレイに証憑データを出力したりする。 The output unit 134 outputs the corrected first character string and the second character string in association with each other. The output unit 134 outputs voucher data in a state in which a plurality of character strings corresponding to one voucher image data are associated, for example, like the invoice data shown in FIG. The output unit 134 outputs the voucher data to the external device 3 via the communication unit 11, or outputs the voucher data to the display, for example.
Before the correction unit 133 corrects the first character string to the similar character string, the output unit 134 may output the first character string and the similar character string in association with each other, and the correction unit 133 may then correct the first character string to the similar character string on condition that it receives an instruction permitting the correction. When the output unit 134 receives from the correction unit 133 a notification that the first and second character strings are not included in the master data, it may output information indicating that they are not included.
[Details of correction processing]
The details of the correction processing performed by the correction unit 133 are described below.
As an example, when the recognized company name, which is the first character string recognized by the character recognition unit 132, is not included in the master data, and the recognized account information, which is the second character string recognized by the character recognition unit 132, is included in the master data, the correction unit 133 corrects the recognized company name to the company name associated with the recognized account information in the master data.
As an example, assume that the recognized character strings are "Taguchi Shoji" (田口商事) and "AAA Bank Tokyo Branch Ordinary 1223334". When the first character string is "Taguchi Shoji", it is not included in the company master data 121 shown in FIG. 5. However, the second character string "AAA Bank Tokyo Branch Ordinary 1223334" is included in the company master data 121 shown in FIG. 5. In this case, the correction unit 133 determines that, among the character strings associated with "AAA Bank Tokyo Branch Ordinary 1223334", such as "Tanaka Shoji" (田中商事), "Tokyo Branch", and "〇〇1-12, Chiyoda-ku, Tokyo", the character string "Tanaka Shoji" is the most similar to "Taguchi Shoji", and corrects the first character string "Taguchi Shoji" to "Tanaka Shoji".
The correction unit 133 identifies the most similar character string based on, for example, a similarity calculated from the number of matching characters among the characters included in each of the character strings. The correction unit 133 identifies, as the similar character string, the character string that has the most characters matching the first character string. When the correction unit 133 has identified that the first character string is a company name, it may identify a character string registered as a company name in the master data as the similar character string.
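The similarity calculation described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the text only states that similarity is based on the number of matching characters, so the multiset-intersection count used here, and the function names `matching_chars` and `most_similar`, are assumptions made for the example.

```python
from collections import Counter


def matching_chars(a: str, b: str) -> int:
    """Count the characters two strings have in common (assumed metric).

    Counter intersection keeps, per character, the smaller of the two counts.
    """
    return sum((Counter(a) & Counter(b)).values())


def most_similar(first: str, candidates: list[str]) -> str:
    """Return the candidate sharing the most characters with `first`."""
    return max(candidates, key=lambda c: matching_chars(first, c))


# Registered strings associated with the matched account information
# (values taken from the example in the text).
associated = ["田中商事", "東京支店", "東京都千代田区〇〇1-12"]
print(most_similar("田口商事", associated))  # 田中商事
```

A production system would likely prefer a position-aware measure such as edit distance; the character count is used here only because it is the measure the description names.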
As another example, assume that the recognized character strings are "Ink (AK-123)" and "¥1,800". When the first character string is "¥1,800", it is not included in the product master data 122 shown in FIG. 6. However, the second character string "Ink (AK-123)" is included in the product master data 122 shown in FIG. 6. In this case, the correction unit 133 determines that, among the character strings associated with "Ink (AK-123)", such as "¥1,000", "ABC Company", and "Tanaka Shoji Co., Ltd.", the character string "¥1,000" is the most similar to "¥1,800", and corrects the first character string "¥1,800" to "¥1,000".
When the recognized item name, which is the first character string recognized by the character recognition unit 132, is not included in the master data, and the recognized product unit price, which is the second character string recognized by the character recognition unit 132, is included in the master data, the correction unit 133 may correct the recognized item name to the item name most similar to it among the plurality of item names associated with the recognized product unit price in the master data. For example, when the character recognition unit 132 recognizes "Ink (AK-723)" as the first character string and "¥1,000" as the second character string, the correction unit 133 identifies "Ink (AK-0123)" and "Copy paper (A1)", which are associated with "¥1,000" in the product master data 122 shown in FIG. 6. The correction unit 133 then corrects the first character string to "Ink (AK-0123)", which is more similar to the first character string "Ink (AK-723)" than "Copy paper (A1)" is.
When the recognized company name, which is the first character string recognized by the character recognition unit 132, is not included in the master data, and the recognized item name, which is the second character string recognized by the character recognition unit 132, is included in the master data, the correction unit 133 may correct the recognized company name to the company name most similar to it among the plurality of company names associated with the recognized item name in the master data. For example, when the character recognition unit 132 recognizes "Taguchi Shoji Co., Ltd." as the first character string and "Ink (AK-0123)" as the second character string, the correction unit 133 identifies "Tanaka Shoji Co., Ltd." and "MM Electric Co., Ltd.", which are associated with "Ink (AK-0123)" in the product master data 122 shown in FIG. 6. The correction unit 133 then corrects the first character string to "Tanaka Shoji Co., Ltd.", which is more similar to the first character string "Taguchi Shoji Co., Ltd." than "MM Electric Co., Ltd." is.
Even when the second character string is included in the master data, it may be included only because it was itself misrecognized. In such a case, if the correction unit 133 corrects the first character string to a character string associated with the second character string on the assumption that the second character string is correct, the first character string may be corrected to a wrong character string. Therefore, the correction unit 133 may correct the first character string to the similar character string on condition that two or more second character strings among the plurality of recognized character strings are included in the master data.
In the preceding example, the correction unit 133 corrects the first character string "Taguchi Shoji" to "Tanaka Shoji", which is associated with both "AAA Bank Tokyo Branch Ordinary 1223334" and "03-1234-5678", because these two character strings are included in the master data in association with each other. When the probability that two or more character strings are misrecognized is sufficiently low, this operation reduces the probability that the first character string is corrected to a wrong character string, and therefore increases the probability that the character strings written on the voucher are output correctly.
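The two-anchor condition can be sketched as below. The sketch assumes that the master data is a list of records, each record being a list of mutually associated registered strings, and that both anchoring strings must appear in the same record; the record layout and the names `shared_chars` and `correct_if_two_anchors` are hypothetical.

```python
from collections import Counter


def shared_chars(a: str, b: str) -> int:
    """Number of characters the two strings have in common (assumed metric)."""
    return sum((Counter(a) & Counter(b)).values())


def correct_if_two_anchors(first, others, master_records):
    """Correct `first` only when at least two other recognized strings
    are found together in one master record; otherwise leave it unchanged."""
    for record in master_records:
        anchors = [s for s in others if s in record]
        if len(anchors) >= 2:
            candidates = [s for s in record if s not in anchors]
            return max(candidates,
                       key=lambda c: shared_chars(first, c),
                       default=first)
    return first


master = [["田中商事", "東京支店", "AAA銀行 東京支店 普通 1223334", "03-1234-5678"]]
both = ["AAA銀行 東京支店 普通 1223334", "03-1234-5678"]
print(correct_if_two_anchors("田口商事", both, master))      # 田中商事
print(correct_if_two_anchors("田口商事", both[:1], master))  # 田口商事
```

Requiring two independent matches trades some recall for precision: a voucher with only one legible anchor is left uncorrected rather than risk propagating a misread anchor.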
When the correction unit 133 identifies that there are a plurality of candidates for the similar character string most similar to the first character string, it may, before correcting the first character string, cause the output unit 134 to output the plurality of candidates and then correct the first character string to the similar character string corresponding to the candidate selected from among them. For example, when the company master data 121 includes the telephone number "03-1234-5678" and the fax number "03-1234-5679" and the first character string is "03-1234-5670", both "03-1234-5678" and "03-1234-5679" are equally similar to the first character string.
In such a case, the user needs to decide which character string the first character string should be corrected to, so the correction unit 133 causes "03-1234-5678" and "03-1234-5679" to be displayed as correction candidates on the display of the computer used by the user. The correction unit 133 corrects the first character string to the character string selected by the user. This operation reduces the probability that the first character string is corrected to a wrong character string even when a plurality of similar character strings are registered in the master data.
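The handling of tied candidates can be sketched as follows. The user interaction is represented by a callback `choose`, which is a stand-in for displaying the candidates and receiving a selection; the callback, the function names, and the character-count similarity are assumptions for illustration, and the candidate list is assumed to be non-empty.

```python
from collections import Counter


def shared_chars(a: str, b: str) -> int:
    """Number of characters the two strings have in common (assumed metric)."""
    return sum((Counter(a) & Counter(b)).values())


def resolve(first, candidates, choose):
    """Return the single best candidate, or defer to `choose` on a tie.

    `choose` receives the list of tied candidates and returns one of them.
    """
    scores = {c: shared_chars(first, c) for c in candidates}
    best = max(scores.values())
    tied = [c for c in candidates if scores[c] == best]
    if len(tied) == 1:
        return tied[0]
    return choose(tied)


shown = []  # records what would have been displayed to the user


def pick_first(tied):
    shown.extend(tied)
    return tied[0]


result = resolve("03-1234-5670", ["03-1234-5678", "03-1234-5679"], pick_first)
print(shown)   # both numbers are offered as candidates
print(result)  # the candidate returned by the callback
```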
When a character string recognized by the character recognition unit 132 does not match a character string in the master data, the character string in the master data may be the one that is wrong. For example, when a company has relocated, the address in the master data may no longer be its latest address. Therefore, before correcting the first character string to the similar character string, the correction unit 133 may execute an Internet search using at least one of the first character string and the similar character string as a keyword, and may correct the first character string to the similar character string when it determines that the similar character string is more likely to be correct than the first character string.
As an example, when the first character string recognized by the character recognition unit 132 is "〇〇2-15, Chiyoda-ku, Tokyo" and the correction unit 133 identifies "〇〇1-12, Chiyoda-ku, Tokyo" in the company master data shown in FIG. 5 as the similar character string, the correction unit 133 searches the Internet using the company name, telephone number, or the like associated with the similar character string as a keyword. When the address given on the website found by the search matches the first character string, the correction unit 133 does not correct the first character string. When the address given on the website matches the similar character string, and the correction unit 133 therefore determines that the similar character string is more likely to be correct than the first character string, it corrects the first character string "〇〇2-15, Chiyoda-ku, Tokyo" to "〇〇1-12, Chiyoda-ku, Tokyo". This operation prevents an erroneous correction when the character string registered in the master data is out of date.
Further, before correcting the first character string to the similar character string, the correction unit 133 may execute an Internet search using at least one of the first character string and the similar character string as a keyword and, when it determines that the first character string is more likely to be correct than the similar character string, may leave the first character string uncorrected and instead correct the similar character string in the master data to the first character string. In the example above, when the address given on the website found by the search matches the first character string and the correction unit 133 determines that the first character string is more likely to be correct than the similar character string, it corrects the character string "〇〇1-12, Chiyoda-ku, Tokyo" in the company master data to the first character string "〇〇2-15, Chiyoda-ku, Tokyo". This operation brings the master data up to date. As a result, the accuracy of later corrections, made when the character recognition unit 132 misrecognizes a character string and the correction unit 133 corrects it, improves.
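The two search-based branches described above can be summarized in a small decision function. The sketch assumes the Internet search has already been performed elsewhere and has yielded a single string `web_value`; the function name `reconcile`, the action labels, and the "no_change" fallback for an inconclusive search are assumptions, since the description does not state what happens when the website matches neither string.

```python
def reconcile(first: str, similar: str, web_value: str) -> tuple[str, str]:
    """Decide what to correct, given a value found by a web search.

    Returns (action, value):
      "correct_recognized": replace the recognized string with `similar`
      "update_master":      replace the master-data entry with `first`
      "no_change":          search was inconclusive (assumed fallback)
    """
    if web_value == similar:
        return ("correct_recognized", similar)
    if web_value == first:
        return ("update_master", first)
    return ("no_change", first)


recognized = "東京都千代田区〇〇2-15"  # first character string
registered = "東京都千代田区〇〇1-12"  # similar character string in master data
print(reconcile(recognized, registered, registered))
print(reconcile(recognized, registered, recognized))
```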
[Operation of output unit 134]
To increase the probability that appropriate character strings, based on the character strings written on the voucher, are registered in the core system, the output unit 134 may allow the user to check the character strings corrected by the correction unit 133 before they are output to the core system. As an example, the output unit 134 outputs the recognized character strings in a form that distinguishes the character strings that the correction unit 133 has determined to require correction from those that it has determined not to require correction.
FIG. 7 is a diagram showing an example of a display screen for the voucher data output by the output unit 134. In FIG. 7(a), "Taguchi Shoji" and "Ink (AK-723)" are not included in the company master data or the product master data, so they are displayed in a form different from the other character strings (italic characters inside a thick frame). Because the output unit 134 outputs the data in this form, the user can easily see which character strings need to be corrected.
On the screen of FIG. 7(a), the output unit 134 may display character strings that are correction candidates when a predetermined operation is performed. The predetermined operation is, for example, an operation in which the user selects a character string requiring correction (for example, a click or a touch). FIG. 7(b) is a diagram showing an example in which correction candidates are displayed. When the user selects a displayed candidate, the correction unit 133 corrects the character string displayed in FIG. 7(a) to the selected candidate. As a result, the voucher data is corrected to the state shown in FIG. 3.
When the character strings included in the voucher image data are recognized, it may turn out that many of them are not included in the master data. This can happen when the image quality of the voucher image data is too poor, when the voucher was issued in error, or when the voucher comes from a business partner not yet registered in the master data. So that the user can notice these situations, the output unit 134 may output warning information indicating that many character strings are not included in the master data when the proportion of recognized character strings not included in the master data is equal to or greater than a predetermined value.
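The warning condition is a simple ratio test, sketched below. The threshold of 0.5 is an arbitrary placeholder for the "predetermined value", and the function name `needs_warning` and the flat set of registered strings are assumptions for the example.

```python
def needs_warning(recognized, master_strings, threshold=0.5):
    """Return True when the share of recognized strings that are absent
    from the master data is at or above `threshold` (placeholder value)."""
    if not recognized:
        return False  # nothing recognized, so there is no ratio to evaluate
    missing = sum(1 for s in recognized if s not in master_strings)
    return missing / len(recognized) >= threshold


registered = {"田中商事", "03-1234-5678", "東京支店"}
print(needs_warning(["田口商事", "03-1234-5678", "東京支店"], registered))
print(needs_warning(["佐藤商事", "06-0000-0000", "東京支店"], registered))
```

In the first call one of three strings is missing, which is below the placeholder threshold; in the second, two of three are missing, so the warning would be raised.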
The output unit 134 may output the warning information together with the plurality of recognized character strings. As shown in FIG. 7, the output unit 134 may output the warning information together with the recognized character strings in a form in which the character strings requiring correction can be identified. The output unit 134 may display, together with the warning information or as the warning information, a screen for entering the processing that the user wishes to execute. For example, the output unit 134 displays a screen for performing an operation that registers the plurality of recognized character strings in the master data in association with one another. When the registration operation is performed, the output unit 134 registers the plurality of recognized character strings in the master data.
When the master data contains no character string similar to the first character string (for example, no character string whose similarity is equal to or greater than a threshold), the output unit 134 may output the first character string as a character string to be registered in the master data. For example, when the first character string is "Sato Shoji" and the master data does not include "Sato Shoji", the output unit 134 causes the user's computer to display a message containing the first character string, such as "Do you want to register Sato Shoji in the master data?"
Among the other character strings included in the voucher image data that contains the character string determined to be a registration target (for example, "Sato Shoji" above), the output unit 134 may display the character strings corresponding to the items to be registered in the master data as registration candidates. The output unit 134 may display, together with the registration candidates, an operation image that accepts a registration operation and, when the operation image is operated, may register the first character string and the registration candidates in the master data. With this operation, when a new business partner with no past transactions appears, the user does not need to type in the character strings to be registered in the master data, so the user's work efficiency improves.
[Processing flow of data processing device 1]
FIG. 8 is a flowchart showing the processing flow of the data processing device 1. The flowchart shown in FIG. 8 starts at the point when the image reading device 2 outputs voucher image data.
When the data acquisition unit 131 acquires the voucher image data output by the image reading device 2 (S11), the character recognition unit 132 executes OCR processing that recognizes the characters included in the voucher image data (S12). The character recognition unit 132 recognizes a plurality of character strings based on the recognized characters.
To determine whether the plurality of character strings recognized by the character recognition unit 132 have been recognized correctly, the correction unit 133 first selects one recognized character string from among them (S13). The correction unit 133 determines whether the selected recognized character string matches any of the character strings included in the master data (S14). When the correction unit 133 determines that the selected recognized character string matches one of the character strings included in the master data (YES in S14), it selects another recognized character string (S15) and executes the processing of S14 again.
When the correction unit 133 determines that the first character string, which is the selected recognized character string, matches none of the character strings included in the master data (NO in S14), it determines whether another of the recognized character strings recognized by the character recognition unit 132 matches any of the character strings included in the master data (S16).
When the correction unit 133 determines that the other recognized character string matches one of the character strings included in the master data (YES in S16), it corrects the recognized first character string to the character string most similar to the first character string among the character strings associated, in the master data, with the character string that matches the other recognized character string (S17). When the correction unit 133 determines that the other recognized character string matches none of the character strings included in the master data (NO in S16), it executes the processing of S16 for yet another recognized character string.
The correction unit 133 determines whether the processing of S14 to S17 has been completed for all the recognized character strings, and returns to S14 if it has not (NO in S18). When the processing has been completed for all the recognized character strings (YES in S18), the correction unit 133 creates voucher data composed of the corrected character strings, and the output unit 134 outputs the voucher data (S19).
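Steps S13 to S18 can be sketched as a single loop. As before, this is an illustration under assumptions rather than the flowchart's definitive implementation: the master data is modeled as a list of records of associated strings, similarity is the shared-character count, the first matching record is used, and the names `similarity` and `correct_voucher` are hypothetical.

```python
from collections import Counter


def similarity(a: str, b: str) -> int:
    """Number of characters the two strings have in common (assumed metric)."""
    return sum((Counter(a) & Counter(b)).values())


def correct_voucher(recognized, master_records):
    """Apply S13 to S18 to every recognized string and return the result."""
    known = {s for record in master_records for s in record}
    corrected = []
    for first in recognized:                      # S13 / S15: pick a string
        if first in known:                        # S14 YES: already registered
            corrected.append(first)
            continue
        replacement = first                       # default: leave unchanged
        for other in recognized:                  # S16: look for an anchor
            if other == first or other not in known:
                continue
            record = next(r for r in master_records if other in r)
            candidates = [s for s in record if s != other]
            if candidates:                        # S17: most similar string
                replacement = max(candidates,
                                  key=lambda c: similarity(first, c))
            break
        corrected.append(replacement)
    return corrected                              # S19: data to be output


master = [["田中商事", "AAA銀行 東京支店 普通 1223334", "03-1234-5678"]]
voucher = ["田口商事", "AAA銀行 東京支店 普通 1223334"]
print(correct_voucher(voucher, master))
```

A string with no registered anchor anywhere in the voucher is returned unchanged, which corresponds to the case where the output unit is notified that neither string is in the master data.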
[Effects of data processing device 1]
As described above, the data processing device 1 refers to master data in which a plurality of the character strings written on a voucher, namely the company name, branch name, telephone number, account information, contact information, department in charge, name of the person in charge, item name, and product unit price, are associated with one another as a plurality of registered character strings. When a first character string among the plurality of recognized character strings, identified by recognizing the character strings included in the voucher image data, is not included in the master data, and a second character string different from the first character string among the recognized character strings is included in the master data, the correction unit 133 corrects the first character string to the similar character string most similar to the first character string among the one or more registered character strings associated with the second character string in the master data.
With this configuration, the output unit 134 can output a correct character string even when an error occurs in character recognition, so the probability that the character strings included in the voucher image data are output correctly increases. As a result, the work efficiency and work quality of users who work with the data written on vouchers can be improved.
The present invention has been described above using embodiments, but the technical scope of the present invention is not limited to the scope described in those embodiments, and various modifications and changes are possible within the scope of its gist. For example, all or part of the device can be functionally or physically distributed or integrated in arbitrary units. New embodiments produced by any combination of a plurality of embodiments are also included in the embodiments of the present invention. The effects of a new embodiment produced by such a combination include the effects of the original embodiments.
For example, the above description illustrates the case where the company master data 121 and the product master data 122 are stored in the storage unit 12 of the data processing device 1, but the data processing device 1 may use only one of the company master data 121 and the product master data 122. Further, the data processing device 1 need not be configured as a single computer: a plurality of computers may operate in cooperation, or the computer and the storage medium in which the master data is stored may be physically separate.
1 Data processing device
2 Image reading device
3 External device
11 Communication unit
12 Storage unit
13 Control unit
121 Company master data
122 Product master data
131 Data acquisition unit
132 Character recognition unit
133 Correction unit
134 Output unit

Claims (13)

  1.  A data processing device comprising:
     a data acquisition unit that acquires voucher image data;
     a character recognition unit that outputs a plurality of recognized character strings by recognizing character strings included in the voucher image data;
     a correction unit that, when a first character string among the plurality of recognized character strings is not included in master data in which a plurality of registered character strings are associated with one another and a second character string different from the first character string among the plurality of recognized character strings is included in the master data, corrects the first character string to a similar character string that is most similar to the first character string among one or more of the registered character strings associated with the second character string in the master data; and
     an output unit that outputs a corrected first character string, obtained by correcting the first character string, in association with the second character string.
  2.  The data processing device according to claim 1, wherein
     the correction unit corrects the first character string to the similar character string on condition that two or more second character strings among the plurality of recognized character strings are included in the master data.
  3.  The data processing device according to claim 1 or 2, wherein
     when the correction unit identifies a plurality of candidates for the similar character string most similar to the first character string, the correction unit causes the output unit to output the plurality of candidates before correcting the first character string to the similar character string, and corrects the first character string to the similar character string corresponding to the candidate selected from the plurality of candidates.
  4.  The data processing device according to any one of claims 1 to 3, wherein
     the master data includes company names and account information as the plurality of registered character strings, and
     when a recognized company name, which is the first character string recognized by the character recognition unit, is not included in the master data, and recognized account information, which is the second character string recognized by the character recognition unit, is included in the master data, the correction unit corrects the recognized company name to the company name associated with the recognized account information in the master data.
  5.  The data processing device according to any one of claims 1 to 4, wherein
     the master data includes company names and item names as the plurality of registered character strings, and
     when a recognized company name, which is the first character string recognized by the character recognition unit, is not included in the master data, and a recognized item name, which is the second character string recognized by the character recognition unit, is included in the master data, the correction unit corrects the recognized company name to the company name that is most similar to the recognized company name among a plurality of company names associated with the recognized item name in the master data.
  6.  The data processing device according to any one of claims 1 to 5, wherein
     the master data includes item names and product unit prices as the plurality of registered character strings, and
     when a recognized item name, which is the first character string recognized by the character recognition unit, is not included in the master data, and a recognized product unit price, which is the second character string recognized by the character recognition unit, is included in the master data, the correction unit corrects the recognized item name to the item name that is most similar to the recognized item name among a plurality of item names associated with the recognized product unit price in the master data.
  7.  The data processing device according to any one of claims 1 to 6, wherein
     before correcting the first character string to the similar character string, the correction unit executes a search on the Internet using at least one of the first character string and the similar character string as a keyword, and corrects the first character string to the similar character string when it determines that the similar character string is more likely to be correct than the first character string.
  8.  The data processing device according to any one of claims 1 to 6, wherein
     before correcting the first character string to the similar character string, the correction unit executes a search on the Internet using at least one of the first character string and the similar character string as a keyword, and, when it determines that the first character string is more likely to be correct than the similar character string, leaves the first character string uncorrected and instead corrects the similar character string in the master data to the first character string.
  9.  The data processing device according to any one of claims 1 to 8, wherein
     the output unit outputs the plurality of recognized character strings in a manner that distinguishes character strings that the correction unit has determined to require correction from character strings that the correction unit has determined not to require correction.
  10.  The data processing device according to any one of claims 1 to 9, wherein
     when the master data does not include a character string whose similarity to the first character string is equal to or greater than a threshold, the output unit outputs the first character string as a character string to be registered in the master data.
  11.  The data processing device according to any one of claims 1 to 10, wherein
     when the proportion of recognized character strings not included in the master data among the plurality of recognized character strings is equal to or greater than a predetermined value, the output unit outputs information indicating that many character strings are not included in the master data.
  12.  A data processing method executed by a computer, comprising:
     a step of acquiring voucher image data;
     a step of outputting a plurality of recognized character strings by recognizing character strings included in the voucher image data;
     a step of, when a first character string among the plurality of recognized character strings is not included in master data in which a plurality of registered character strings are associated with one another, and a second character string that is among the plurality of recognized character strings and differs from the first character string is included in the master data, correcting the first character string to a similar character string, the similar character string being, among the one or more registered character strings associated with the second character string in the master data, the one most similar to the first character string; and
     a step of outputting the corrected first character string, obtained by correcting the first character string, in association with the second character string.
  13.  A program for causing a computer to execute:
     a step of acquiring voucher image data;
     a step of outputting a plurality of recognized character strings by recognizing character strings included in the voucher image data;
     a step of, when a first character string among the plurality of recognized character strings is not included in master data in which a plurality of registered character strings are associated with one another, and a second character string that is among the plurality of recognized character strings and differs from the first character string is included in the master data, correcting the first character string to a similar character string, the similar character string being, among the one or more registered character strings associated with the second character string in the master data, the one most similar to the first character string; and
     a step of outputting the corrected first character string, obtained by correcting the first character string, in association with the second character string.
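
The correction logic recited in claims 1, 3 and 10 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the master data is modeled as a list of rows whose fields are the mutually associated registered character strings, similarity is measured with Python's difflib.SequenceMatcher as a stand-in metric (the claims do not name one), and the function name correct_first_string, the status labels, the sample rows and the default threshold of 0.5 are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical master data: each row associates registered character strings.
MASTER = [
    {"company": "Alpha Trading Co.", "account": "1234567"},
    {"company": "Beta Foods Inc.", "account": "7654321"},
]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]. SequenceMatcher is an assumed stand-in metric."""
    return SequenceMatcher(None, a, b).ratio()


def correct_first_string(first_field, recognized, master, threshold=0.5):
    """Decide how to handle the recognized string stored under `first_field`.

    `recognized` maps field names to recognized character strings; every
    other field is treated as a potential "second character string".
    """
    first = recognized[first_field]
    registered = {row[first_field] for row in master if first_field in row}

    # The string is already registered, so no correction is needed.
    if first in registered:
        return {"status": "unchanged", "value": first}

    # Claim 10 (as interpreted here): nothing registered in the same field
    # reaches the threshold, so flag the string as a registration candidate.
    if not any(similarity(first, r) >= threshold for r in registered):
        return {"status": "register", "value": first}

    # Claim 1: collect registered strings associated with any second
    # character string that is itself found in the master data.
    candidates = set()
    for field, value in recognized.items():
        if field == first_field:
            continue
        for row in master:
            if row.get(field) == value and first_field in row:
                candidates.add(row[first_field])
    if not candidates:
        return {"status": "no_anchor", "value": first}

    scores = {c: similarity(first, c) for c in candidates}
    best = max(scores.values())
    top = sorted(c for c, s in scores.items() if s == best)

    # Claim 3: several equally similar candidates are returned for selection.
    if len(top) > 1:
        return {"status": "ambiguous", "candidates": top}
    return {"status": "corrected", "value": top[0]}


if __name__ == "__main__":
    # OCR misread "l" as "1" in the company name; the account number matched.
    ocr_result = {"company": "A1pha Trading Co.", "account": "1234567"}
    print(correct_first_string("company", ocr_result, MASTER))
```

In this sketch the matched account number narrows the candidates to the company names registered alongside it, so the misread company name is compared only with those names rather than with every registered company. An "ambiguous" result corresponds to the case in claim 3 where the candidates would be presented for selection before any correction is applied.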

PCT/JP2020/041162 2020-11-04 2020-11-04 Data processing device, data processing method, and program WO2022097189A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
PCT/JP2020/041162 WO2022097189A1 (en) 2020-11-04 2020-11-04 Data processing device, data processing method, and program
JP2020561940A JP6870159B1 (en) 2020-11-04 2020-11-04 Data processing equipment, data processing methods and programs
JP2021068170A JP2022075467A (en) 2020-11-04 2021-04-14 Data processing device, data processing method, and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/041162 WO2022097189A1 (en) 2020-11-04 2020-11-04 Data processing device, data processing method, and program

Publications (1)

Publication Number Publication Date
WO2022097189A1 true WO2022097189A1 (en) 2022-05-12

Family

ID=75801856

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/041162 WO2022097189A1 (en) 2020-11-04 2020-11-04 Data processing device, data processing method, and program

Country Status (2)

Country Link
JP (2) JP6870159B1 (en)
WO (1) WO2022097189A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7339708B1 (en) 2022-09-29 2023-09-06 株式会社トランザック PROGRAM, BUSINESS INFORMATION CONFIRMATION METHOD AND BUSINESS INFORMATION CONFIRMATION SYSTEM

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004133565A (en) * 2002-10-09 2004-04-30 Fujitsu Ltd Postprocessing device for character recognition using internet
JP2012517637A (en) * 2009-02-10 2012-08-02 コファックス, インコーポレイテッド System, method and computer program product for determining document validity
JP2014078203A (en) * 2012-10-12 2014-05-01 Fuji Xerox Co Ltd Image processing device and image processing program
JP2014137791A (en) * 2013-01-18 2014-07-28 Fujitsu Ltd Display program, display device and display method
JP2016159245A (en) * 2015-03-03 2016-09-05 株式会社東芝 Delivery processor and delivery processing program

Also Published As

Publication number Publication date
JP2022075467A (en) 2022-05-18
JPWO2022097189A1 (en) 2022-05-12
JP6870159B1 (en) 2021-05-12

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2020561940

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20960743

Country of ref document: EP

Kind code of ref document: A1