WO2021059848A1 - Information processing device, information processing method, and information processing program - Google Patents

Information processing device, information processing method, and information processing program

Info

Publication number
WO2021059848A1
Authority
WO
WIPO (PCT)
Prior art keywords
text data
image data
data
item
read
Application number
PCT/JP2020/032346
Other languages
French (fr)
Japanese (ja)
Inventor
択 渡久地
Original Assignee
AI inside株式会社
Application filed by AI inside株式会社
Publication of WO2021059848A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/10 - Image acquisition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 - Character recognition
    • G06V 30/19 - Recognition using electronic means
    • G06V 30/192 - Recognition using electronic means using simultaneous comparisons or correlations of the image signals with a plurality of references
    • G06V 30/194 - References adjustable by an adaptive method, e.g. learning

Definitions

  • This disclosure relates to an information processing device, an information processing method, and an information processing program that read character information from image data including character information.
  • As an example of handling image data including character information, a technique has become widespread in which forms are read with an image scanner and subjected to OCR (Optical Character Recognition) processing, thereby converting the input information into predetermined character codes and generating text data.
  • The process of generating text data from forms performed by such a technique can read standard forms laid out in a predetermined format with a certain accuracy, but its accuracy may be lower for atypical forms. This is because, in the case of an atypical form, it is unclear what kind of entry item is placed at which position on the form, and it is difficult to read with high accuracy when OCR processing is performed with the entry items unknown.
  • For this reason, for example, Patent Document 1 (Japanese Unexamined Patent Publication No. 2015-127913) discloses a document structure analysis device that performs document structure analysis on an atypical document. This document structure analysis device acquires the lines of a read document, extracts for each line the attribute probabilities of what kind of line it is (a title, an opening line, a continuation of the previous line, and so on), and generates a multiple hypothetical document structure network expressing the multiple possible document structures. Using this network, the consistency of the document structure is analyzed while the ambiguity of the document structure is reduced.
  • Commonly used forms, in particular business documents such as invoices, may differ in format depending on who creates them, but their described content is often similar. For such forms, a method has been desired that can accurately grasp which entry items are placed at which positions on a form by reading the form and performing machine learning, without requiring complicated processing such as that described in Patent Document 1. This disclosure therefore describes an information processing device, an information processing method, and an information processing program capable of improving the accuracy of recognizing the positions of reading items by reading image data including character information and performing machine learning.
  • The information processing device in one aspect of the present disclosure is an information processing device that reads character information from image data including character information and performs machine learning about the position of the read character information in the image data. It includes: an image data acquisition unit that acquires image data; a reading item recognition unit that identifies the position of the character information included in the image data and recognizes it as a reading item; a text data generation unit that performs character recognition of the character information in the reading item and generates text data; a correct answer data extraction unit that compares the text data, for each reading item, with correct answer text data indicating the character information included in the image data and stored in advance, determines whether or not they match, and extracts the text data determined to match; and a learning unit that performs machine learning based on the extracted text data and the position of the reading item in the image data on which the extracted text data is based, and generates and updates a learning model.
  • The information processing method in one aspect of the present disclosure is an information processing method in which character information is read from image data including character information and machine learning is performed about the position of the read character information in the image data. It includes: an image data acquisition step, performed by an image data acquisition unit, of acquiring image data; a reading item recognition step, performed by a reading item recognition unit, of identifying the position of the character information included in the image data and recognizing it as a reading item; a text data generation step, performed by a text data generation unit, of performing character recognition of the character information in the reading item and generating text data; a correct answer data extraction step, performed by a correct answer data extraction unit, of comparing the text data, for each reading item, with correct answer text data indicating the character information included in the image data and stored in advance, determining whether or not they match, and extracting the text data determined to match; and a learning step, performed by a learning unit, of performing machine learning based on the extracted text data and the position of the reading item in the image data on which the extracted text data is based, and generating and updating a learning model.
  • Further, the information processing program in one aspect of the present disclosure is an information processing program that reads character information from image data including character information and performs machine learning about the position of the read character information in the image data. It causes an electronic computer to execute: an image data acquisition step of acquiring image data; a reading item recognition step of identifying the position of the character information included in the image data and recognizing it as a reading item; a text data generation step of performing character recognition of the character information in the reading item and generating text data; a correct answer data extraction step of comparing the text data, for each reading item, with correct answer text data indicating the character information included in the image data and stored in advance, determining whether or not they match, and extracting the text data determined to match; and a learning step of performing machine learning based on the extracted text data and the position of the reading item in the image data on which the extracted text data is based, and generating and updating a learning model.
  • According to the present disclosure, the position of the character information included in the image data is identified and recognized as a reading item, character recognition is performed on the character information included in the image data to generate text data, the text data is compared with the correct answer text data for each reading item to determine whether or not they match, and machine learning is performed based on the text data determined to match and the image data on which it is based. It is therefore possible to improve the accuracy with which the image data is read.
  • In addition, since the accuracy of recognizing the positions of reading items can be improved without requiring complicated processing, a machine learning model for reading image data including character information can be generated without much effort.
  • Hereinafter, embodiments of the present disclosure will be described with reference to the drawings. The embodiments described below do not unreasonably limit the contents of the present disclosure described in the claims, and not all of the components shown in the embodiments are necessarily essential components of the present disclosure.
  • FIG. 1 is a functional block configuration diagram showing an information processing system 1 according to the first embodiment of the present disclosure.
  • By way of example and not limitation, this information processing system 1 is a system that performs character recognition on the character information included in image data including character information to generate text data, and performs machine learning based on the text data that was read normally and the image data on which it is based.
  • To determine whether or not text data has been read normally, the information processing system 1 holds correct answer text data indicating the character information included in the image data.
  • The generated text data is compared with the correct answer text data to determine whether or not they match; text data determined to match is treated as text data that was read normally.
  • In the present embodiment, image data obtained by scanning forms as images is described as an example of image data including character information, but the data is not limited to such form data.
  • In this example, the forms to be scanned are atypical documents.
  • An atypical document is a document, such as an invoice, whose format may differ depending on who creates it but whose described content is often similar; however, the forms scanned by the information processing system 1 in the present embodiment are not limited to these.
  • The information processing system 1 has an information processing device 100, a user terminal 200, and a network NW.
  • The information processing device 100 and the user terminal 200 are connected to each other via the network NW.
  • The network NW is a communication network for carrying out communication and is composed of, by way of example and not limitation, networks including the Internet, an intranet, a LAN (Local Area Network), a WAN (Wide Area Network), a wireless LAN (WLAN), a wireless WAN (WWAN), a VPN (Virtual Private Network), and the like.
  • The information processing device 100 is a device that identifies the position of the character information included in image data and recognizes it as a reading item, performs character recognition on the character information included in the image data to generate text data, compares this text data with the correct answer text data to determine whether or not they match, and performs machine learning on a learning model for estimating the position of character information in image data, based on the text data determined to match and the image data on which it is based.
  • Specifically, the information processing device 100 is composed of, by way of example and not limitation, a computer (desktop, laptop, tablet, etc.) that controls various devices, a server device, or the like.
  • The information processing device 100 is not limited to a device that operates on its own; it may be a distributed server system in which a plurality of devices connected to each other via a communication network operate cooperatively, or a cloud server.
  • The user terminal 200 is a device that accepts operation input from the user to the information processing device 100 and is composed of, by way of example and not limitation, a smartphone, a mobile terminal, a computer (desktop, laptop, tablet, etc.), or the like.
  • On this user terminal 200, by way of example and not limitation, an application for receiving the service of the information processing system 1 is installed, or a URL or the like for accessing the information processing device 100 is set; the service is started by tapping or double-clicking these to launch it.
  • The information processing device 100 includes a communication unit 110, a storage unit 120, and a control unit 130.
  • The communication unit 110 is a communication interface for communicating with the user terminal 200 by wire or wirelessly via the network NW, and any communication protocol may be used as long as mutual communication can be executed.
  • By way of example and not limitation, the communication unit 110 communicates using a communication protocol such as TCP/IP (Transmission Control Protocol/Internet Protocol).
  • The storage unit 120 stores programs, input data, and the like for executing various control processes and each function in the control unit 130.
  • By way of example and not limitation, the storage unit 120 is composed of memory including RAM (Random Access Memory) and ROM (Read Only Memory), and storage including an HDD (Hard Disk Drive), an SSD (Solid State Drive), flash memory, and the like.
  • The storage unit 120 stores the image data DB 121, the correct answer T (text) data DB 122, the text data DB 123, and the reading learning model DB 124. The storage unit 120 also temporarily stores data exchanged in communication with the user terminal 200 and data generated by each process described later.
  • The image data DB 121, the correct answer text data DB 122, the text data DB 123, and the reading learning model DB 124 are databases that can be accessed, referenced, and updated from the various programs of the control unit 130.
  • The image data DB 121 stores image data obtained by scanning forms as images with a scanner device, or path information of the storage destination in which that image data is stored. This image data is the data from which character information is read by OCR.
  • As described above, the forms to be scanned are, for example, invoices; the companies issuing and receiving the invoices need not be the same from form to form, and the format of the invoices need not be unified.
  • The present embodiment targets image data scanned by a scanner device, but it is sufficient that paper forms have been converted into electronic data; for example, photographic image data captured by a camera or the like may also be used.
  • The correct answer text data DB 122 stores, as correct answer text data, the character information included in the image data stored in the image data DB 121 (or in the image data acquired from the storage destination path information stored in the image data DB 121). This correct answer text data is used to determine whether or not the character information has been read normally by OCR.
  • By way of example and not limitation, this correct answer text data is stored for each item described in the underlying forms, and an attribute is set for each item. An attribute is the name of an item, such as "form name", "company name", or "date". A minimal sketch of one possible record follows below.
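  • By way of illustration only, one hypothetical record of the correct answer text data DB 122 is sketched below in Python. The dictionary layout is an assumption (the disclosure does not prescribe a schema), and the attribute names and values follow the invoice example described later with reference to FIG. 7.

```python
# Hypothetical record in the correct answer text data DB 122: one entry per
# underlying form, holding the correct text for each item keyed by its attribute.
correct_text_data = {
    "form_id": "P1",  # identifies the underlying image data in the image data DB 121
    "items": {
        "form name": "invoice",
        "billing source": "○○ Co., Ltd.",
        "billing address": "△△ Co., Ltd.",
        "date": "September 1, 2019",
    },
}
```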
  • The text data DB 123 stores text data generated by reading the character information by OCR from the image data stored in the image data DB 121 (or from the image data acquired from the storage destination path information stored in the image data DB 121).
  • Whether or not this text data was read normally is determined, and the text data read normally is used for machine learning.
  • By way of example and not limitation, as with the correct answer text data, this text data is stored for each item described in the read forms, and an attribute is set for each item.
  • The reading learning model DB 124 stores a learning model generated by machine learning based on the text data that was read normally.
  • This learning model is model information for performing character recognition on the character information included in image data obtained by scanning forms as images to generate text data, and for estimating the position of that character information in the image data.
  • The control unit 130 controls the entire operation of the information processing device 100 by executing programs stored in the storage unit 120, and is composed of, by way of example and not limitation, devices including a CPU (Central Processing Unit), an MPU (Micro Processing Unit), a GPU (Graphics Processing Unit), a microprocessor, a processor core, a multiprocessor, an ASIC (Application-Specific Integrated Circuit), an FPGA (Field Programmable Gate Array), and the like.
  • The functions of the control unit 130 include an image data acquisition unit 131, a reading item recognition unit 132, a text data generation unit 133, an attribute setting unit 134, a correct answer data extraction unit 135, and a learning unit 136.
  • The image data acquisition unit 131, the reading item recognition unit 132, the text data generation unit 133, the attribute setting unit 134, the correct answer data extraction unit 135, and the learning unit 136 are invoked by programs stored in the storage unit 120 and executed by the information processing device 100.
  • The image data acquisition unit 131 acquires, from the user terminal 200 via the communication unit 110, image data obtained by scanning forms (an example of image data including character information) as images with a scanner device, or path information of the storage destination in which that image data is stored. For example, the image data may be acquired when a scanner device connected directly to the user terminal 200, or connected via the network NW, produces scanned image data that is transmitted from the user terminal 200. The image data may also be acquired when image data scanned by another external device is obtained by the user terminal 200 and then transmitted. The scanner device and the external device in these cases are not shown. The image data acquired by the image data acquisition unit 131 is stored in the image data DB 121.
  • The reading item recognition unit 132 identifies, from the image data stored in the image data DB 121, the position of the character information included in the image data and recognizes it as a reading item from which the character information is to be read by OCR.
  • For example, in the case of image data obtained by scanning an invoice, character information such as "invoice" and "○○ Co., Ltd." is selected so as to be enclosed in, for example, a rectangle, and is recognized as a reading item.
  • The text data generation unit 133 reads the character information by OCR, performing character recognition on the portions recognized as reading items by the reading item recognition unit 132, and generates text data. As described above, in the case of image data obtained by scanning an invoice, for example, characters such as "invoice" and "○○ Co., Ltd." in the portions recognized as reading items are read as character information, and text data is generated. The generated text data is stored in the text data DB 123 for each reading item. A sketch of this flow follows below.
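  • As a minimal sketch, assuming a hypothetical region detector and OCR engine (both passed in as functions, since the disclosure does not prescribe any particular implementation), the flow from reading item recognition to text data generation might look like this:

```python
from dataclasses import dataclass

@dataclass
class ReadingItem:
    box: tuple[int, int, int, int]  # rectangular selection area (x, y, width, height)
    text: str = ""                  # text data generated by OCR
    attribute: str = ""             # item name set later by the attribute setting unit 134

def recognize_and_read(image, detect_text_regions, ocr) -> list[ReadingItem]:
    """Identify character-information positions as reading items, then OCR each one.

    detect_text_regions(image) -> list of boxes  (stands in for the reading item
                                                  recognition unit 132)
    ocr(image, box) -> str                       (stands in for the text data
                                                  generation unit 133)
    """
    items = [ReadingItem(box) for box in detect_text_regions(image)]
    for item in items:
        item.text = ocr(image, item.box)
    return items  # stored per reading item in the text data DB 123
```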
  • The attribute setting unit 134 sets attributes, such as "form name", "company name", and "date", which are the names of the items, for the text data generated by the text data generation unit 133 and stored in the text data DB 123.
  • By way of example, the attribute for each item of the text data is set automatically by the attribute setting unit 134 based on the position of the reading item in the image data obtained by scanning the invoice or the like, and on the content of the text data read from that reading item.
  • The set attribute is stored in the text data DB 123 in association with the reading item of the text data. One possible set of rules is sketched below.
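  • As one hedged illustration of automatic attribute setting from a reading item's position and content, a rule-based sketch follows. The specific rules (a date-like string is a "date", a string containing "Co., Ltd." is a company name distinguished by vertical position, text near the top of the page is the form name) are assumptions of this sketch, not rules stated in the disclosure.

```python
import re

def set_attribute(box: tuple[int, int, int, int], text: str, page_height: int) -> str:
    """Assign an item name from the reading item's position and text content."""
    x, y, w, h = box
    if re.search(r"\w+ \d{1,2}, \d{4}", text):  # e.g. "September 1, 2019"
        return "date"
    if "Co., Ltd." in text:
        # vertical position distinguishes the two company names in this sketch
        return "destination" if y < page_height * 0.2 else "company name"
    if y < page_height * 0.1:
        return "form name"
    return "item"
```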
  • The correct answer data extraction unit 135 compares the text data stored in the text data DB 123 with the correct answer text data stored in the correct answer text data DB 122, determines whether or not they match, and extracts the text data determined to match. In other words, it extracts the text data that was read normally. The determination of whether or not they match is made for each reading item stored in the text data DB 123 and the correct answer text data DB 122.
  • By way of example and not limitation, the determination by the correct answer data extraction unit 135 first compares the reading items in the text data with the reading items in the correct answer text data and determines whether or not each reading item matches; then, for reading items determined to match, it determines whether or not the text data and the correct answer text data match. Alternatively, the attributes set in the text data are compared with the attributes set in the correct answer text data to determine whether or not the attributes match, and then, for reading items whose attributes are determined to match, it is determined whether or not the text data and the correct answer text data match.
  • In the determination, the degree of matching of the text data is calculated against the correct answer text data, and when the calculated degree of matching is equal to or greater than a predetermined threshold value, the text data and the correct answer text data are determined to match.
  • That is, the determination of whether or not they match is not limited to a determination of exact match; they may be determined to match when the degree of matching is equal to or greater than a predetermined threshold value.
  • The determination based on the degree of matching may be made for each reading item stored in the text data DB 123 and the correct answer text data DB 122; similarly, the attributes may also be determined to match when their degree of matching is equal to or greater than a predetermined threshold value. A different threshold value may also be used for each reading item. In particular, attributes need not match exactly; they may be determined to match if their degree of matching is equal to or greater than a predetermined threshold value. A sketch of this determination follows below.
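  • A minimal sketch of the degree-of-matching determination, assuming `difflib.SequenceMatcher` similarity as the measure of matching and illustrative threshold values; the disclosure fixes neither the measure nor the thresholds. For brevity the sketch matches attribute names exactly, although the description above also allows the attributes themselves to be matched by degree.

```python
from difflib import SequenceMatcher

DEFAULT_THRESHOLD = 0.9  # illustrative value; the disclosure leaves thresholds open

def matches(text: str, correct_text: str, threshold: float = DEFAULT_THRESHOLD) -> bool:
    """Determine a match when the degree of matching is at or above the threshold."""
    degree = SequenceMatcher(None, text, correct_text).ratio()
    return degree >= threshold

def extract_correct_data(text_items: dict[str, str], correct_items: dict[str, str],
                         thresholds: dict[str, float] | None = None) -> dict[str, str]:
    """Per-reading-item comparison; a different threshold may be used for each item."""
    thresholds = thresholds or {}
    return {
        attribute: text
        for attribute, text in text_items.items()
        if attribute in correct_items
        and matches(text, correct_items[attribute],
                    thresholds.get(attribute, DEFAULT_THRESHOLD))
    }
```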
  • The learning unit 136 performs machine learning on a learning model for estimating the position of character information in image data, based on the text data extracted by the correct answer data extraction unit 135 and the image data stored in the image data DB 121 on which that text data is based, and generates or updates the learning model stored in the reading learning model DB 124.
  • The learning model may be updated by, for example, an aggregation process that merges the learning model stored in the reading learning model DB 124 with the learning result from the learning unit 136.
  • By way of example and not limitation, machine learning by the learning unit 136 may be performed by supervised machine learning using the extracted text data and the underlying image data as teacher data, by unsupervised machine learning, or by deep learning. One way the teacher data might be assembled in the supervised case is sketched below.
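  • If the supervised variant is chosen, the teacher data could pair the underlying image with the position and text of each normally read reading item, as in the following sketch; the layout of one training record is an assumption of this sketch.

```python
def build_teacher_data(image, extracted_items: list) -> list:
    """One training example per normally read item: (image, reading-item box, text).

    A position-estimation model can be trained on such examples so that the
    positions of reading items on unseen forms can be estimated.
    """
    return [(image, item.box, item.text) for item in extracted_items]
```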
  • FIG. 2 is a functional block configuration diagram showing the user terminal 200 of FIG. 1.
  • The user terminal 200 includes a communication unit 210, a display unit 220, an operation unit 230, a storage unit 240, and a control unit 250.
  • The communication unit 210 is a communication interface for communicating with the information processing device 100 by wire or wirelessly via the network NW, and any communication protocol may be used as long as mutual communication can be executed.
  • By way of example and not limitation, the communication unit 210 communicates using a communication protocol such as TCP/IP.
  • The display unit 220 is a user interface used to display the operation content input by the user and content transmitted from the information processing device 100, and is composed of a liquid crystal display or the like.
  • The display unit 220 displays to the user the notification information sent from the information processing device 100.
  • The operation unit 230 is a user interface used by the user to input operation instructions, and is composed of a keyboard, a mouse, a touch panel, and the like.
  • The operation unit 230 is used to input the operations the user performs on the information processing device 100.
  • The storage unit 240 stores programs, input data, and the like for executing various control processes and each function in the control unit 250.
  • By way of example and not limitation, the storage unit 240 is composed of memory including RAM, ROM, and the like, and storage including an HDD, an SSD, flash memory, and the like.
  • The storage unit 240 temporarily stores data exchanged in communication with the information processing device 100.
  • The control unit 250 controls the entire operation of the user terminal 200 by executing programs stored in the storage unit 240, and is composed of, by way of example and not limitation, devices including a CPU, an MPU, a GPU, a microprocessor, a processor core, a multiprocessor, an ASIC, an FPGA, and the like.
  • FIG. 3 is a flowchart showing the operation of the information processing device 100 of FIG. 1.
  • First, in step S101, the user terminal 200 transmits the scanned image data or the path information of the storage destination in which the image data is stored, and the image data acquisition unit 131 acquires the image data.
  • The acquired image data is stored in the image data DB 121.
  • Next, in step S102, the reading item recognition unit 132 reads the image data acquired in step S101 and stored in the image data DB 121.
  • FIG. 4 is a schematic diagram showing an example of image data P1 acquired by the image data acquisition unit 131 of FIG. 1.
  • The image data P1 shown in FIG. 4 is image data obtained by scanning an invoice as an example of a form; an invoice in which "○○ Co., Ltd." is the billing source, addressed to "△△ Co., Ltd.", is illustrated.
  • In this image data P1, in addition to "invoice", which is the form name, the billing source company name, and the billing destination company name, information such as a subject, an item, a quantity, and amounts of money is described.
  • Image data obtained by scanning an invoice or the like as shown in FIG. 4 is acquired, stored in the image data DB 121, and read by the process of step S102.
  • Next, in step S103, the reading item recognition unit 132 identifies the position of the character information included in the image data read in step S102, and recognizes it as reading items for reading the character information by OCR.
  • FIG. 5 is a schematic diagram showing an example of the recognition of reading items performed by the reading item recognition unit 132 of FIG. 1.
  • FIG. 5 shows an example in which the positions of the character information in the image data P1 shown in FIG. 4 have been identified and recognized as reading items for reading the character information.
  • The reading items A1 to A11 shown in FIG. 5 indicate the state in which the character information of the image data P1 shown in FIG. 4 has been recognized as reading items, each piece of character information being recognized as a rectangular selection area.
  • In the reading item A1, the character information of the form name "invoice" is selected.
  • In the reading item A2, the character information of "○○ Co., Ltd.", which is the billing source company name, is selected.
  • In the reading item A3, the character information of "△△ Co., Ltd.", which is the billing destination company name, is selected.
  • In the reading item A4, the character information of the date "September 1, 2019" is selected.
  • In the reading item A5, the character information of the subject "○○○ matter" is selected.
  • In the reading item A6, the character information of the item name "○○○ fee" is selected.
  • In the reading item A7, the character information of "1", which is the quantity for the item in the reading item A6, is selected.
  • In the reading item A8, the character information of "150,000", which is the amount of money for the item in the reading item A6, is selected.
  • In the reading item A9, the character information of "150,000", which is the subtotal amount, is selected.
  • In the reading item A10, the character information of "12,000", which is the consumption tax amount, is selected.
  • In the reading item A11, the character information of "162,000", which is the total amount, is selected.
  • In this way, in step S103, the positions of the character information are identified as shown in FIG. 5 and recognized as reading items for reading the character information.
  • Next, in step S104, the text data generation unit 133 reads the character information by OCR, performing character recognition on the portions recognized as reading items in step S103, and text data is generated.
  • The generated text data is stored in the text data DB 123 for each reading item.
  • In step S104, as shown in FIG. 5, the characters of "invoice" are read from the reading item A1 and generated as text data.
  • Similarly, the characters of "○○ Co., Ltd." are read from the reading item A2 and generated as text data.
  • The characters of "△△ Co., Ltd." are read from the reading item A3 and generated as text data.
  • The characters of "September 1, 2019" are read from the reading item A4 and generated as text data.
  • The characters of "○○○" are read from the reading item A5 and generated as text data.
  • The subsequent reading items are processed in the same way, so their description is omitted.
  • Next, the attribute setting unit 134 sets attributes for the text data generated in step S104 and stored in the text data DB 123.
  • FIG. 6 is a schematic diagram showing an example of the text data T1 generated by the text data generation unit 133 of FIG. 1 and given attributes by the attribute setting unit 134.
  • The text data shown in the right column of the text data T1 in FIG. 6 is the text data generated from the reading items A1 to A11 shown in FIG. 5 (the reading items A6 to A11 are not shown).
  • The attributes shown in the left column of FIG. 6 are set so as to be linked to the respective pieces of text data.
  • Next, in step S106, the correct answer data extraction unit 135 compares the text data generated in step S104 and stored in the text data DB 123 with the correct answer text data stored in the correct answer text data DB 122, and determines whether or not they match.
  • In this determination, the reading items in the text data are compared with the reading items in the correct answer text data, and it is determined whether or not the reading items match.
  • Alternatively, the attributes set in the text data are compared with the attributes set in the correct answer text data, and it is determined whether or not the attributes match.
  • FIG. 7 is a schematic diagram showing an example of the correct answer text data T2 stored in the correct answer text data DB 122 of FIG. 1.
  • The correct answer text data shown in the right column of the correct answer text data T2 in FIG. 7 is the data stored in the correct answer text data DB 122 as the correct answer text data for the character information included in the image data P1 shown in FIG. 4.
  • The attributes shown in the left column of FIG. 7 are set so as to be associated with the respective pieces of correct answer text data, in the same manner as for the text data stored in the text data DB 123.
  • FIG. 8 is a schematic diagram showing an example of the determination in the correct answer data extraction unit 135 of FIG. 1.
  • In FIG. 8, the text data T1 shown in FIG. 6 is compared with the correct answer text data T2 shown in FIG. 7, and it is determined whether or not they match.
  • The text data T1 and the correct answer text data T2 shown in FIG. 8 are the same as the text data T1 shown in FIG. 6 and the correct answer text data T2 shown in FIG. 7, respectively.
  • In this determination, the attributes set in the text data T1 are compared with the attributes set in the correct answer text data T2, and it is determined whether or not the attributes match.
  • In the example of FIG. 8, the attribute "company name" in the second row of the text data T1 and the attribute "destination" in its third row differ from the attribute "billing source" in the second row of the correct answer text data T2 and the attribute "billing address" in its third row.
  • In such a case, the correct answer data extraction unit 135 calculates the degree of matching for the attribute of each item and determines, for each attribute, that the attributes match when the calculated degree of matching is equal to or greater than a predetermined threshold value. In this case, since the differing attributes do not affect the generation of the text data, the attributes may be determined to match.
  • Similarly, the correct answer data extraction unit 135 calculates the degree of matching for the text data of each item and determines that the text data match when the calculated degree of matching is equal to or greater than a predetermined threshold value. In this case, if, for example, the dates differ, it may be determined that the reading could not be performed normally.
  • Next, in step S107, the correct answer data extraction unit 135 extracts the text data for which the determination in step S106 found a match.
  • The text data determined to match may be extracted in units of image data or in units of reading items.
  • For example, when the reading item with the attribute "date" was not read normally, all the text data of the reading items for that invoice may be excluded from extraction, or only the reading item with the attribute "date" may be excluded.
  • As the form of extraction, status information may be provided in the text data DB 123 and a status may be set only for the extracted items, or a separate database may be provided.
  • Next, the learning unit 136 performs machine learning based on the text data extracted in step S107 and the image data stored in the image data DB 121 on which that text data is based, and generates and updates the learning model stored in the reading learning model DB 124. The sketch below ties the steps of this flow together.
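  • Reusing the hypothetical helpers from the earlier sketches, one possible orchestration of the flow of FIG. 3 for a single scanned form might look as follows; the step boundaries follow the description above, while the function signatures remain assumptions of the sketch.

```python
def process_form(image, correct_items, detect_text_regions, ocr, page_height):
    """One pass over a scanned form, from reading item recognition to teacher data."""
    items = recognize_and_read(image, detect_text_regions, ocr)  # steps S102 to S104
    for item in items:                                           # attribute setting
        item.attribute = set_attribute(item.box, item.text, page_height)
    matched = [                                                  # steps S106 and S107
        item for item in items
        if item.attribute in correct_items
        and matches(item.text, correct_items[item.attribute])
    ]
    return build_teacher_data(image, matched)  # handed to the learning unit 136
```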
  • As described above, the information processing device, the information processing system, and the information processing method according to the present embodiment perform character recognition on the character information included in image data obtained by scanning forms as images, and generate text data.
  • Correct answer text data, which is the correct answer data for the character information included in the image data, is stored in advance.
  • A learning model is generated by comparing the text data with the correct answer text data to determine whether or not they match, and by performing machine learning based on the text data determined to match and the image data on which it is based. Since only the text data that was read normally is targeted for machine learning, it is possible to efficiently improve the accuracy with which image data is read.
  • Further, the character information included in the image data is character-recognized for each reading item and compared with the correct answer text data for each reading item to determine whether or not they match, and machine learning is performed based on the text data determined to match and the image data on which it is based. Since the determination is made for each reading item, it is possible to improve the reading accuracy, which differs from item to item.
  • In the determination, the degree of matching of the text data is calculated against the correct answer text data, and the text data and the correct answer text data are determined to match when the calculated degree of matching is equal to or greater than a predetermined threshold value. This determination may also be made for each reading item. The criteria for determining a match can therefore be set for each form and for each reading item, which makes it possible to improve the item-specific reading accuracy more efficiently.
  • FIG. 9 is a functional block configuration diagram showing the information processing system 1A according to the second embodiment of the present disclosure.
  • This information processing system 1A is the same as the information processing system 1 according to the first embodiment in that it generates text data by performing character recognition on the character information included in image data including character information and performs machine learning based on the normally read text data and the image data on which it is based. It differs from the information processing system 1 according to the first embodiment in that the information processing device 100A provided in the present embodiment includes an image data reading unit 137 as a function of its control unit 130.
  • That is, in the present embodiment, actual forms are read based on the learning model generated by the information processing system 1A.
  • The image data reading unit 137 acquires image data obtained by newly scanning forms, performs character recognition of the character information based on the learning model trained by the learning unit 136 and stored in the reading learning model DB 124, and generates new text data.
  • The new text data may be stored in the text data DB 123, or may be stored separately in another database. This text data may be provided to the person who supplies the forms, for example, as the deliverable of a service that scans forms and generates the text data read by OCR.
  • The learning unit 136 in the present embodiment may further perform machine learning based on the new text data and the image data obtained by scanning the new forms. The reading accuracy can thereby be improved still further. A sketch of this reading flow follows below.
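  • As a sketch of how the image data reading unit 137 might apply the stored learning model to a newly scanned form: `model.predict_reading_items` is a hypothetical interface invented for this sketch, since the disclosure does not fix the model's API.

```python
def read_new_form(image, model, ocr) -> dict[str, str]:
    """Use the learned position-estimation model in place of raw region detection."""
    results = {}
    for box, attribute in model.predict_reading_items(image):  # hypothetical model API
        results[attribute] = ocr(image, box)  # new text data, optionally stored and
                                              # fed back into further machine learning
    return results
```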
  • Other configurations and processing flows are the same as those in the first embodiment.
  • As described above, the present embodiment provides an image data reading unit that newly acquires image data obtained by scanning forms and performs character recognition of the character information, with the character recognition performed based on the learning model. As a result, the reading accuracy can be further improved, and the read text data can be provided to the person who supplies the forms as the deliverable of a service that scans forms and generates text data read by OCR.
  • FIG. 10 is a functional block configuration diagram showing an example of the configuration of the computer (electronic computer) 700.
  • The computer 700 includes a CPU 701, a main storage device 702, an auxiliary storage device 703, and an interface 704.
  • Here, the details of the control program (information processing program) for realizing each of the functions constituting the control unit 130, up to and including the image data reading unit 137, will be described.
  • These functional blocks are implemented in the computer 700.
  • The operation of each of these components is stored in the auxiliary storage device 703 in the form of a program.
  • The CPU 701 reads the program from the auxiliary storage device 703, expands it into the main storage device 702, and executes the above processing according to the program. The CPU 701 also secures, in the main storage device 702, storage areas corresponding to the storage units described above, according to the program.
  • That is, the program is a control program that causes the computer 700 to realize: an image data acquisition step of acquiring image data; a reading item recognition step of identifying the position of the character information included in the image data and recognizing it as a reading item; a text data generation step of performing character recognition of the character information in the reading item and generating text data; a correct answer data extraction step of comparing the text data, for each reading item, with correct answer text data indicating the character information included in the image data and stored in advance, determining whether or not they match, and extracting the text data determined to match; and a learning step of performing machine learning based on the extracted text data and the position of the reading item in the image data on which the extracted text data is based, and generating and updating a learning model.
  • The auxiliary storage device 703 is an example of a non-transitory tangible medium.
  • Other examples of non-transitory tangible media include magnetic disks, magneto-optical disks, CD-ROMs, DVD-ROMs, and semiconductor memories connected via the interface 704.
  • The program may also be distributed to the computer 700 via a communication line, and the computer 700 receiving the distribution may expand the program into the main storage device 702 and execute the above processing.
  • The program may realize only a part of the functions described above. The program may also be a so-called difference file (difference program) that realizes the functions described above in combination with another program already stored in the auxiliary storage device 703.
  • 1, 1A information processing system, 100, 100A information processing device, 110 communication unit, 120 storage unit, 121 image data DB, 122 correct answer T (text) data DB, 123 text data DB, 124 reading learning model DB, 130 control unit, 131 image data acquisition unit, 132 reading item recognition unit, 133 text data generation unit, 134 attribute setting unit, 135 correct answer data extraction unit, 136 learning unit, 137 image data reading unit, 200 user terminal, 210 communication unit, 220 display unit, 230 operation unit, 240 storage unit, 250 control unit, NW network

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Character Input (AREA)
  • Image Analysis (AREA)
  • Character Discrimination (AREA)

Abstract

An information processing device 100 in an information processing system 1 comprises the following as functions thereof: an image data acquisition unit 131 that acquires image data; a reading item recognition unit 132 that identifies the location of character information included in the image data and recognizes the same as reading items; a text data generation unit 133 that reads out character information pertaining to the recognized portions and performs character recognition; an attribute settings unit 134 that sets attributes pertaining to text data; a correct data extraction unit 135 that compares the text data to correct text data to determine whether there is a match, and extracts the text data determined to be a match; and a learning unit 136 that performs machine learning on the basis of the extracted text data and the image data serving as the source of the text data.

Description

情報処理装置、情報処理方法及び情報処理プログラムInformation processing equipment, information processing methods and information processing programs
 本開示は、文字情報を含む画像データから文字情報を読み取る情報処理装置、情報処理方法及び情報処理プログラムに関する。 This disclosure relates to an information processing device, an information processing method, and an information processing program that read character information from image data including character information.
 文字情報を含む画像データの例として、帳票類をイメージスキャナなどで読み取り、OCR(Optical Character Recognition)処理を行うことにより、入力情報を所定の文字コードに変換し、テキストデータを生成する技術が普及している。 As an example of image data including character information, a technique for converting input information into a predetermined character code and generating text data by reading forms with an image scanner and performing OCR (Optical Character Recognition) processing has become widespread. are doing.
 このような技術で行われる、帳票類からテキストデータを生成する処理は、所定のフォーマットで記載された定型的な帳票では、一定の精度により読取が可能であるが、非定型な帳票では、読取の精度が低いことがある。これは、非定型な帳票の場合、その帳票のどの位置にどのような記載項目が配置されているか不明であり、記載項目が不明な状態でOCR処理を行っても精度の高い読取は困難であることによる。 The process of generating text data from forms, which is performed by such a technique, can be read with a certain accuracy in a standard form described in a predetermined format, but can be read in an atypical form. May be less accurate. This is because, in the case of an atypical form, it is unclear what kind of entry item is placed at which position on the form, and even if OCR processing is performed with the entry item unknown, it is difficult to read with high accuracy. It depends on what it is.
 そのため、例えば、特許文献1には、非定型文書に対して文書構造解析を行う文書構造解析装置が開示されている。この文書構造解析装置では、読み込んだ文書の行を取得し、どのような行(タイトル、書き出し、前行からの続き等)であるか、属性ごとの属性確率が抽出され、可能性のある複数の文書構造を表現する多重仮設文書構造ネットワークを生成している。このネットワークを用いて、文書構造の曖昧性を低減しながら文書構造の整合性の分析を行っている。 Therefore, for example, Patent Document 1 discloses a document structure analysis device that performs document structure analysis on an atypical document. This document structure analysis device acquires the lines of the read document, extracts the attribute probabilities for each attribute, what kind of line (title, export, continuation from the previous line, etc.), and there are multiple possibilities. A multiple temporary document structure network that expresses the document structure of is generated. Using this network, the consistency of the document structure is analyzed while reducing the ambiguity of the document structure.
特開2015-127913号公報Japanese Unexamined Patent Publication No. 2015-127913
 ところで、一般的に使用されている帳票、特にビジネスの世界で使用されている文書のような帳票は、例えば請求書のように、作成する者によりフォーマットが異なることがあるが、記載内容としては似たような内容が記載されていることが多い。このような帳票について、特許文献1に記載のような煩雑な処理を必要とせずに、帳票を読み取って機械学習を行うことで、帳票のどの位置にどのような記載項目が配置されているかを精度よく把握することが可能な手法が望まれていた。 By the way, commonly used forms, especially forms such as documents used in the business world, may have different formats depending on the creator, such as invoices, but the description content is as follows. Often similar content is described. By reading the form and performing machine learning on such a form without requiring complicated processing as described in Patent Document 1, what kind of description item is arranged at which position on the form can be determined. A method that can be grasped with high accuracy has been desired.
 そこで、本開示では、文字情報を含む画像データを読み取って機械学習を行うことで読取項目の位置を認識する精度を上げることが可能な情報処理装置、情報処理方法及び情報処理プログラムについて説明する。 Therefore, in the present disclosure, an information processing device, an information processing method, and an information processing program capable of improving the accuracy of recognizing the position of a read item by reading image data including character information and performing machine learning will be described.
 本開示の一態様における情報処理装置は、文字情報を含む画像データから文字情報を読み取り、読み取った文字情報の画像データにおける位置について機械学習を行う情報処理装置であって、画像データを取得する画像データ取得部と、画像データに含まれる文字情報の位置を識別し、読取項目として認識する読取項目認識部と、読取項目における文字情報の文字認識を行い、テキストデータを生成するテキストデータ生成部と、テキストデータと、あらかじめ記憶されている画像データに含まれる文字情報を示す正解テキストデータとを読取項目ごとに比較し、一致しているか否かの判定を行い、一致していると判定されたテキストデータを抽出する正解データ抽出部と、抽出されたテキストデータと、抽出されたテキストデータの基になる画像データにおける読取項目の位置とに基づいて機械学習を行い、学習モデルの生成及び更新を行う学習部と、を備える。 The information processing device according to one aspect of the present disclosure is an information processing device that reads character information from image data including character information and performs machine learning about the position of the read character information in the image data, and is an image for acquiring image data. A data acquisition unit, a reading item recognition unit that identifies the position of character information contained in image data and recognizes it as a reading item, and a text data generation unit that recognizes the character information in the reading item and generates text data. , The text data and the correct text data indicating the character information contained in the image data stored in advance are compared for each read item, and it is determined whether or not they match, and it is determined that they match. Machine learning is performed based on the correct answer data extraction unit that extracts text data, the extracted text data, and the position of the read item in the image data that is the basis of the extracted text data, and the learning model is generated and updated. It is equipped with a learning department to perform.
 本開示の一態様における情報処理方法は、文字情報を含む画像データから文字情報を読み取り、読み取った文字情報の画像データにおける位置について機械学習を行う情報処理方法であって、画像データ取得部が行う、画像データを取得する画像データ取得ステップと、読取項目認識部が行う、画像データに含まれる文字情報の位置を識別し、読取項目として認識する読取項目認識ステップと、テキストデータ生成部が行う、読取項目における文字情報の文字認識を行い、テキストデータを生成するテキストデータ生成ステップと、正解データ抽出部が行う、テキストデータと、あらかじめ記憶されている画像データに含まれる文字情報を示す正解テキストデータとを読取項目ごとに比較し、一致しているか否かの判定を行い、一致していると判定されたテキストデータを抽出する正解データ抽出ステップと、学習部が行う、抽出されたテキストデータと、抽出されたテキストデータの基になる画像データにおける読取項目の位置とに基づいて機械学習を行い、学習モデルの生成及び更新を行う学習ステップと、を備える。 The information processing method in one aspect of the present disclosure is an information processing method in which character information is read from image data including character information and machine learning is performed on the position of the read character information in the image data, which is performed by the image data acquisition unit. , The image data acquisition step of acquiring image data, the reading item recognition step of identifying the position of the character information included in the image data and recognizing it as a reading item, and the text data generation unit, which are performed by the reading item recognition unit. The text data generation step of recognizing the character information in the read item and generating the text data, the text data performed by the correct answer data extraction unit, and the correct answer text data indicating the character information included in the image data stored in advance. Is compared for each read item, it is judged whether or not they match, and the correct answer data extraction step of extracting the text data determined to match, and the extracted text data performed by the learning unit , A learning step of performing machine learning based on the position of a read item in the image data on which the extracted text data is based, and generating and updating a learning model.
 また、本開示の一態様における情報処理プログラムは、文字情報を含む画像データから文字情報を読み取り、読み取った文字情報の画像データにおける位置について機械学習を行う情報処理プログラムであって、画像データを取得する画像データ取得ステップと、画像データに含まれる文字情報の位置を識別し、読取項目として認識する読取項目認識ステップと、読取項目における文字情報の文字認識を行い、テキストデータを生成するテキストデータ生成ステップと、テキストデータと、あらかじめ記憶されている画像データに含まれる文字情報を示す正解テキストデータとを読取項目ごとに比較し、一致しているか否かの判定を行い、一致していると判定されたテキストデータを抽出する正解データ抽出ステップと、抽出されたテキストデータと、抽出されたテキストデータの基になる画像データにおける読取項目の位置とに基づいて機械学習を行い、学習モデルの生成及び更新を行う学習ステップと、を電子計算機に実行させる。 Further, the information processing program according to one aspect of the present disclosure is an information processing program that reads character information from image data including character information and performs machine learning about the position of the read character information in the image data, and acquires the image data. Image data acquisition step to be performed, a reading item recognition step that identifies the position of character information contained in the image data and recognizes it as a reading item, and a text data generation that performs character recognition of the character information in the reading item and generates text data. The step, the text data, and the correct text data indicating the character information contained in the image data stored in advance are compared for each read item, and it is determined whether or not they match, and it is determined that they match. Machine learning is performed based on the correct answer data extraction step to extract the extracted text data, the extracted text data, and the position of the read item in the image data that is the basis of the extracted text data, and the learning model is generated and Let the electronic computer execute the learning step to update.
 本開示によれば、画像データに含まれる文字情報の位置を識別して読取項目として認識し、画像データに含まれる文字情報の文字認識を行ってテキストデータを生成し、正解テキストデータと読取項目ごとに比較して一致しているか否かの判定を行い、一致していると判定されたテキストデータと基になる画像データとに基づいて機械学習を行う。そのため、画像データの読取精度を上げることが可能である。また、煩雑な処理を必要とせずに読取項目の位置を認識する精度を上げることが可能であるため、多くの手間を必要とせずに文字情報を含む画像データを読み取るための機械学習モデルを生成することが可能である。 According to the present disclosure, the position of the character information contained in the image data is identified and recognized as a reading item, the character information contained in the image data is recognized as a character to generate text data, and the correct text data and the reading item are obtained. Each item is compared to determine whether or not they match, and machine learning is performed based on the text data determined to match and the underlying image data. Therefore, it is possible to improve the reading accuracy of the image data. In addition, since it is possible to improve the accuracy of recognizing the position of the read item without requiring complicated processing, a machine learning model for reading image data including character information can be generated without requiring a lot of trouble. It is possible to do.
本開示の一実施形態に係る情報処理システムを示す機能ブロック構成図である。It is a functional block block diagram which shows the information processing system which concerns on one Embodiment of this disclosure. 図1のユーザ端末200を示す機能ブロック構成図である。It is a functional block block diagram which shows the user terminal 200 of FIG. 図1の情報処理装置100の動作を示すフローチャートである。It is a flowchart which shows the operation of the information processing apparatus 100 of FIG. 図1の画像データ取得部131で取得される画像データP1の例を示す模式図である。It is a schematic diagram which shows the example of the image data P1 acquired by the image data acquisition unit 131 of FIG. 図1の読取項目認識部132で行われる読取項目の認識の例を示す模式図である。It is a schematic diagram which shows the example of the recognition of the reading item performed by the reading item recognition unit 132 of FIG. 図1のテキストデータ生成部133及び属性設定部134において生成及び属性設定されたテキストデータの例を示す模式図である。It is a schematic diagram which shows the example of the text data generated and attribute set by the text data generation unit 133 and the attribute setting unit 134 of FIG. 図1の正解テキストデータDB122に格納される正解データの例を示す模式図である。It is a schematic diagram which shows the example of the correct answer data stored in the correct answer text data DB 122 of FIG. 図1の正解データ抽出部135における判定の例を示す模式図である。It is a schematic diagram which shows the example of the determination in the correct answer data extraction unit 135 of FIG. 本開示の一実施形態に係る情報処理システムを示す機能ブロック構成図である。It is a functional block block diagram which shows the information processing system which concerns on one Embodiment of this disclosure. 本開示の一実施形態に係るコンピュータ700を示す機能ブロック構成図である。It is a functional block block diagram which shows the computer 700 which concerns on one Embodiment of this disclosure.
 以下、本開示の実施形態について図面を参照して説明する。なお、以下に説明する実施形態は、特許請求の範囲に記載された本開示の内容を不当に限定するものではない。また、実施形態に示される構成要素のすべてが、本開示の必須の構成要素であるとは限らない。 Hereinafter, embodiments of the present disclosure will be described with reference to the drawings. The embodiments described below do not unreasonably limit the contents of the present disclosure described in the claims. Also, not all of the components shown in the embodiments are essential components of the present disclosure.
 (実施形態1)
 <構成>
 図1は、本開示の実施形態1に係る情報処理システム1を示す機能ブロック構成図である。この情報処理システム1は、限定ではなく例として、文字情報を含む画像データに含まれる文字情報の文字認識を行ってテキストデータを生成し、正常に読み込まれたテキストデータと、その基になる画像データとに基づいて機械学習を行うシステムである。情報処理システム1では、テキストデータが正常に読み込まれたか否かを判定するために、画像データに含まれる文字情報を示す正解テキストデータを備えている。生成されたテキストデータは、この正解テキストデータと比較して一致しているか否かが判定され、一致していると判定された場合に正常に読み込まれたテキストデータと判定される。
(Embodiment 1)
<Structure>
FIG. 1 is a functional block configuration diagram showing an information processing system 1 according to the first embodiment of the present disclosure. This information processing system 1 is not limited, but as an example, performs character recognition of character information included in image data including character information to generate text data, and normally read text data and an image on which the information data is based. It is a system that performs machine learning based on data. The information processing system 1 includes correct text data indicating character information included in the image data in order to determine whether or not the text data has been read normally. The generated text data is compared with the correct text data to determine whether or not they match, and if it is determined to match, it is determined to be normally read text data.
 ここで、本実施形態では、文字情報を含む画像データとして、帳票類を画像としてスキャンした画像データを例として説明しているが、このような帳票データに限られない。この例において、スキャンの対象となる帳票類は、非定型文書である。非定型文書とは、例えば請求書のように、作成する者によりフォーマットが異なることがあるが、記載内容としては似たような内容が記載されている文書であるが、本実施形態における情報処理システム1でスキャンの対象とされる帳票類は、これに限られない。 Here, in the present embodiment, as image data including character information, image data obtained by scanning forms as an image is described as an example, but the data is not limited to such form data. In this example, the forms to be scanned are atypical documents. An atypical document is a document such as an invoice, which may have a different format depending on the creator, and has similar contents as the description contents, but the information processing in the present embodiment is described. The forms to be scanned by the system 1 are not limited to this.
 情報処理システム1は、情報処理装置100と、ユーザ端末200と、ネットワークNWとを有している。情報処理装置100と、ユーザ端末200とは、ネットワークNWを介して相互に接続される。ネットワークNWは、通信を行うための通信網であり、限定ではなく例として、インターネット、イントラネット、LAN(Local Area Network)、WAN(Wide Area Network)、ワイヤレスLAN(Wireless LAN:WLAN)、ワイヤレスWAN(Wireless WAN:WWAN)、仮想プライベートネットワーク(Virtual Private Network:VPN)等を含む通信網により構成されている。 The information processing system 1 has an information processing device 100, a user terminal 200, and a network NW. The information processing device 100 and the user terminal 200 are connected to each other via a network NW. The network NW is a communication network for communication, and is not limited to the Internet, an intranet, a LAN (Local Area Network), a WAN (Wide Area Network), a wireless LAN (Wireless LAN: WAN), and a wireless WAN (Wireless LAN). It is composed of a communication network including Wireless WAN: WAN), Virtual Private Network (VPN), and the like.
 情報処理装置100は、画像データに含まれる文字情報の位置を識別して読取項目として認識し、画像データに含まれる文字情報の文字認識を行ってテキストデータを生成し、このテキストデータと正解テキストデータと比較して一致しているか否かの判定を行い、一致していると判定されたテキストデータと基になる画像データとに基づいて、文字情報の画像データにおける位置について推定するための学習モデルに関する機械学習を行う装置である。この情報処理装置100は、具体的には、限定ではなく例として各種装置を制御するコンピュータ(デスクトップ、ラップトップ、タブレット等)や、サーバ装置等により構成されている。なお、情報処理装置100は、単体で動作する装置に限られず、複数の装置が通信網を介して相互に接続され、通信を行うことで協調動作する分散型サーバシステムや、クラウドサーバでもよい。 The information processing device 100 identifies the position of the character information included in the image data, recognizes it as a reading item, performs character recognition of the character information included in the image data, generates text data, and generates the text data, and the text data and the correct answer text. Learning to compare with the data to determine whether or not they match, and to estimate the position of the character information in the image data based on the text data determined to match and the underlying image data. It is a device that performs machine learning about the model. Specifically, the information processing device 100 is composed of, for example, a computer (desktop, laptop, tablet, etc.) that controls various devices, a server device, and the like. The information processing device 100 is not limited to a device that operates independently, and may be a distributed server system or a cloud server in which a plurality of devices are connected to each other via a communication network and cooperate with each other to perform communication.
The user terminal 200 is a device that receives operation input from the user for the information processing device 100 and is constituted by, for example and without limitation, a smartphone, a mobile terminal, a computer (desktop, laptop, tablet, or the like), or the like. On the user terminal 200, for example, an application for receiving the service of the information processing system 1 is installed, or a URL or the like for accessing the information processing device 100 is set, and the service is started by tapping or double-clicking it.
The information processing device 100 includes a communication unit 110, a storage unit 120, and a control unit 130.
The communication unit 110 is a communication interface for communicating with the user terminal 200 via the network NW, by wire or wirelessly, and any communication protocol may be used as long as mutual communication can be performed. By way of example and not limitation, the communication unit 110 communicates using a communication protocol such as TCP/IP (Transmission Control Protocol/Internet Protocol).
The storage unit 120 stores programs, input data, and the like for executing various control processes and the functions of the control unit 130, and is constituted by, for example and without limitation, memory including RAM (Random Access Memory) and ROM (Read Only Memory), and storage including an HDD (Hard Disk Drive), an SSD (Solid State Drive), flash memory, and the like. The storage unit 120 also stores an image data DB 121, a correct text data DB 122, a text data DB 123, and a reading learning model DB 124. Furthermore, the storage unit 120 temporarily stores data exchanged in communication with the user terminal 200 and data generated in each process described later. The image data DB 121, the correct text data DB 122, the text data DB 123, and the reading learning model DB 124 are databases that can be accessed, referenced, and updated by the various programs of the control unit 130.
The image data DB 121 stores image data obtained by scanning forms as images with a scanner device, or path information of the storage destination where such image data is stored. This image data is used for reading character information by OCR. The forms to be scanned are, for example, invoices as described above; the issuing company and the billed company of an invoice need not be the same, and the formats of the invoices need not be unified. Although the present embodiment targets image data scanned by a scanner device, it suffices that paper forms have been converted into electronic data; for example, photographic image data captured by a camera or the like may also be used.
The correct text data DB 122 stores, as correct text data, the character information contained in the image data stored in the image data DB 121 (or the image data obtained from the storage-destination path information stored in the image data DB 121). The correct text data is used to determine whether the character information has been read normally by OCR. By way of example and not limitation, the correct text data is stored for each item described in the underlying form, and an attribute is set for each item. An attribute is, for example, the name of the item, such as "form name", "company name", or "date".
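By way of illustration only, the correct text data could be represented as a simple per-form mapping from attribute to correct string, as in the following sketch; the variable names, keys, and example values are assumptions for illustration, not part of the disclosure.

```python
# A minimal sketch of the correct text data, assuming one record per source
# image keyed by item attribute. All names and values here are illustrative.
correct_text_db = {
    "invoice_001.png": {          # path of the underlying image data
        "form name": "Invoice",
        "billing source": "XX Co., Ltd.",
        "billing destination": "YY Co., Ltd.",
        "date": "September 1, 2019",
        "subject": "Re: handling fee",
    },
}
```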
The text data DB 123 stores text data generated by reading the character information by OCR from the image data stored in the image data DB 121 (or the image data obtained from the storage-destination path information stored in the image data DB 121). Whether this text data has been read normally is determined, and the normally read text data is used for machine learning. By way of example and not limitation, the text data is stored for each item described in the read form, in the same way as the correct text data, and an attribute is set for each item.
The reading learning model DB 124 stores a learning model generated by machine learning from normally read text data. This learning model is model information for performing character recognition on the character information contained in image data obtained by scanning forms as images, estimating the position in the image data, and generating text data.
The control unit 130 controls the overall operation of the information processing device 100 by executing the programs stored in the storage unit 120, and is constituted by, for example and without limitation, a device including a CPU (Central Processing Unit), an MPU (Micro Processing Unit), a GPU (Graphics Processing Unit), a microprocessor, a processor core, a multiprocessor, an ASIC (Application-Specific Integrated Circuit), and an FPGA (Field Programmable Gate Array). As its functions, the control unit 130 includes an image data acquisition unit 131, a read item recognition unit 132, a text data generation unit 133, an attribute setting unit 134, a correct data extraction unit 135, and a learning unit 136. The image data acquisition unit 131, the read item recognition unit 132, the text data generation unit 133, the attribute setting unit 134, the correct data extraction unit 135, and the learning unit 136 are started by the programs stored in the storage unit 120 and executed on the information processing device 100.
The image data acquisition unit 131 acquires, from the user terminal 200 via the communication unit 110, image data obtained by scanning forms (an example of image data containing character information) as images with a scanner device, or path information of the storage destination where the image data is stored. For example, a scanner device may be connected directly to the user terminal 200 or connected via the network NW, and the scanned image data transmitted from the user terminal 200 may be acquired. Alternatively, image data scanned by another external device may be acquired by the user terminal 200 and transmitted, and that image data may be acquired. The scanner device and the external device in this case are not shown. The image data acquired by the image data acquisition unit 131 is stored in the image data DB 121.
The read item recognition unit 132 identifies, from the image data stored in the image data DB 121, the positions of the character information contained in the image data and recognizes them as read items from which the character information is to be read by OCR. As described above, in the case of image data obtained by scanning an invoice, for example, the portions containing character information, such as "Invoice" or "○○ Co., Ltd.", are selected, for example by enclosing them in rectangles, and recognized as read items.
The text data generation unit 133 reads the character information by OCR from the portions recognized as read items by the read item recognition unit 132, performs character recognition, and generates text data. As described above, in the case of image data obtained by scanning an invoice, for example, characters such as "Invoice" and "○○ Co., Ltd." in the portions recognized as read items are read as character information, and text data is generated. The generated text data is stored in the text data DB 123 for each read item.
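As a minimal sketch of these two steps, the following Python fragment uses pytesseract to locate character regions and read their text. The disclosure does not name an OCR engine, so pytesseract and the returned record layout are assumptions for illustration.

```python
import pytesseract
from PIL import Image

def recognize_and_read(image_path: str) -> list:
    """Identify character-information regions (read items) and OCR each one.
    pytesseract stands in for the unspecified OCR engine."""
    image = Image.open(image_path)
    data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
    items = []
    for i, text in enumerate(data["text"]):
        if not text.strip():
            continue  # skip empty detections
        items.append({
            "text": text,                                  # generated text data
            "box": (data["left"][i], data["top"][i],
                    data["width"][i], data["height"][i]),  # rectangular read item
        })
    return items
```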
The attribute setting unit 134 sets attributes, such as the item names "form name", "company name", and "date", for the text data generated by the text data generation unit 133 and stored in the text data DB 123. The attribute for each item of text data is set automatically by the attribute setting unit 134 based on, for example, the position of the read item in the image data obtained by scanning the invoice or the like and the content of the text data read from that read item. The set attribute is stored in the text data DB 123 in association with the read item of the text data.
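A position-and-content heuristic of the kind described could look like the sketch below; the concrete rules, patterns, and thresholds are illustrative assumptions, not the patented method.

```python
import re

def assign_attribute(item: dict, page_width: int) -> str:
    """Assign an attribute from the read item's position and text content.
    The rules below are illustrative only."""
    text = item["text"]
    x, _, _, _ = item["box"]
    if re.search(r"\d{1,2}, \d{4}|\d{4}[./-]\d{1,2}[./-]\d{1,2}", text):
        return "date"
    if "invoice" in text.lower():
        return "form name"
    if "co., ltd" in text.lower():
        # Which side of the page holds the destination varies by layout;
        # splitting at the page centre is only an assumption.
        return "destination" if x < page_width // 2 else "company name"
    return "unknown"
```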
The correct data extraction unit 135 compares the text data stored in the text data DB 123 with the correct text data stored in the correct text data DB 122, determines whether they match, and extracts the text data determined to match; that is, it extracts the text data that has been read normally. This match determination is performed for each read item stored in the text data DB 123 and the correct text data DB 122.
By way of example and not limitation, the determination by the correct data extraction unit 135 proceeds as follows: first, the read items in the text data are compared with the read items in the correct text data to determine whether the respective read items match, and then, when the read items are determined to match, it is determined whether the text data and the correct text data match. Alternatively, the attributes set in the text data are compared with the attributes set in the correct text data to determine whether the attributes match, and then, for the read items whose attributes are determined to match, it is determined whether the text data and the correct text data match.
Also by way of example and not limitation, in the determination by the correct data extraction unit 135, the match degree of the text data is calculated based on the correct text data, and when the calculated match degree is equal to or greater than a predetermined threshold, the text data and the correct text data are determined to match. Thus, the match determination is not limited to a determination of exact equality; a match may be determined when the match degree is equal to or greater than a predetermined threshold. This match-degree determination may be performed for each read item stored in the text data DB 123 and the correct text data DB 122, and likewise, attributes may be determined to match when their match degree is equal to or greater than a predetermined threshold. A different threshold may also be used for each read item. In the case of attributes in particular, an exact match is not necessary, and it is acceptable to determine a match whenever the match degree is equal to or greater than the predetermined threshold.
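For instance, the match degree could be computed as a string-similarity ratio and compared against per-item thresholds, as in this sketch. The use of difflib and the threshold values are assumptions; the disclosure does not fix a particular similarity measure.

```python
from difflib import SequenceMatcher

# Assumed per-item thresholds; the disclosure only states that the
# threshold may differ for each read item.
THRESHOLDS = {"date": 1.0, "company name": 0.8}
DEFAULT_THRESHOLD = 0.9

def is_match(generated: str, correct: str, attribute: str) -> bool:
    """Match when the calculated match degree reaches the item's threshold."""
    degree = SequenceMatcher(None, generated, correct).ratio()
    return degree >= THRESHOLDS.get(attribute, DEFAULT_THRESHOLD)
```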
The learning unit 136 performs machine learning on the learning model for estimating the position of character information in image data, based on the text data extracted by the correct data extraction unit 135 and the underlying image data stored in the image data DB 121, and generates or updates the learning model stored in the reading learning model DB 124. The learning model may be updated, for example, by an aggregation process that merges the learning model stored in the reading learning model DB 124 with the learning result of the learning unit 136.
By way of example and not limitation, the machine learning by the learning unit 136 may be supervised machine learning using the extracted text data and the underlying image data as teacher data, unsupervised machine learning, or deep learning.
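In the supervised case, the teacher data could be assembled as (image, position, label) records, as sketched below. The record layout and the downstream model are assumptions, since the disclosure leaves the concrete learning method open.

```python
def build_training_examples(images: dict, extracted_items: list) -> list:
    """Pair each normally read item with its underlying image so that a
    position estimator can be trained. Illustrative layout only."""
    examples = []
    for item in extracted_items:
        examples.append({
            "image": images[item["image_path"]],  # underlying image data
            "box": item["box"],                   # position of the read item
            "label": item["attribute"],           # item attribute
            "text": item["text"],                 # normally read text data
        })
    return examples
```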
FIG. 2 is a functional block configuration diagram showing the user terminal 200 of FIG. 1. The user terminal 200 includes a communication unit 210, a display unit 220, an operation unit 230, a storage unit 240, and a control unit 250.
The communication unit 210 is a communication interface for communicating with the information processing device 100 via the network NW, by wire or wirelessly, and any communication protocol may be used as long as mutual communication can be performed. By way of example and not limitation, the communication unit 210 communicates using a communication protocol such as TCP/IP.
The display unit 220 is a user interface used to display operation contents input by the user and contents transmitted from the information processing device 100, and is constituted by a liquid crystal display or the like. The display unit 220 displays notification information sent from the information processing device 100 to the user.
The operation unit 230 is a user interface used by the user to input operation instructions, and is constituted by a keyboard, a mouse, a touch panel, or the like. The operation unit 230 is used for inputting operation information that the user provides to the information processing device 100.
The storage unit 240 stores programs for executing various control processes and the functions of the control unit 250, input data, and the like, and is constituted by, for example and without limitation, memory including RAM, ROM, and the like, and storage including an HDD, an SSD, flash memory, and the like. The storage unit 240 also temporarily stores data exchanged in communication with the information processing device 100.
The control unit 250 controls the overall operation of the user terminal 200 by executing the programs stored in the storage unit 240, and is constituted by, for example and without limitation, a device including a CPU, an MPU, a GPU, a microprocessor, a processor core, a multiprocessor, an ASIC, and an FPGA.
<Processing flow>
The flow of an example of the information processing method executed by the information processing device 100 of the information processing system 1 will be described with reference to FIG. 3. FIG. 3 is a flowchart showing the operation of the information processing device 100 of FIG. 1.
In step S101, the user terminal 200 transmits the scanned image data or the path information of the storage destination where the image data is stored, and the image data acquisition unit 131 acquires the image data. The acquired image data is stored in the image data DB 121.
In step S102, the read item recognition unit 132 reads the image data acquired in step S101 and stored in the image data DB 121.
FIG. 4 is a schematic diagram showing an example of the image data P1 acquired by the image data acquisition unit 131 of FIG. 1. The image data P1 shown in FIG. 4 is image data obtained by scanning an invoice as an example of a form: "△△ Co., Ltd." is the issuer, and the invoice is addressed to "○○ Co., Ltd.". The image data P1 contains the form name "Invoice", the issuing company name, and the billed company name, as well as information such as the subject, items, quantities, and amounts. In the process of step S101, image data obtained by scanning, for example, an invoice as shown in FIG. 4 is acquired and stored in the image data DB 121, and is read in the process of step S102.
In step S103, the read item recognition unit 132 identifies the positions of the character information contained in the image data read in step S102 and recognizes them as read items from which the character information is to be read by OCR.
FIG. 5 is a schematic diagram showing an example of read item recognition performed by the read item recognition unit 132 of FIG. 1. FIG. 5 shows an example in which the positions of character information are identified in the image data P1 shown in FIG. 4 and recognized as read items for reading the character information. The read items A1 to A11 shown in FIG. 5 represent a state in which the character information of the image data P1 shown in FIG. 4 has been recognized as read items, each piece of character information being recognized as a rectangular selection area.
As shown in FIG. 5, for example, the read item A1 selects the character information of the form name "Invoice". The read item A2 selects the character information of the issuing company name "△△ Co., Ltd.". The read item A3 selects the character information of the billed company name "○○ Co., Ltd.". The read item A4 selects the character information of the date "September 1, 2019". The read item A5 selects the character information of the subject "Re: ○△◇". The read item A6 selects the character information of the item name "○△◇ fee". The read item A7 selects the character information "1", the quantity of the item of the read item A6. The read item A8 selects the character information "150,000", the amount for the item of the read item A6. The read item A9 selects the character information "150,000", the subtotal amount. The read item A10 selects the character information "12,000", the consumption tax amount. The read item A11 selects the character information "162,000", the total amount. In the process of step S103, the positions of character information as shown in FIG. 5 are identified and recognized as read items for reading the character information.
In step S104, the text data generation unit 133 reads the character information by OCR from the portions recognized as read items in step S103, performs character recognition, and generates text data. The generated text data is stored in the text data DB 123 for each read item.
In the process of step S104, as shown in FIG. 5, the characters "Invoice" are read from the read item A1 and generated as text data. Similarly, the characters "△△ Co., Ltd." are read from the read item A2 and generated as text data. The characters "○○ Co., Ltd." are read from the read item A3 and generated as text data. The characters "September 1, 2019" are read from the read item A4 and generated as text data. The characters "Re: ○△◇" are read from the read item A5 and generated as text data. The subsequent items are processed in the same way and their description is omitted.
In step S105, the attribute setting unit 134 sets attributes for the text data generated in step S104 and stored in the text data DB 123.
FIG. 6 is a schematic diagram showing an example of the text data T1 generated and attributed by the text data generation unit 133 and the attribute setting unit 134 of FIG. 1. The text data shown in the right column of the text data T1 in FIG. 6 is the text data generated from the read items A1 to A11 shown in FIG. 5 (the read items A6 to A11 are not shown), and the attributes shown in the left column of FIG. 6 are set so as to be associated with the respective pieces of text data.
For example, in the text data "Invoice" generated from the read item A1 shown in FIG. 5, "form name" is set as the attribute. Similarly, in the text data "△△ Co., Ltd." generated from the read item A2, "company name" is set as the attribute. In the text data "○○ Co., Ltd." generated from the read item A3, "destination" is set as the attribute. In the text data "September 7, 2019" generated from the read item A4, "date" is set as the attribute (in the present embodiment, it is assumed that this item was not read normally). In the text data "Re: ○△◇" generated from the read item A5, "subject" is set as the attribute. In the process of step S105, attributes such as "form name" are set for text data such as "Invoice", as shown in FIG. 6.
In step S106, the correct data extraction unit 135 compares the text data generated in step S104 and stored in the text data DB 123 with the correct text data stored in the correct text data DB 122, and determines whether they match.
As an example of this processing, first, the read items in the text data are compared with the read items in the correct text data, and it is determined whether the respective read items match. Next, when the read items are determined to match, it is determined whether the text data and the correct text data match. Alternatively, the attributes set in the text data are compared with the attributes set in the correct text data, and it is determined whether the attributes match. Next, for the read items whose attributes are determined to match, it is determined whether the text data and the correct text data match.
FIG. 7 is a schematic diagram showing an example of the correct text data T2 stored in the correct text data DB 122 of FIG. 1. The correct text data shown in the right column of the correct text data T2 in FIG. 7 is the data stored in the correct text data DB 122 as the correct text data for the character information contained in the image data P1 shown in FIG. 4. For each read item of the correct text data, the attributes shown in the left column of FIG. 7 are set so as to be associated with the respective pieces of correct text data, in the same way as for the text data stored in the text data DB 123.
For example, in the correct text data "Invoice", which is the correct data for the read item A1 shown in FIG. 5, "form name" is set as the attribute. Similarly, in the correct text data "△△ Co., Ltd.", which is the correct data for the read item A2, "billing source" is set as the attribute. In the correct text data "○○ Co., Ltd.", which is the correct data for the read item A3, "billing destination" is set as the attribute. In the correct text data "September 1, 2019", which is the correct data for the read item A4, "date" is set as the attribute. In the correct text data "Re: ○△◇", which is the correct data for the read item A5, "subject" is set as the attribute.
FIG. 8 is a schematic diagram showing an example of the determination performed by the correct data extraction unit 135 of FIG. 1. In the process of step S106, the text data T1 shown in FIG. 6 and the correct text data T2 shown in FIG. 7 are compared, and it is determined whether they match. The text data T1 and the correct text data T2 shown in FIG. 8 are identical to the text data T1 shown in FIG. 6 and the correct text data T2 shown in FIG. 7, respectively.
As an example, first, the attributes set in the text data T1 are compared with the attributes set in the correct text data T2, and it is determined whether the attributes match. In the example shown in FIG. 8, the attribute "company name" in the second row and the attribute "destination" in the third row of the text data T1 differ from the attribute "billing source" in the second row and the attribute "billing destination" in the third row of the correct text data T2, respectively. For such differences, the correct data extraction unit 135 calculates a match degree for the attribute of each item, and determines, for each attribute, that they match when the calculated match degree is equal to or greater than a predetermined threshold. In this case, since a difference in attribute does not affect the generation of the text data, those attributes may be determined to match.
Next, for the read items whose attributes are determined to match, it is determined whether the text data T1 and the correct text data T2 match. In the example shown in FIG. 8, "September 7, 2019" in the fourth row of the text data T1 differs from "September 1, 2019" in the fourth row of the correct text data T2. For such differences, the correct data extraction unit 135 calculates a match degree for the text data of each item, and determines that they match when the calculated match degree is equal to or greater than a predetermined threshold. In this case, when the dates differ, it may be determined that the reading was not performed normally.
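Applied to the example of FIG. 8, the is_match sketch given earlier would behave as follows, assuming an exact-match threshold for the "date" item:

```python
# The misread date falls below the assumed exact-match threshold.
is_match("September 7, 2019", "September 1, 2019", "date")  # -> False
# Identical strings always reach the threshold.
is_match("Invoice", "Invoice", "form name")                 # -> True
```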
In step S107, the correct data extraction unit 135 extracts the text data for which the determination in step S106 found a match. The extraction of the text data determined to match may be performed per image data or per read item.
For example, in the case shown in FIG. 8, the read item with the attribute "date" was not read normally; all of the text data of the read items for that invoice may be excluded from extraction, or only the read item with the attribute "date" may be excluded. For the extracted items, for example, status information may be provided in the text data DB 123 and a status may be set only for the extracted items, or a separate database may be provided.
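Either extraction granularity could be expressed as a small filter like the following sketch; the per_document flag and all names here are hypothetical.

```python
def filter_extracted(items: list, failed_attributes: set,
                     per_document: bool = False) -> list:
    """Exclude items that were not read normally. With per_document=True,
    one failed item excludes the whole form from extraction."""
    if per_document and failed_attributes:
        return []
    return [it for it in items if it["attribute"] not in failed_attributes]
```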
In step S108, the learning unit 136 performs machine learning based on the text data extracted in step S107 and the underlying image data stored in the image data DB 121, and the learning model stored in the reading learning model DB 124 is generated and updated.
<Effect>
As described above, the information processing device, the information processing system, and the information processing method according to the present embodiment perform character recognition on the character information contained in image data obtained by scanning forms as images and generate text data. Correct text data, which is the correct data for the character information contained in the image data, is stored in advance. The text data is compared with the correct text data to determine whether they match, machine learning is performed based on the text data determined to match and the underlying image data, and a learning model is generated. Since machine learning is performed only on text data that has been read normally, the reading accuracy for image data can be improved efficiently.
Furthermore, character recognition is performed on the character information contained in the image data for each read item, the result is compared with the correct text data for each read item to determine whether they match, and machine learning is performed based on the text data determined to match and the underlying image data. Since the determination is made for each read item, the reading accuracy, which differs from item to item, can be improved for each item.
Moreover, the match degree of the text data is calculated based on the correct text data, and when the calculated match degree is equal to or greater than a predetermined threshold, the text data and the correct text data are determined to match. This determination may also be made for each read item. Therefore, the criterion for determining a match can be set for each form and for each read item, which makes it possible to improve the item-specific reading accuracy more efficiently.
(Embodiment 2)
FIG. 9 is a functional block configuration diagram showing an information processing system 1A according to the second embodiment of the present disclosure. The information processing system 1A is similar to the information processing system 1 according to the first embodiment in that it performs character recognition on character information contained in image data to generate text data and performs machine learning based on the normally read text data and the underlying image data, but differs from the information processing system 1 according to the first embodiment in that the control unit 130 of the information processing device 100A provided in the present embodiment includes an image data reading unit 137 as one of its functions.
In the present embodiment, actual forms are read based on the learning model generated by the information processing system 1A.
The image data reading unit 137 acquires image data obtained by scanning new forms, performs character recognition on the character information based on the learning model trained by the learning unit 136 and stored in the reading learning model DB 124, and generates new text data. The new text data may be stored in the text data DB 123 or in a newly provided separate database. This text data may be provided, for example, to the party that supplies the forms, as a deliverable of a service that scans forms and generates text data read by OCR.
The learning unit 136 in the present embodiment may perform machine learning based on the new text data and the image data obtained by scanning the new forms. This can further improve the reading accuracy. The other configurations and the processing flow are the same as in the first embodiment.
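A sketch of the image data reading unit 137 follows, assuming the trained model exposes a predict() interface that returns estimated item positions; that interface, like the rest of the fragment, is an illustrative assumption.

```python
import pytesseract
from PIL import Image

def read_new_form(image_path: str, learning_model) -> dict:
    """Locate read items in a new form with the learning model, then OCR
    each region. The model's predict() interface is hypothetical."""
    image = Image.open(image_path)
    results = {}
    for region in learning_model.predict(image):  # estimated read items
        left, top, width, height = region["box"]
        crop = image.crop((left, top, left + width, top + height))
        results[region["attribute"]] = pytesseract.image_to_string(crop).strip()
    return results
```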
According to the present embodiment, in addition to the effects of the first embodiment, an image data reading unit that acquires image data obtained by scanning new forms and performs character recognition on the character information is provided, and character recognition is performed based on the learning model. This can further improve the reading accuracy, and the results can be provided to the party that supplies the forms as a deliverable of a service that scans forms and generates text data read by OCR.
(Embodiment 3 (Program))
FIG. 10 is a functional block configuration diagram showing an example of the configuration of a computer (electronic computer) 700. The computer 700 includes a CPU 701, a main storage device 702, an auxiliary storage device 703, and an interface 704.
Here, the details of a control program (information processing program) for realizing the functions constituting the image data acquisition unit 131, the read item recognition unit 132, the text data generation unit 133, the attribute setting unit 134, the correct data extraction unit 135, the learning unit 136, and the image data reading unit 137 according to the first and second embodiments will be described. These functional blocks are implemented in the computer 700. The operation of each of these components is stored in the auxiliary storage device 703 in the form of a program. The CPU 701 reads the program from the auxiliary storage device 703, loads it into the main storage device 702, and executes the above processing according to the program. The CPU 701 also secures, in the main storage device 702, storage areas corresponding to the storage units described above, according to the program.
Specifically, the program is a control program that causes the computer 700 to realize: an image data acquisition step of acquiring image data; a read item recognition step of identifying the positions of character information contained in the image data and recognizing them as read items; a text data generation step of performing character recognition on the character information in the read items and generating text data; a correct data extraction step of comparing the text data with correct text data, stored in advance and indicating the character information contained in the image data, for each read item, determining whether they match, and extracting the text data determined to match; and a learning step of performing machine learning based on the extracted text data and the positions of the read items in the image data underlying the extracted text data, and generating and updating the learning model.
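Put together, the five steps could be arranged as in the skeleton below, reusing the sketches from the first embodiment; everything here is illustrative rather than a definitive implementation of the program.

```python
from PIL import Image

def run_pipeline(image_path: str, correct_text_db: dict) -> list:
    """Skeleton of the program's five steps, built from the earlier sketches."""
    items = recognize_and_read(image_path)        # acquisition, item recognition, OCR
    page_width = Image.open(image_path).width
    for item in items:
        item["attribute"] = assign_attribute(item, page_width)  # attribute setting
        item["image_path"] = image_path
    correct = correct_text_db.get(image_path, {})
    extracted = [it for it in items               # correct data extraction
                 if it["attribute"] in correct
                 and is_match(it["text"], correct[it["attribute"]], it["attribute"])]
    images = {image_path: Image.open(image_path)}
    return build_training_examples(images, extracted)  # input to the learning step
```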
The auxiliary storage device 703 is an example of a non-transitory tangible medium. Other examples of non-transitory tangible media include a magnetic disk, a magneto-optical disk, a CD-ROM, a DVD-ROM, and a semiconductor memory connected via the interface 704. When this program is distributed to the computer 700 via a network, the computer 700 that receives the distribution may load the program into the main storage device 702 and execute the above processing.
The program may also be one for realizing part of the functions described above. Furthermore, the program may be a so-called difference file (difference program) that realizes the functions described above in combination with another program already stored in the auxiliary storage device 703.
Although embodiments according to the present disclosure have been described above, they can be implemented in various other forms, and can be implemented with various omissions, substitutions, and modifications. These embodiments and modifications, as well as omissions, substitutions, and modifications thereof, are included in the technical scope of the claims and their equivalents.
1, 1A information processing system; 100, 100A information processing device; 110 communication unit; 120 storage unit; 121 image data DB; 122 correct text data DB; 123 text data DB; 124 reading learning model DB; 130 control unit; 131 image data acquisition unit; 132 read item recognition unit; 133 text data generation unit; 134 attribute setting unit; 135 correct data extraction unit; 136 learning unit; 137 image data reading unit; 200 user terminal; 210 communication unit; 220 display unit; 230 operation unit; 240 storage unit; 250 control unit; NW network

Claims (11)

  1.  An information processing device that reads character information from image data containing the character information and performs machine learning on a learning model for estimating the position of the read character information in the image data, the information processing device comprising:
    an image data acquisition unit that acquires the image data;
    a read item recognition unit that identifies the position of the character information contained in the image data and recognizes it as a read item;
    a text data generation unit that performs character recognition on the character information in the read item and generates text data;
    a correct data extraction unit that compares the text data with correct text data, stored in advance and indicating the character information contained in the image data, for each read item, determines whether they match, and extracts the text data determined to match; and
    a learning unit that performs machine learning based on the extracted text data and the position of the read item in the image data underlying the extracted text data, and generates or updates the learning model.
  2.  The information processing device according to claim 1, wherein the correct data extraction unit compares the text data with the correct text data to calculate a match degree of the text data, and determines that the text data and the correct text data match when the calculated match degree is equal to or greater than a predetermined threshold.
  3.  The information processing device according to claim 1 or claim 2, wherein the correct data extraction unit:
    compares the read item in the text data with the read item in the correct text data and determines whether the respective read items match; and
    when the read items are determined to match, compares the text data with the correct text data for each read item and determines whether they match.
  4.  The information processing device according to claim 3, wherein the correct data extraction unit:
    compares the read item in the text data with the read item in the correct text data to calculate a match degree of the read item of the text data, and determines that the respective read items match when the calculated match degree is equal to or greater than a predetermined threshold; and
    when the read items are determined to match, compares the text data with the correct text data for each read item to calculate a match degree of the text data for each read item, and determines that the text data match and extracts the text data when each calculated match degree is equal to or greater than a predetermined threshold.
  5.  The information processing device according to claim 4, wherein the correct data extraction unit determines that the text data match when the calculated match degree is equal to or greater than a predetermined threshold that differs for each read item.
  6.  The information processing device according to any one of claims 1 to 5, further comprising an attribute setting unit that sets an attribute of the read item based on the position of the recognized read item in the image data and the text data read from the read item.
  7.  The information processing device according to claim 6, wherein the correct data extraction unit:
    compares the attribute of the read item in the text data with the attribute set for the read item in the correct text data and determines whether they match; and
    when the attributes are determined to match, compares the text data with the correct text data for each attribute and determines whether they match.
  8.  The information processing device according to any one of claims 1 to 7, wherein the learning unit performs supervised machine learning using, as teacher data, the extracted text data and the position of the read item in the image data underlying the extracted text data.
  9.  The information processing device according to any one of claims 1 to 8, further comprising an image data reading unit that acquires new image data, performs character recognition on character information, and generates new text data based on the learning model.
  10.  An information processing method for reading character information from image data containing the character information and performing machine learning on a learning model for estimating the position of the read character information in the image data, the method comprising:
    an image data acquisition step, performed by an image data acquisition unit, of acquiring the image data;
    a read item recognition step, performed by a read item recognition unit, of identifying the position of the character information contained in the image data and recognizing it as a read item;
    a text data generation step, performed by a text data generation unit, of performing character recognition on the character information in the read item and generating text data;
    a correct data extraction step, performed by a correct data extraction unit, of comparing the text data with correct text data, stored in advance and indicating the character information contained in the image data, for each read item, determining whether they match, and extracting the text data determined to match; and
    a learning step, performed by a learning unit, of performing machine learning based on the extracted text data and the position of the read item in the image data underlying the extracted text data, and generating or updating the learning model.
  11.  An information processing program for reading character information from image data containing the character information and performing machine learning on a learning model for estimating the position of the read character information in the image data, the program causing an electronic computer to execute:
    an image data acquisition step of acquiring the image data;
    a read item recognition step of identifying the position of the character information contained in the image data and recognizing it as a read item;
    a text data generation step of performing character recognition on the character information in the read item and generating text data;
    a correct data extraction step of comparing the text data with correct text data, stored in advance and indicating the character information contained in the image data, for each read item, determining whether they match, and extracting the text data determined to match; and
    a learning step of performing machine learning based on the extracted text data and the position of the read item in the image data underlying the extracted text data, and generating or updating the learning model.

PCT/JP2020/032346 2019-09-27 2020-08-27 Information processing device, information processing method, and information processing program WO2021059848A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2019177757A JP6722929B1 (en) 2019-09-27 2019-09-27 Information processing apparatus, information processing method, and information processing program
JP2019-177757 2019-09-27

Publications (1)

Publication Number Publication Date
WO2021059848A1 2021-04-01

Family

ID=71523804

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/032346 WO2021059848A1 (en) 2019-09-27 2020-08-27 Information processing device, information processing method, and information processing program

Country Status (2)

Country Link
JP (1) JP6722929B1 (en)
WO (1) WO2021059848A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12020462B2 (en) 2020-08-06 2024-06-25 Ricoh Company, Ltd. Information processing apparatus, information processing method, and computer program product

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4316498A1 (en) 2021-03-30 2024-02-07 Kaneka Corporation Trypsin inhibition method, and method for producing cell preparation in which same is used

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010140204A (en) * 2008-12-10 2010-06-24 Sharp Corp Character recognition device, character recognition method, character recognition program, and recording medium
JP2019008775A (en) * 2017-06-22 2019-01-17 日本電気株式会社 Image processing device, image processing system, image processing method, program
JP2019074807A (en) * 2017-10-12 2019-05-16 富士ゼロックス株式会社 Information processing device and program
JP2019086984A (en) * 2017-11-06 2019-06-06 株式会社日立製作所 Computer and document identification method
JP2019133218A (en) * 2018-01-29 2019-08-08 株式会社 みずほ銀行 Document sheet accommodating system, document sheet accommodating method, and document sheet accommodating program


Also Published As

Publication number Publication date
JP2021056659A (en) 2021-04-08
JP6722929B1 (en) 2020-07-15

Similar Documents

Publication Publication Date Title
US10466971B2 (en) Generation of an application from data
US8456477B2 (en) Information processing apparatus, information processing method and program for generating and displaying network structures
US20160253303A1 (en) Digital processing and completion of form documents
US10552525B1 (en) Systems, methods and apparatuses for automated form templating
JP2010092501A (en) Error notification method and error notification device
JP5670787B2 (en) Information processing apparatus, form type estimation method, and form type estimation program
JP7070745B2 (en) Information processing equipment, information display method and program
CN102779114A (en) Unstructured data support generated by utilizing automatic rules
WO2019061664A1 Electronic device, user's internet surfing data-based product recommendation method, and storage medium
US7971135B2 (en) Method and system for automatic data aggregation
WO2021059848A1 (en) Information processing device, information processing method, and information processing program
US20220004885A1 (en) Computer system and contribution calculation method
JP2019114193A (en) Image processing device and image processing program
US20210286857A1 (en) System and method for browser-based target data extraction
JP6552162B2 (en) Information processing apparatus, information processing method, and program
CN113515921A (en) Auxiliary generation method of patent text and electronic terminal
JP7430437B1 (en) Method, program, and information processing device for collecting character information printed on printed matter
JP7008152B1 (en) Information processing equipment, information processing methods and information processing programs
JP6777907B1 (en) Business support device and business support system
JP7484461B2 (en) Information processing device, information processing system, and program
JP6866705B2 (en) Information processing equipment and programs
JP2018116657A (en) Information providing device, information providing system, terminal device, information providing method, and information providing program
KR101983103B1 (en) Method for providing customized information to machine industry through behavior pattern analysis
JP2024045829A (en) Information processing device, file management method and program
KR101986674B1 (en) Method for forecasting machine industry trend based on search frequency and system thereof

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20867285

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20867285

Country of ref document: EP

Kind code of ref document: A1