WO2025028507A1 - 情報処理装置、情報処理方法、および、情報処理装置用のプログラム - Google Patents

情報処理装置、情報処理方法、および、情報処理装置用のプログラム Download PDF

Info

Publication number
WO2025028507A1
WO2025028507A1 PCT/JP2024/027081 JP2024027081W WO2025028507A1 WO 2025028507 A1 WO2025028507 A1 WO 2025028507A1 JP 2024027081 W JP2024027081 W JP 2024027081W WO 2025028507 A1 WO2025028507 A1 WO 2025028507A1
Authority
WO
WIPO (PCT)
Prior art keywords
document
image
information processing
type
template
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/JP2024/027081
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
尚 荒井
直人 針谷
鷹志 平田
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dai Nippon Printing Co Ltd
Original Assignee
Dai Nippon Printing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dai Nippon Printing Co Ltd filed Critical Dai Nippon Printing Co Ltd
Priority to JP2025537434A priority Critical patent/JPWO2025028507A1/ja
Publication of WO2025028507A1 publication Critical patent/WO2025028507A1/ja
Anticipated expiration legal-status Critical
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53Querying
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40Document-oriented image-based pattern recognition
    • G06V30/41Analysis of document content
    • G06V30/412Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40Document-oriented image-based pattern recognition
    • G06V30/41Analysis of document content
    • G06V30/418Document matching, e.g. of document images

Definitions

  • the present invention relates to an information processing device, an information processing method, and a program for an information processing device that performs information processing on image data of a document.
  • Patent Document 1 discloses a program that causes a computer to execute a process of generating a first component image by extracting horizontally extending lines from an image of a form and a second component image by extracting vertically extending lines, dividing the first component image into a plurality of blocks, generating a first feature value that indicates the characteristics of the distribution of the lines in the first component image based on the presence or absence of lines in each block, and dividing the second component image into a plurality of blocks, generating a second feature value that indicates the characteristics of the distribution of the lines in the second component image based on the presence or absence of lines in each block, and identifying and classifying the type of form in the image based on the difference between the first feature value in the form definition registered in advance in a definition body and the generated first feature value, and the difference between the second feature value in the form definition and the generated second feature value.
  • the present invention has been made in consideration of the above problems, and one example of the objective of the present invention is to provide information processing etc. that improves the accuracy of document sorting.
  • the invention described in claim 1 is characterized by comprising a document image acquisition means for acquiring a document image of a standardized document in which predetermined items among a plurality of items written on the document are masked, a template information acquisition means for acquiring template information of a template image prepared for each type of the document, a standardized document discrimination means for comparing the template information with information on the document image to discriminate the type of the document, and an extraction means for extracting item images for the items according to the discriminated type of the document.
  • the invention described in claim 2 is characterized in that, in the information processing device described in claim 1, the template information acquisition means acquires information on feature points of the template image, and the standard document discrimination means extracts feature points from the document image and compares the extracted feature points with the feature points of the template image to discriminate the type of the document.
  • the invention described in claim 3 is characterized in that in the information processing device described in claim 2, the feature points of the template image are feature points extracted from an image obtained by removing variable information, including character information, from the template image.
  • the invention described in claim 4 is characterized in that in the information processing device described in claim 2 or claim 3, the standard document discrimination means extracts feature points corresponding to feature points of the template image from the feature points extracted from the document image, selects feature points from the corresponding feature points according to the Hamming distance between the feature points, and discriminates the type of the document based on the number of selected feature points.
  • the invention described in claim 5 is characterized in that in the information processing device described in claim 4, the standard document discrimination means discriminates the type of the document based on the number of the selected feature points and the number of feature points of the template image.
  • the invention described in claim 6 is characterized in that in the information processing device described in claim 4, the type of the document is determined based on the number of feature points of the document image that correspond to the feature points of the template image.
  • the invention described in claim 7 is characterized in that the information processing device described in claim 2 further includes an image correction means for correcting the document image based on feature points extracted from the document image.
  • the invention described in claim 8 is characterized in that the information processing device described in claim 1 further includes a code reading means for reading a code written on the document from the document image, and the standard document discrimination means compares the read code with a code associated with the template image to discriminate the type of the document.
  • the invention described in claim 9 is characterized in that in the information processing device described in claim 1 or claim 8, the standard document discrimination means compares the document image with the template image to discriminate the type of the document.
  • the invention described in claim 10 is characterized by comprising a document image acquisition means for acquiring a document image of a document in which a predetermined item among a plurality of items written on the document is masked, a character feature extraction means for extracting character features from text data generated by optically reading the document image, a document discrimination means for discriminating the type of the document based on the extracted character features using a trained machine learning model that outputs the type of document when character features are input, and a cutting means for cutting out item images for the items according to the discriminated type of the document.
  • the invention described in claim 11 is characterized in that in the information processing device described in claim 10, the character feature extraction means extracts the character feature consisting of character strings obtained by dividing the text data into predetermined character units and the frequency of occurrence of each of the divided character strings in the text data, and there are a plurality of the predetermined character units.
  • the invention described in claim 12 is characterized in that, in the information processing device described in claim 11, at least one of the predetermined character units is a single character unit.
  • the invention described in claim 13 is characterized in that in the information processing device described in claim 11 or claim 12, the character feature extraction means extracts the character feature weighted based on the frequency of occurrence of each of the divided character strings.
  • the invention described in claim 14 is characterized by including a document image acquisition step in which a document image acquisition means acquires a document image of a standard document in which predetermined items among a plurality of items written on the document are masked, a template information acquisition step in which a template information acquisition means acquires template information of a template image prepared for each type of document, a standard document determination step in which a standard document discrimination means compares the template information with information on the document image to determine the type of the document, and an extraction step in which an extraction means extracts an item image for the item according to the determined type of the document.
  • the invention described in claim 15 is characterized by including a document image acquisition step in which a document image acquisition means acquires a document image of a document in which a predetermined item among a plurality of items written on the document is masked, a character feature extraction step in which a character feature extraction means extracts character features from text data generated by optically reading the document image, a document discrimination step in which a document discrimination means discriminates the type of the document based on the extracted character features using a trained machine learning model that outputs a document type when character features are input, and a cut-out step in which a cut-out means cuts out an item image for the item according to the discriminated document type.
  • the invention described in claim 17 is characterized in that the computer functions as a document image acquisition means for acquiring a document image of a document in which a predetermined item among a plurality of items written on the document is masked, a character feature extraction means for extracting character features from text data generated by optically reading the document image, a document discrimination means for discriminating the type of the document based on the extracted character features using a trained machine learning model that outputs the type of document when character features are input, and an extraction means for cutting out item images for the items according to the discriminated type of the document.
  • FIG. 1 is a schematic diagram illustrating an example of a schematic configuration of an information processing system according to an embodiment of the present invention.
  • FIG. 2 is a schematic diagram showing an example of a document.
  • FIG. 2 is a schematic diagram showing an example of a document.
  • FIG. 2 is a schematic diagram showing an example of a document.
  • FIG. 2 is a schematic diagram showing an example of a document.
  • FIG. 2 is a block diagram showing an example of a schematic configuration of the information processing server shown in FIG. 1 .
  • FIG. 2 is a schematic diagram illustrating an example of an image template.
  • FIG. 4 is a schematic diagram showing an example of feature points of an image template.
  • FIG. 2 is a schematic diagram showing an example of a document having a code.
  • FIG. 2 is a schematic diagram showing an example of a target document.
  • FIG. 2 is a schematic diagram showing an example of feature points of a target document.
  • FIG. 13 is a schematic diagram showing an example of matched feature points.
  • FIG. 13 is a schematic diagram showing an example of unique and matching feature points.
  • 8C is a flowchart showing a subroutine of the AI sorting of FIG. 8B.
  • 20 is a flowchart showing a subroutine for AI sorting of documents in FIG. 18.
  • FIG. 2 is a schematic diagram illustrating an example of text data.
  • FIG. 13 is a schematic diagram showing an example of the frequency of appearance of divided character strings.
  • FIG. 13 is a schematic diagram showing an example of the frequency of appearance of divided character strings.
  • FIG. 13 is a schematic diagram showing an example of an examination screen.
  • FIG. 1 Overview of management system configuration and functions
  • FIG. 1 is a schematic diagram showing an example of the general configuration of an information processing system according to an embodiment of the present invention.
  • FIGS. 2A to 4B are schematic diagrams showing examples of documents.
  • the information processing system 1 includes an information processing server device 10 that processes information about services such as applications, investigations, and account opening, a terminal device 20 where workers work on documents, a user terminal device 30 for users who apply for services, and an image input device 35 that captures images of documents and imports image data of the document images.
  • an information processing server device 10 that processes information about services such as applications, investigations, and account opening
  • a terminal device 20 where workers work on documents
  • a user terminal device 30 for users who apply for services and an image input device 35 that captures images of documents and imports image data of the document images.
  • the information processing server device 10 which is an example of an information processing device, is a server for a BPO (Business Process Outsourcing) business, which is a service provider or service agent for applications, surveys, account opening, applications for various services, questionnaire surveys, etc.
  • the information processing server device 10 performs information processing such as sorting the types of documents from the image data of each document included in a set of documents received on paper or via the web.
  • the terminal device 20 is, for example, a mobile terminal such as a personal computer, a portable wireless telephone including a smartphone, or a tablet terminal.
  • the terminal device 20 is installed according to each worker.
  • the terminal device 20 displays document images, etc.
  • the user terminal device 30 is, for example, a mobile terminal such as a personal computer, a portable wireless telephone including a smartphone, or a tablet terminal. Using the user terminal device 30, a user can apply for services such as opening a bank account online, or download application forms for services from the web and print them out.
  • the image input device 35 has a scanner or digital camera with an imaging element such as a CCD (Charge Coupled Device) image sensor or a CMOS (Complementary Metal Oxide Semiconductor) image sensor.
  • the image input device 35 scans or captures documents received from the agent's client or documents sent by the applicant, and generates image data.
  • the information processing server device 10, the terminal device 20, the user terminal device 30, and the image input device 35 are capable of transmitting and receiving data to and from each other via the network 3 using, for example, a communication protocol such as TCP/IP.
  • the network 3 is constructed, for example, from a local area network, the Internet, a dedicated communication line (for example, a CATV (Community Antenna Television) line), a mobile communication network, a gateway, etc.
  • a service provided by a BPO business operator there is an agency service that investigates the purpose of a transaction by sending a questionnaire or confirmation letter to a customer who has already opened an account with a financial institution and has a transaction, depending on the content and situation of the transaction. This investigation is required for financial institutions to continuously check customer information in order to strengthen measures against money laundering and terrorist financing. Customers must respond by accessing a website via documents sent by mail or e-mail. Confirmation is required to be carried out periodically. For example, a BPO business operator prints out the documents to be sent and mails them to the customer or sends them using a communication method such as e-mail.
  • the customer writes the necessary information on documents such as questionnaires by hand, seals them in an envelope, and sends them to the financial institution along with a copy of the identification document or an image taken with a mobile terminal device.
  • the necessary information can be entered on a website and sent.
  • the BPO business operator then receives the information sent by mail or entered on the website, extracts personal information from the information, and performs screening such as identity verification.
  • the documents are questionnaires for confirming the purpose of the transaction, an application form for opening a bank account, identity verification documents, contracts, delivery notes, etc.
  • An example of a set of documents that is related to multiple documents is, for example, in the case of confirming the purpose of the transaction, etc., multiple documents such as questionnaires for confirming the purpose of the transaction, identity verification documents, etc.
  • the set of documents may also include a photograph of the person and multiple identity verification documents.
  • the questionnaire (customer information confirmation document) has columns for each item, such as name, sex, date of birth, address, telephone number, place of employment, occupation, purpose of transaction, assets, etc., and often consists of multiple pages, such as questionnaire document 40 on the first page and questionnaire document 41 on the second page.
  • Identification documents include driver's licenses, resident registration cards, family register cards, health insurance cards, passports, My Number cards, receipts (for example, receipts for utility bills such as electricity and gas that show the name and address), etc.
  • questionnaires examples include questionnaires, applications for opening a bank account (application forms, application forms), identity documents, blank sheets, and other documents.
  • questionnaires include the first page of the questionnaire, the second page of the questionnaire, the third page of the questionnaire, etc.
  • identity documents include driver's licenses, passports, My Number cards, insurance cards, resident registration cards, receipts, etc.
  • Documents are also classified into standard documents, in which the positions of items written on the document are fixed for the document type, and non-standard documents, in which the positions are not fixed for the document type.
  • Standard documents include driver's licenses, passports, and My Number cards.
  • Non-standard documents include resident's certificates, the format of which varies from city to city, town to village, and insurance cards and receipts, the format of which varies depending on the issuer.
  • the format of license 42 is standardized nationwide, and license 42 is classified as a standard document.
  • Figures 4A and 4B the format of resident's certificates is not standardized, like resident's certificates 43 and 44, and so resident's certificates are classified as a non-standard document.
  • the questionnaire document may be considered to be a standard document.
  • the information processing server device 10 has multiple sorting engines, such as a sorting engine that can sort standard documents by document type, a sorting engine that can sort non-standard documents by document type, and a sorting engine that can sort specific documents with high efficiency.
  • a sorting engine is an algorithm or information processing method for determining the document type from image data of a document.
  • FIG. 5 is a block diagram showing an example of the general configuration of the information processing server device 10.
  • FIG. 6 is a schematic diagram showing an example of an image template.
  • FIG. 7 is a schematic diagram showing an example of feature points of an image template.
  • the information processing server device 10 which is a computer, includes a control unit 11 that controls the information processing server device 10, a storage unit 12 that has various databases, a communication unit 13 that communicates with terminal devices 20 and the like, and an output unit 14 that displays system management information, etc.
  • the control unit 11 has, for example, a CPU (Central Processing Unit), a ROM (Read Only Memory), and a RAM (Random Access Memory).
  • the control unit 11 may have a calculation chip dedicated to AI calculations, such as a GPU (Graphics Processing Unit).
  • the control unit 11 reads and executes various control programs stored in the ROM and RAM by the CPU reading and executing various programs stored in the ROM and memory unit 12.
  • the control unit 11 controls each part of the information processing server device 10 (the memory unit 12, the communication unit 13, the output unit 14, etc.).
  • the control unit 11 may also read and execute these programs from a recording medium or the like on which they are stored.
  • the memory unit 12 (an example of a storage means) is composed of, for example, a hard disk drive, a silicon disk drive, etc.
  • the memory unit 12 stores a template database 12a that stores template images of each document, an image feature database 12b that stores features of the template images of each document, an image database 12c that stores image data of each document, a business information database 12d that stores associations between business information and the sorting engine to be applied, etc.
  • the memory unit 12 stores various pre-trained machine learning models for performing processes such as sorting using AI (Artificial Intelligence).
  • the template database 12a stores coordinate information for each item such as name, address, and gender in association with a document type code, etc.
  • item coordinate information include the range of the item field, and in the case of a name field, the coordinate information of the area in which the name is written, etc.
  • variable information such as text information indicating items such as name and address, and the area indicating the location of the face photo, in the template image with the same color as the background color, or to remove the margins around the document.
  • variable information is information that differs for each identification document, such as name, address, and face photo.
  • image database 12c the acquired image data is stored in association with the document set ID of the document set, the document ID issued sequentially for each document, etc.
  • image data of items cut out from the document image during the image division process for each document item is stored in association with the document ID, item ID, etc.
  • the business information database 12d stores business information associated with the customer information requesting the service, such as the opening of a bank account, the content of the service (e.g., questionnaire surveys), whether manual sorting is required, information on the type of identification document (e.g., whether only a driver's license or whether a resident's certificate is also included), whether the document is a standard or non-standard document, and information on the method of submission (submission via the web, paper submission by mail, or a mixture of both).
  • business information associated with the customer information requesting the service such as the opening of a bank account, the content of the service (e.g., questionnaire surveys), whether manual sorting is required, information on the type of identification document (e.g., whether only a driver's license or whether a resident's certificate is also included), whether the document is a standard or non-standard document, and information on the method of submission (submission via the web, paper submission by mail, or a mixture of both).
  • the memory unit 12 also stores a confidence threshold for the machine learning model used for classification for each type of document.
  • This threshold is a judgment threshold for each type of document that has been determined in advance through testing, etc. For example, if the type of document is a driver's license, the confidence threshold is xx%, if the type of document is an insurance card, the confidence threshold is x%, etc.
  • AI that performs item detection processing
  • teacher data with information tags added to each item of the document is created, and machine learning is performed on the AI to build a machine learning model for item detection (an example of a machine learning model for item identification).
  • machine learning may be performed by taking the difference before and after the masking work from the actual data of the document and mechanically detecting it.
  • machine learning may be performed for each type of document so that each item is identified for each type of document, and a machine learning model for item detection for each type of document may be built in the storage unit 12.
  • each database may be managed by the same database on the same information processing server device 10, or each database may be stored in a database on a different information processing server device.
  • the communication unit 13 connects to the network 3 and controls communication with the terminal device 20, the user terminal device 30, and the image input device 35.
  • the output unit 14 is composed of, for example, a liquid crystal display element or an organic EL (Electro Luminescence) element.
  • Fig. 8A is a schematic diagram showing an example of a document having a code.
  • Fig. 8B is a flowchart showing an example of the operation of the information processing server device of Fig. 1.
  • the envelope that has been mailed is opened.
  • the delivered cardboard box is opened, and an envelope containing a set of documents, such as a questionnaire with the necessary information filled in and a copy of an identification document, is taken out.
  • the envelope is then opened, and the set of documents is taken out.
  • Each paper document that is removed is assigned a document ID such as a code to identify it.
  • the codes that can be assigned include one-dimensional codes and two-dimensional codes.
  • a sticker with a code image 45a such as a two-dimensional code is affixed to a specified position on a copy 45 of the identity verification document.
  • it may be a sticker with a code value according to the type of document.
  • the code value may be the value of only the major classification code, or the value of the detailed classification code.
  • a two-dimensional or one-dimensional code can be printed in advance in a specified location, such as the bottom right of the document, or when applying online, a two-dimensional or one-dimensional code can be displayed in advance in a specified location, such as the bottom right of the document, and can be included in the image data in advance.
  • operation (2) in order to protect personal information, etc., items such as the photograph, the Individual Number, unnecessary personal information such as license conditions, and sensitive information such as medical history are masked.
  • items such as the photograph, the Individual Number, unnecessary personal information such as license conditions, and sensitive information such as medical history are masked.
  • a sticker like a masking tape is affixed to the photograph, license conditions, etc. on a copy 45 of the identification document, as shown in masking 45b.
  • the worker uses the image input device 35 to scan each document in the document set, such as the questionnaire document and the masked copy of the identification document, converts it into image data, and transmits it to the information processing server device 10.
  • the information processing server device 10 acquires a series of image data for the document set from the image input device 35 and stores it in the memory unit 12 together with the ID of the document set and the document ID issued for each document.
  • the information processing server device 10 functions as an example of a document image acquisition means that acquires a document image of a standardized document in which certain items among multiple items written on the document are masked.
  • the information processing server device 10 acquires a series of image data of a set of documents, such as a questionnaire and identity verification documents, from the user terminal device 30 and stores them in the memory unit 12 together with the document set ID of the document set and the document ID issued for each document.
  • the identity verification documents are image data photographed with a mobile terminal device equipped with a camera.
  • the set of documents may also include image data of the applicant's face.
  • the information processing server device 10 acquires business information (step S1). Specifically, the control unit 11 refers to the business information database 12d to acquire business information corresponding to the service of the outsourced customer.
  • the information processing server device 10 checks the advance designation and determines whether manual sorting is specified (step S2). Specifically, the control unit 11 refers to the business information database 12d, checks the advance designation as to whether the document type for the set of documents will be sorted using a sorting engine or manually, and determines whether manual sorting is required. Note that there may be a higher-level server device (not shown) that manages the entire system for the information processing server device 10, and the higher-level server device may check the advance designation as to whether the document type for the set of documents will be sorted using a sorting engine or manually.
  • step S2 If manual sorting is specified (step S2; YES), the information processing server device 10 registers manual sorting for the set of documents (step S3).
  • step S4 the information processing server device 10 judges whether or not the documents in the document set are all standard documents. Specifically, the control unit 11 judges whether or not the documents in the document set are all standard documents, based on the business information.
  • step S5 the information processing server device 10 judges whether the image is distorted. Specifically, the control unit 11 judges whether the document is a paper submission based on the business information. In the case of submission via the web, the applicant sends a paper copy of a scanned or photographed identification document, etc., and there is a possibility that the image may be distorted, so the information processing server device 10 judges that the image is distorted. Even if the submission is paper, if the image is highly likely to be distorted due to the characteristics of the case, the information processing server device 10 judges that the image is distorted.
  • the control unit 11 may also calculate the degree of distortion by image processing, and if the degree of distortion is equal to or greater than a predetermined value, it may judge that the image is distorted. Instead of judging whether the image is distorted, the information processing server device 10 may judge whether the document has a code, such as a two-dimensional code or a one-dimensional code. In this case, the presence of a code on the document corresponds to a case where the image is distorted, and the absence of a code on the document corresponds to a case where the image is not distorted.
  • a code such as a two-dimensional code or a one-dimensional code.
  • step S6 the information processing server device 10 performs pattern matching sorting. Specifically, the control unit 11 sorts each document into its type by code judgment, which reads the code on the document to determine the document type, or pattern matching (PM) judgment, which determines the document type by pattern matching with a template image. The details of the process will be described later in the pattern matching sorting subroutine.
  • step S7 the information processing server device 10 performs feature point sorting (step S7). Specifically, the control unit 11 sorts each document in the document set into document types by pattern matching with the feature points of the template image. The details of the process will be described later in the feature point sorting subroutine.
  • step S8 the information processing server device 10 performs AI sorting (step S8). Specifically, the control unit 11 extracts character features from the text data generated by optically reading the documents, and sorts each document in the set into document types based on the extracted character features using a trained machine learning model that outputs the document type when the character features are input. The detailed process will be described later in the AI sorting subroutine.
  • the information processing server device 10 counts the number of unique matching feature points (step S59).
  • the control unit 11 refers to the image feature database 12b, compares the feature points p of the target image as shown in FIG. 16 with the feature points of the template image as shown in FIG. 7, calculates the Hamming distance, and extracts the closest feature point Mp as the matching feature point.
  • the closest feature point Mp is extracted as the matching feature point.
  • FIG. 17A in the case of the template image #99-12345 in FIG. 7, the feature point Mp corresponding to the feature point of the square shape is extracted from the target image, and in the case of the template image #99-12346 in FIG. 7, the feature point Mp corresponding to the feature point of the lattice square shape is extracted from the target image.
  • the first and second closest feature points are also extracted and used to select the unique feature points described below.
  • the information processing server device 10 functions as an example of a standard document discrimination means that extracts the feature points from the document image, compares the extracted feature points with the feature points of the template image, and discriminates the type of the document.
  • the control unit 11 detects unique feature points Up (an example of a selected feature point) from among the matched feature points Mp (corresponding feature points).
  • the most similar feature point and the second most similar feature point are extracted from the target image among the matched feature points, and if the similarity rate is equal to or greater than a threshold, the most similar feature point is selected as a unique feature point, i.e., a matched unique feature point. That is, as shown in FIG. 17B, if the most similar feature point and the second most similar feature point are similar to each other by a certain percentage or more, they are considered to be matched but not unique, and are deleted from the valid feature point list.
  • Accelerated-KAZE features are represented as bit strings (binary values) of 0s and 1s, so the similarity can be compared using the number of digits (Hamming distance) that 0s and 1s differ when two bit strings are lined up.
  • the information processing server device 10 determines a template image of the sorting result (step S61). Specifically, the control unit 11 determines the template image with the highest similarity among the similarities calculated in step S60 as the sorting result.
  • the information processing server device 10 functions as an example of a standard document discrimination means that discriminates the type of the document based on the number of selected feature points and the number of feature points of the template image.
  • the information processing server device 10 judges whether the number of matched unique feature points is equal to or greater than a threshold value (step S62). Specifically, the control unit 11 judges whether the number of matched unique feature points in the most similar template image is equal to or greater than a predetermined value. This process is performed because, even if the most similar template image results, if the number of matched unique feature points in the target image is not equal to or greater than a predetermined value, there is a possibility of erroneous judgment or distortion correction cannot be performed correctly.
  • the information processing server device 10 functions as an example of a standard document type discrimination means that discriminates the type of the document based on the number of feature points of the target image that correspond to the feature points of the template image.
  • step S63 the information processing server device 10 calculates a homography transformation matrix (step S63). Specifically, the control unit 11 calculates a homography transformation matrix using each of the extracted, matched unique feature points, and applies the homography transformation matrix to calculate a distortion-corrected image for the target image.
  • the information processing server device 10 functions as an example of an image correction means that corrects the document image based on feature points extracted from the document image.
  • the information processing server device 10 judges whether the post-conversion area is within the normal range (step S64). Specifically, the control unit 11 detects "points corresponding to the four corners of the target image" in the coordinate system of the distortion-corrected image by multiplying the area calculated from the four corners of the target image (image before distortion correction) and each point of the four corners of the target image by a homography transformation matrix, and compares this with the post-conversion area calculated from these points to judge whether the post-conversion area is within the normal range.
  • the information processing server device 10 outputs the image after the distortion correction (step S66). Specifically, the control unit 11 transmits the image after the distortion correction, in which the document type has been determined, to the terminal device 20 for the next task.
  • step S67 the control unit 11 sets a flag indicating that the document type is undetermined.
  • FIG. 18 shows the AI sorting subroutine of FIG. 8B.
  • the information processing server device 10 starts loop processing for the target image of each document in the document set, as in the processing in step S10 (step S70).
  • the information processing server device 10 performs AI sorting of the documents (step S71). Specifically, the control unit 11 determines the document type using AI from the character features of the target image, and outputs the result of whether the document type is confirmed or not confirmed. The details of the process will be described later in the AI document sorting subroutine.
  • the information processing server device 10 registers the document type as a target for manual sorting (step S73). Specifically, the control unit 11 stores the document in the target image, together with the document ID, in the database of the storage unit 12 as a target for manual sorting.
  • step S72 the information processing server device 10 determines whether the number of loops has reached the number of target images, and if the number of target images has not been reached, returns to the processing of step S70 and moves on to processing the next target image according to the next document ID, and if the number of target images has been reached, ends the target image loop processing and ends the AI sorting subroutine (step S74). Note that the determination of the finalization of document type sorting in step S72 and the processing of registering manual sorting in step S73 may be outside the target image loop.
  • the position of a specific item in the image data may be identified according to the determined document type, and an item image for the identified item may be extracted.
  • Standard documents such as questionnaires and identity verification documents such as driver's licenses are subjected to image segmentation processing.
  • the position of the item may be identified using a machine learning model for item detection, and item images may be extracted.
  • the control unit 11 refers to the storage unit 12 and selects a machine learning model for item detection based on the confirmed document type, and identifies the writing area of each item for the document whose document type has been confirmed, using machine learning for the selected document type.
  • the control unit 11 extracts the area corresponding to each item from the coordinate information of each item whose writing area has been identified and the image data of the target image, and generates an item image.
  • the control unit 11 stores the item image data in the image database 12c in association with the document type ID of the document type and the item ID.
  • the information processing server device 10 functions as an example of an extraction means that extracts an item image for an item according to the determined type of the document.
  • the information processing server device 10 functions as an example of an extraction means that identifies the position of a specific item in the document image according to the determined type of the document, and extracts an item image for the identified item.
  • FIG. 19 shows a subroutine for AI sorting of the documents shown in FIG.
  • the information processing server device 10 acquires the target image (step S75) as in step S16.
  • the control unit 11 acquires image data of a document such as a questionnaire document 40 as shown in FIG. 2A.
  • the information processing server device 10 functions as an example of a document image acquisition means that acquires a document image of the document in which certain items among the multiple items written on the document are masked.
  • the information processing server device 10 performs a sharpening process (step S76). Specifically, the control unit 11 applies a filter to the image data that enhances the contrast of edges to enhance the contours of the target image.
  • the information processing server device 10 performs binarization processing (step S77). Specifically, the control unit 11 converts the pixel luminance value to "1" if the pixel luminance value is equal to or greater than a predetermined luminance value, and converts the pixel luminance value to "0" if the pixel luminance value is less than the predetermined luminance value. By performing binarization and sharpening processing, processing is performed to remove noise and emphasize the outlines of characters in preparation for the next OCR processing.
  • the information processing server device 10 performs OCR processing (step S78). Specifically, the control unit 11 performs OCR processing on the image data that has been sharpened and binarized, and concatenates multiple pieces of text data obtained to generate text data as shown in FIG. 20. In the case of horizontal writing, the control unit 11 scans horizontally from the top left of the image and concatenates the text data in order.
  • the information processing server device 10 uses AI to determine the type of document (step S79). Specifically, the control unit 11 divides the string into one-character units and two-character units, which are examples of predetermined character units. Character n-grams can be used to divide text data. Character n-grams are a method of dividing a document into n consecutive characters. In the case of dividing a string into two-character units, the text data in FIG. 20 is divided into two consecutive characters. As shown in FIG. 21A, the string is divided into two characters, such as " ⁇ ", " ⁇ ”, etc. In the case of dividing a string into one-character units, the string is divided into " ⁇ ", " ⁇ ", " ⁇ ”, etc., as shown in FIG. 21B.
  • Strings split into multiple character increments include strings split into 1-character and 2-character increments, strings split into 1-character and 3-character increments, strings split into 2-character and 3-character increments, and strings split into 1-character, 2-character and 3-character increments.
  • strings split into 1-character, 2-character and 3-character increments since the text is converted using OCR, there is a possibility of characters being misread or omitted, so it is preferable that single-character increments are included.
  • the control unit 11 divides one piece of text data into multiple character strings of a predetermined length, and then calculates the frequency of occurrence of each character string in the text data.
  • the frequency of occurrence is calculated for each character string divided into two characters.
  • the frequency of occurrence is calculated for each character string divided into one character.
  • Two feature vectors are extracted, each of which has each divided character string as an element and the frequency of occurrence of that character string as the element value. That is, as shown in FIG. 21A, the feature vector is composed of two-character strings and the frequency of occurrence of the string, and as shown in FIG. 21B, the feature vector is composed of one-character strings and the frequency of occurrence of the string.
  • These feature vectors are also called character features.
  • the number of character strings divided into two-character units and the number of character strings divided into one-character units are the number of dimensions of the feature vector.
  • the character strings split by the control unit 11 may be weighted according to their importance.
  • a weighting method may be the TF-IDF method. Two other feature vectors (feature vectors with weighted elements of frequency of occurrence) are extracted, with each of the split character strings as an element and the element value weighted by the frequency of occurrence of the character string.
  • TF Term Frequency
  • IDF Inverse Document Frequency
  • the control unit 11 performs the sorting process. For example, in a driver's license, which is an identification document, the character string "drive” is unlikely to appear in other identification documents such as My Number and insurance cards, and is therefore thought to have a high TF-IDF value. Therefore, in determining the document type using this AI, if an image of an identification document is read and the character string "drive" appears in the image, the control unit 11 can determine that the document type of the identification document image is a "driver's license.”
  • the information processing server device 10 functions as an example of a character feature extraction means that extracts character features from text data generated by optically reading the document image.
  • the information processing server device 10 extracts the character features composed of character strings obtained by dividing the text data into predetermined character units and the frequency of occurrence of each of the divided character strings in the text data, and functions as an example of a character feature extraction means having a plurality of the predetermined character units.
  • the information processing server device 10 functions as an example of a character feature extraction means that extracts the character features by weighting the frequency of occurrence of each of the divided character strings.
  • the control unit 11 then inputs each two-character and single-character feature vector for the target image into multiple machine learning models for classification, which output the confidence level for each document type, and outputs the maximum value as the AI sorting result.
  • the information processing server device 10 determines whether or not the confidence level for each document type is equal to or greater than the threshold value for that document type (step S80). Specifically, the control unit 11 refers to the database in the storage unit 12 and determines whether or not the confidence level for the maximum document type is equal to or greater than the threshold value for that document type.
  • the information processing server device 10 functions as an example of a document discrimination means that discriminates the type of document based on the extracted character features using a trained machine learning model that outputs the document type when character features are input.
  • step S80 If it is equal to or greater than the threshold (step S80; YES), the information processing server device 10 outputs the document type sorting result (step S81). Specifically, the control unit 11 stores the confirmed document type and sets a flag indicating that the document type has been confirmed.
  • step S80 determines that the document type is undetermined (step S82). Specifically, the control unit 11 sets a flag indicating that the document type is undetermined.
  • the questionnaire documents and identity verification documents are sorted; for example, the questionnaire documents on which a person's name is written are identified in operation (4).
  • the identity verification documents are sorted and the type of identity verification document is determined in operation (4).
  • image extraction image division
  • image division image division

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
PCT/JP2024/027081 2023-07-31 2024-07-30 情報処理装置、情報処理方法、および、情報処理装置用のプログラム Pending WO2025028507A1 (ja)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2025537434A JPWO2025028507A1 (https=) 2023-07-31 2024-07-30

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2023125139 2023-07-31
JP2023-125139 2023-07-31

Publications (1)

Publication Number Publication Date
WO2025028507A1 true WO2025028507A1 (ja) 2025-02-06

Family

ID=94394816

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2024/027081 Pending WO2025028507A1 (ja) 2023-07-31 2024-07-30 情報処理装置、情報処理方法、および、情報処理装置用のプログラム

Country Status (2)

Country Link
JP (1) JPWO2025028507A1 (https=)
WO (1) WO2025028507A1 (https=)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10154191A (ja) * 1996-11-26 1998-06-09 Nec Corp 帳票識別方法及び装置並びに帳票識別プログラムを記録した媒体
JP2006260063A (ja) * 2005-03-16 2006-09-28 Dainippon Printing Co Ltd 応募システム
JP2011065311A (ja) * 2009-09-16 2011-03-31 Fuji Xerox Co Ltd 画像処理装置及び画像処理プログラム
JP2011065646A (ja) * 2009-09-18 2011-03-31 Fujitsu Ltd 文字列認識装置及び文字列認識方法
JP2016139335A (ja) * 2015-01-28 2016-08-04 キヤノン株式会社 個人番号管理システム、画像処理装置、画像処理方法、及びプログラム
JP2021117802A (ja) * 2020-01-28 2021-08-10 京都電子計算株式会社 帳票振分装置及び帳票振分用プログラム
JP2022108130A (ja) * 2021-01-12 2022-07-25 大日本印刷株式会社 情報処理装置及びコンピュータプログラム
JP2022150300A (ja) * 2021-03-26 2022-10-07 キヤノンマーケティングジャパン株式会社 情報処理装置、情報処理方法、プログラム
JP2023067100A (ja) * 2021-10-29 2023-05-16 キヤノン電子株式会社 情報処理装置、情報処理装置の制御方法、及びプログラム

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10154191A (ja) * 1996-11-26 1998-06-09 Nec Corp 帳票識別方法及び装置並びに帳票識別プログラムを記録した媒体
JP2006260063A (ja) * 2005-03-16 2006-09-28 Dainippon Printing Co Ltd 応募システム
JP2011065311A (ja) * 2009-09-16 2011-03-31 Fuji Xerox Co Ltd 画像処理装置及び画像処理プログラム
JP2011065646A (ja) * 2009-09-18 2011-03-31 Fujitsu Ltd 文字列認識装置及び文字列認識方法
JP2016139335A (ja) * 2015-01-28 2016-08-04 キヤノン株式会社 個人番号管理システム、画像処理装置、画像処理方法、及びプログラム
JP2021117802A (ja) * 2020-01-28 2021-08-10 京都電子計算株式会社 帳票振分装置及び帳票振分用プログラム
JP2022108130A (ja) * 2021-01-12 2022-07-25 大日本印刷株式会社 情報処理装置及びコンピュータプログラム
JP2022150300A (ja) * 2021-03-26 2022-10-07 キヤノンマーケティングジャパン株式会社 情報処理装置、情報処理方法、プログラム
JP2023067100A (ja) * 2021-10-29 2023-05-16 キヤノン電子株式会社 情報処理装置、情報処理装置の制御方法、及びプログラム

Also Published As

Publication number Publication date
JPWO2025028507A1 (https=) 2025-02-06

Similar Documents

Publication Publication Date Title
US11694499B2 (en) Systems and methods for updating an image registry for use in fraud detection related to financial documents
US20210124919A1 (en) System and Methods for Authentication of Documents
US12111953B2 (en) Sensitive data detection and replacement
US9626555B2 (en) Content-based document image classification
US7092561B2 (en) Character recognition, including method and system for processing checks with invalidated MICR lines
US7558418B2 (en) Real time image quality analysis and verification
JP6528147B2 (ja) 会計データ入力支援システム、方法およびプログラム
US7983468B2 (en) Method and system for extracting information from documents by document segregation
US20050096992A1 (en) Image-enabled item processing for point of presentment application
Artaud et al. Find it! fraud detection contest report
WO2004055725A2 (en) System and method for capture, storage and processing of receipts and related data
CA2619873A1 (en) Front counter and back counter workflow integration
US20140268250A1 (en) Systems and methods for receipt-based mobile image capture
US20160379186A1 (en) Element level confidence scoring of elements of a payment instrument for exceptions processing
CN114511866A (zh) 数据稽核方法、装置、系统、处理器及机器可读存储介质
US20210090086A1 (en) Systems and methods for fraud detection for images of financial documents
US10049350B2 (en) Element level presentation of elements of a payment instrument for exceptions processing
WO2025028507A1 (ja) 情報処理装置、情報処理方法、および、情報処理装置用のプログラム
JP4356908B2 (ja) 財務諸表自動入力装置
WO2025028505A1 (ja) 情報処理装置、情報処理方法、および、情報処理装置用のプログラム
WO2025028506A1 (ja) 情報処理装置、情報処理方法、および、情報処理装置用のプログラム
US12437303B1 (en) Detecting and remediating anomalies in institutional financial instruments using image processing
US12614183B1 (en) Detecting and remediating anomalies in institutional financial instruments using image processing
US20260120108A1 (en) Detecting and remediating anomalies in institutional financial instruments using image processing
US20260120103A1 (en) Detecting and remediating anomalies in institutional financial instruments using image processing

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24849158

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2025537434

Country of ref document: JP

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 2025537434

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE