WO2020044537A1 - Image comparison device, image comparison method, and program - Google Patents

Image comparison device, image comparison method, and program Download PDF

Info

Publication number
WO2020044537A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
comparison
unit
format definition
images
Prior art date
Application number
PCT/JP2018/032358
Other languages
French (fr)
Japanese (ja)
Inventor
智洋 林
郷 道場
幸代 川幡
央 佐々木
起一郎 渡邊
智也 萩原
Original Assignee
PFU Limited
Fujitsu Computer Technologies Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by PFU Limited and Fujitsu Computer Technologies Limited
Priority to PCT/JP2018/032358
Priority to JP2020539985A
Publication of WO2020044537A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/10 Image acquisition

Definitions

  • The present invention relates to an image matching device, an image matching method, and a program.
  • Patent Literature 1 discloses an information sharing support system that discloses personal information to a plurality of users and supports information sharing, the system comprising: a storage unit that stores personal public information; an information providing unit that, in response to a user request, provides the public information stored by the storage unit together with a notification condition for notifying the information provider who provided the public information of a user's use state of that public information; and a notification unit that, when a user's use state of the public information provided by the information providing unit is detected, notifies the information provider of the detected use state, wherein only the information provider can change the notification condition.
  • Patent Literature 2 discloses a system comprising: a storage unit that stores coordinates of areas in a document and identification information corresponding to the areas; a creation unit that creates, from a newly received document, a plurality of areas for character recognition in the received document, the created areas including areas extracted by block selection processing on the document and arbitrary areas designated by a user; a comparison unit that compares the coordinates of the created areas with the coordinates of the areas stored by the storage unit; a determination unit that determines identification information corresponding to the coordinates of the created areas based on the comparison result and on the identification information corresponding to the stored areas; a transmission unit that transmits the determined identification information and text information based on character recognition of the created areas; and an execution unit that, based on the transmitted identification information, identifies and executes a script for inputting the text information into an application, wherein the storage unit stores, based on the result of the transmission, the coordinates of the created areas and the corresponding identification information.
  • Patent Literature 3 discloses a form reading device comprising: a storage unit that stores an image of a form including item names and data corresponding to the item names; a search unit that searches the form image for a predetermined item name; an input unit that receives information for selecting data on the form image; an association unit that associates the selected data with the found item name; and a character recognition unit that performs character recognition on the associated data.
  • An object of the present invention is to provide an image matching system that supports image matching.
  • An image matching device according to the present invention includes: a comparison image storage unit that stores, in association with one another, a plurality of comparison images generated by processing mutually different areas of image data of the same document; and a matching degree determination unit that compares each of the comparison images stored in the comparison image storage unit with a newly input image to determine a degree of matching.
  • Preferably, the comparison image storage unit stores format definition information that defines the format of a document in association with the comparison images, and the device further includes: a format definition selection unit that selects the format definition information to apply, from the format definition information stored in the comparison image storage unit, based on the determination result of the matching degree determination unit; and an extraction unit that extracts information from the newly input image based on the format definition information selected by the format definition selection unit.
  • Preferably, when the degree of matching determined by the matching degree determination unit is at or below a reference for every comparison image, the device further includes a comparison image generation unit that processes mutually different areas of the input image to generate a plurality of comparison images, and a comparison image registration unit that additionally registers the plurality of comparison images generated by the comparison image generation unit in the comparison image storage unit.
  • Preferably, the comparison image generation unit processes the same input image such that the plurality of generated comparison images have mutually different data sizes.
  • Preferably, the comparison image generation unit generates the plurality of comparison images by deleting images in mutually different areas from the same input image.
  • Preferably, the comparison image storage unit stores at least two of: a comparison image identical to the input image; a comparison image in which an arbitrary area of the input image has been deleted; a comparison image in which the inside of the ruled-line frame has been deleted from the input image; a comparison image in which only the area outside the ruled-line frame has been extracted from the input image; and a comparison image in which only the ruled lines contained in the input image have been extracted.
  • Preferably, the device further includes an exclusion area change unit that changes at least one of the number, size, and position of image areas excluded from comparison, and the matching degree determination unit excludes, for at least one of the comparison images, the image areas changed by the exclusion area change unit from the comparison and compares the input image with that comparison image to determine the degree of matching.
  • An image matching method according to the present invention includes the steps of: generating a plurality of comparison images by processing mutually different areas of image data of the same document; registering the generated comparison images in a database in association with one another; and comparing each of the comparison images registered in the database with a newly input image to determine a degree of matching.
  • A program according to the present invention causes a computer to execute the steps of: generating a plurality of comparison images by processing mutually different areas of image data of the same document; registering the generated comparison images in a database in association with one another; and comparing each of the comparison images registered in the database with a newly input image to determine a degree of matching.
  • FIG. 1 is a diagram illustrating learning data in the image matching system 1.
  • FIG. 2 is a diagram illustrating an outline of OCR recognition in the image matching system 1.
  • FIG. 3 is a diagram illustrating the system configuration of the image matching system 1.
  • FIG. 4 is a diagram illustrating the hardware configuration of the image matching device 5.
  • FIG. 5 is a diagram illustrating the functional configuration of the image matching device 5.
  • FIG. 6 is a diagram illustrating an OCR recognition result confirmation screen.
  • FIG. 7 is a diagram illustrating a layout correction screen.
  • FIG. 8 is a diagram showing an example of patterned comparison images.
  • FIG. 9 is a flowchart illustrating the learning data generation process (S10) in the image matching system 1.
  • FIG. 10 is a flowchart illustrating the image matching process (S30) in the image matching system 1.
  • FIG. 11 is a diagram illustrating an outline of OCR recognition in a comparative example.
  • In a comparative example, a user creates a format definition for OCR recognition for each type of document, and OCR recognition is performed based on the created format definition.
  • In the comparative example, when the OCR recognition range was corrected, the corrected information was not reflected in the format definition. As a result, the format definition could be forgotten to be corrected after OCR recognition, or the recognition range had to be reset every time OCR recognition produced a poorly recognized portion.
  • FIG. 1 is a diagram exemplifying learning data managed by the image matching device 5 of the present invention.
  • FIG. 2 is a diagram illustrating an outline of OCR recognition in the image matching system 1.
  • Even when none of the stored documents completely matches the document to be OCR-recognized, the image matching device 5 of the present invention identifies a document that matches one of the plurality of pieces of image data, performs character recognition based on a format definition suitable for the target document, and thereby increases the matching rate.
  • When the user corrects the OCR recognition range, that is, when the character recognition layout is corrected, the image matching system of the present invention generates learning data based on the corrected contents, so the user does not need to reset the format definition.
  • FIG. 3 is a diagram illustrating an overall configuration of the image matching system 1.
  • The image matching system 1 includes a plurality of scanners 3a, 3b, and 3c and an image matching device 5, which are connected to one another via a network 7.
  • The scanner 3a, the scanner 3b, and the scanner 3c are collectively referred to as the scanner 3.
  • The scanner 3 transmits image data acquired by its optical reading device (hereinafter referred to as an input image) to the image matching device 5.
  • The image matching device 5 is a computer terminal and performs character recognition of the image data received from the scanner 3.
  • The image matching device 5 identifies a format definition suitable for the input image to be used for character recognition, and performs character recognition of the input image by applying the identified format definition. More specifically, the format definition suitable for the input image is identified based on the comparison images generated by the image matching device 5.
  • FIG. 4 is a diagram illustrating a hardware configuration of the image matching device 5.
  • The image matching device 5 includes a CPU 200, a memory 202, an HDD 204, a network interface 206 (network IF 206), a display device 208, and an input device 210, which are connected to one another via a bus 212.
  • The CPU 200 is, for example, a central processing unit.
  • The memory 202 is, for example, a volatile memory and functions as the main storage device.
  • The HDD 204 is, for example, a hard disk drive, and stores a computer program and other data files as a nonvolatile recording device.
  • The network IF 206 is an interface for wired or wireless communication.
  • The display device 208 is, for example, a liquid crystal display.
  • The input device 210 is, for example, a keyboard and a mouse.
  • FIG. 5 is a diagram illustrating a functional configuration of the image matching device 5.
  • An image matching program 50 is installed in the image matching device 5. The image matching program 50 is stored in a recording medium such as a CD-ROM, and is installed in the image matching device 5 via this recording medium.
  • A learning data database 600 (learning data DB 600) is also configured in the image matching device 5.
  • The learning data DB 600 manages layout data for each document as illustrated in FIG. 2.
  • The layout data includes a format definition for character recognition of an input image, comparison images associated with the format definition, and feature point data associated with the format definition.
  • The comparison images and the feature point data are the elements that determine which format definition is used for character recognition of the input image.
  • Part or all of the image matching program 50 may be realized by hardware such as an ASIC, or may be realized by partially borrowing the function of an OS (Operating System). Further, the entire program may be installed on a single computer terminal, or may be installed on a virtual machine on a cloud.
  • The image matching program 50 includes an image acquisition unit 500, a comparison image storage unit 502, a matching degree determination unit 504, a format definition selection unit 506, an extraction unit 508, a layout correction unit 510, a comparison image generation unit 512, a standard format definition generation unit 514, a feature point data extraction unit 516, and a comparison image registration unit 518.
  • The image acquisition unit 500 acquires image data of a document scanned by the scanner 3 and uses it as the input image.
  • The comparison image storage unit 502 stores, in association with one another, a plurality of comparison images generated by processing mutually different areas of image data of the same document. Specifically, the comparison image storage unit 502 stores at least two of the five types of patterned comparison images for one document. The comparison image storage unit 502 also stores format definition information (hereinafter referred to as a format definition) that defines the format of the document, in association with the comparison images.
  • A format definition uses one of the image data obtained by capturing a plurality of semi-standard documents of the same type, and consists of information for identifying the document type for OCR recognition and information for specifying the OCR recognition range.
  • For example, a format definition specifies the OCR recognition range based on a keyword such as "customer name" and a position relative to the keyword (a condition consisting of above, below, left, or right).
  • The format definition is defined by the user.
  • The matching degree determination unit 504 compares each of the comparison images stored in the comparison image storage unit 502 with the newly input image to determine the degree of matching. When the degree of matching between a comparison image and the input image exceeds a reference, the matching degree determination unit 504 determines that the two match.
  • The matching degree determination unit 504 also extracts, based on the feature point data, candidate learning data to be used for character recognition of the input image, and then determines, from among the extracted candidates, the learning data whose degree of matching between its comparison image and the input image exceeds the reference.
  • The format definition selection unit 506 selects the format definition to apply, from the format definitions stored in the comparison image storage unit 502, based on the determination result of the matching degree determination unit 504. Specifically, the format definition selection unit 506 selects the format definition of the learning data determined by the matching degree determination unit 504 as the format definition used for character recognition of the input image.
  • The extraction unit 508 extracts information from the newly input image based on the format definition selected by the format definition selection unit 506. Specifically, the extraction unit 508 performs character recognition of the input image based on the format definition and displays the recognition result on an OCR recognition result confirmation screen as illustrated in FIG. 6. The OCR recognition result confirmation screen displays each item name of the document (date, telephone number, name, and so on) and the value of each item. The user confirms the character recognition result on the OCR recognition result confirmation screen and corrects any errors.
  • The layout correction unit 510 changes the character recognition range of the input image, or the meaning (date, telephone number, name, or other value) of the item described in the character recognition range. Specifically, as illustrated in FIG. 7, the layout correction screen displays an image of the input image, and when the user resets the character recognition range, the layout correction unit 510 accepts the change and changes the character recognition range.
  • When the degree of matching determined by the matching degree determination unit 504 is at or below the reference for every comparison image, the comparison image generation unit 512 processes mutually different regions of the input image to generate a plurality of comparison images. Specifically, the comparison image generation unit 512 processes the same input image so that the generated comparison images have mutually different data sizes, and generates the plurality of comparison images by deleting images in mutually different areas from the same input image.
  • The standard format definition generation unit 514 stores, in the learning data DB 600 and in association with the comparison images, the format definition whose character recognition range was changed by the layout correction unit 510 or whose document item meanings were changed.
  • The feature point data extraction unit 516 extracts feature points of the comparison image corrected by the layout correction unit 510 and stores them in the learning data DB 600 in association with the comparison image.
  • The comparison image registration unit 518 additionally registers the plurality of comparison images generated by the comparison image generation unit 512 in the comparison image storage unit 502. Specifically, the generated comparison images are stored in the learning data DB 600 in association with the format definition generated by the standard format definition generation unit 514 and the feature point data extracted by the feature point data extraction unit 516.
  • FIG. 8 is a diagram showing an example of patterned comparison images.
  • The learning data DB 600 holds five levels of comparison images for one document.
  • The five levels of comparison images are: a comparison image identical to the input image (the original image data); a comparison image in which an arbitrary region of the input image has been deleted (pattern 1); a comparison image in which the inside of the ruled-line frame has been deleted from the input image (pattern 2); a comparison image in which only the area outside the ruled-line frame has been extracted from the input image; and a comparison image in which only the ruled lines contained in the input image have been extracted (pattern 4).
  • The pattern 1 comparison image is image data in which areas that are not to be matched have been created at random from the original image data. Specifically, the non-matching areas are rectangles placed at random positions in the image data (the x and y coordinates range from (0, 0) to the maximum pixel of the document image data) and with random sizes (each side being 5% to 20% of the corresponding side, in pixels, of the document image data), and a random number of them (between 1 and 10) are present.
  • FIG. 9 is a flowchart illustrating the learning data generation process (S10).
  • In the learning data generation process, the image acquisition unit 500 first acquires image data of a document scanned by the scanner 3 and sets the image data as the input image.
  • The matching degree determination unit 504 compares the input image with the comparison images and searches for a comparison image whose degree of matching exceeds the reference. If there is no such comparison image, the process proceeds to S110; if there is, the process proceeds to the image matching process (S30).
  • In S110, the format definition selection unit 506 acquires a format definition associated with the semi-standard document.
  • In S115, the extraction unit 508 performs character recognition of the input image based on the format definition selected by the format definition selection unit 506.
  • In S120, the extraction unit 508 displays the character recognition result on the OCR recognition result confirmation screen, and the user confirms the result.
  • In S125, if there is a character string that has not been recognized, the process proceeds to S145; if all character strings have been recognized, the process proceeds to S130.
  • In S130, the comparison image generation unit 512 generates comparison images having five levels of different information amounts, based on the image data of the semi-standard document used for character recognition by the extraction unit 508.
  • In S135, the standard format definition generation unit 514 generates a standard document format definition based on the format definition of the semi-standard document used for character recognition.
  • In S140, the feature point data extraction unit 516 extracts feature points of the image data of the semi-standard document used by the extraction unit 508 for character recognition.
  • The comparison image registration unit 518 then stores the generated format definition, the comparison images generated in S130, and the feature point data in association with one another in the learning data DB 600.
  • In S145, the layout correction unit 510 resets the range in which the character string is to be recognized, based on a user operation performed on the layout correction screen.
  • The extraction unit 508 then performs character recognition on the range reset by the layout correction unit 510.
  • In S155, the extraction unit 508 receives and reflects the user's corrections to the character recognition result.
  • In S165, the comparison image generation unit 512 generates comparison images having five levels of different information amounts, based on the image data of the semi-standard document used for character recognition by the extraction unit 508.
  • In S170, the standard format definition generation unit 514 generates a standard document format definition based on the format definition of the semi-standard document used for character recognition and on the correction information from the layout correction unit 510.
  • In S175, the feature point data extraction unit 516 extracts feature points of the layout that has been reset.
  • The comparison image registration unit 518 then stores the generated format definition, the comparison images generated in S165, and the feature point data in association with one another in the learning data DB 600.
  • In S180, the comparison image storage unit 502 manages the learning data stored in the learning data DB 600.
  • Conventionally, it was necessary to correct the format definition of the character recognition range after OCR recognition. In contrast, when the user resets the character recognition range or changes the meaning of a document item, the image matching device 5 generates learning data based on the reset information, so the user does not need to reset the format definition and cannot forget to correct it. In other words, it is not necessary to maintain the large number of format definitions required for OCR recognition.
  • FIG. 10 is a flowchart illustrating the image matching process (S30). (A rough code sketch of how the S30 and S10 flows fit together is given after this list.)
  • In S300, the image acquisition unit 500 acquires image data of a document scanned by the scanner 3 and sets the image data as the input image.
  • In S305, if there is no learning data, the process proceeds to the learning data generation process (S10); if there is learning data, the process proceeds to S310.
  • In S310, the matching degree determination unit 504 compares the input image with the feature point data stored in the learning data DB 600 and extracts candidate learning data whose degree of matching exceeds the reference.
  • In S315, the matching degree determination unit 504 compares the input image with the five levels of comparison images of the extracted candidate learning data.
  • Here, the matching degree determination unit 504 compares the comparison images with the input image in descending order of information amount; that is, it compares the input image with the first-level, second-level, third-level, fourth-level, and fifth-level comparison images in that order. Comparing the comparison images with the input image starting from the one with the largest amount of information enables more accurate matching.
  • In S320, if the matching degree determination unit 504 determines that there is a comparison image whose degree of matching with the input image exceeds the reference, the image matching process (S30) proceeds to S325; if there is no such comparison image, the image matching process (S30) proceeds to S110 of the learning data generation process (S10).
  • In S325, the format definition selection unit 506 acquires the format definition associated with the comparison image whose degree of matching with the input image exceeds the reference.
  • In S330, the extraction unit 508 performs character recognition of the input image based on the format definition selected by the format definition selection unit 506.
  • In S335, the user checks the recognition result on the OCR recognition result confirmation screen.
  • In S340, if there is an unrecognized character string, the image matching process (S30) proceeds to S130 of the learning data generation process (S10); otherwise, the process ends.
  • In the image matching system 1 of the present embodiment, since comparison images of a plurality of patterns are generated for one document, even when the input image differs slightly from the original image data, the format definition can be identified by matching any one of the plurality of comparison images, without correcting the character recognition range each time. As a result, the work efficiency of the character recognition processing, the matching performance, and the matching rate of character recognition are improved.
  • When the comparison images of the plurality of patterns are generated, the non-matching areas are created at random, so the non-matching areas differ for each document and the pattern of the comparison images is not fixed.
  • The image matching device 5 recognizes the user's correction operations and generates and manages new learning data based on the correction information, so maintenance of the format definitions becomes unnecessary. Further, even if the scanner characteristics change because the model of the scanner 3 is replaced and the former format definition can no longer be used, the image matching device 5 generates a new format definition through learning, so the user does not need to create a new format definition.
  • In the above embodiment, the input image is compared with the five levels of learning data created by the comparison image generation unit 512; alternatively, the pattern 1 comparison image associated with one document may be changed.
  • In that case, the image matching device 5 includes an exclusion area change unit 520 in addition to the functional configuration illustrated in FIG. 5.
  • While the comparison image generation unit 512 randomly creates the non-matching areas of pattern 1 for each document, the exclusion area change unit 520 changes the already created non-matching areas of pattern 1.
  • The exclusion area change unit 520 changes at least one of the number, size, and position of the image areas excluded from comparison.
  • If only one pattern 1 comparison image is generated and managed for one document by the comparison image generation unit 512, the non-matching areas are fixed whenever that comparison image is compared with the input image, so documents with a high matching rate and documents with a low matching rate appear.
  • By having the exclusion area change unit 520 change the non-matching areas, the number of documents with a low matching rate can be reduced.
  • In the above embodiment, the image scanned by the scanner 3 is transmitted to the image matching device 5, and the image matching device 5 compares the input image with the comparison images.
  • Alternatively, the image matching program 50 may be installed in the scanner 3, and the scanner 3 may scan the image and compare the input image with the comparison images.
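
The bullets above walk through the two flowcharts (S10 and S30) step by step. The following Python sketch shows, under loose assumptions, how those two flows could fit together; every callable passed in as a parameter stands in for one of the units described above and is a hypothetical placeholder, not an implementation from the publication.

```python
# A compressed, hypothetical sketch of how the image matching process (S30) could
# fall back to the learning data generation process (S10). The callables stand for
# the units described above and are placeholders only.
def process_document(input_image, learning_db,
                     prefilter_candidates,    # S310: feature-point based candidate extraction
                     best_matching_layout,    # S315-S320: compare five-level comparison images
                     recognize,               # S115/S330: character recognition with a format definition
                     user_accepts,            # S120/S335: user confirms the OCR result
                     correct_layout,          # S145-S155: user resets the recognition range
                     register_learning_data   # S130-S140 / S165-S180: store new learning data
                     ):
    if learning_db:                                                        # S305
        candidates = prefilter_candidates(input_image, learning_db)
        layout = best_matching_layout(input_image, candidates)
        if layout is not None:                                             # S320
            result = recognize(input_image, layout["format_definition"])   # S330
            if user_accepts(result):                                       # S335-S340
                return result
    # Learning data generation (S10): the user fixes the layout, then the corrected
    # format definition, new comparison images and feature points are registered.
    layout = correct_layout(input_image)
    register_learning_data(learning_db, input_image, layout)
    return recognize(input_image, layout["format_definition"])
```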

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Character Input (AREA)
  • Image Analysis (AREA)

Abstract

This image comparison device comprises: a comparison image storage unit that stores, in association with one another, a plurality of comparison images, each generated by processing a different region of image data of the same document; and a matching degree determination unit that compares each comparison image stored in the comparison image storage unit with a newly input image to determine a degree of matching between them. The comparison image storage unit stores format definition information, which defines document formats, in association with the comparison images, and the device further comprises: a format definition selection unit that selects the format definition information to apply, from the format definition information stored in the comparison image storage unit, based on the result of the determination by the matching degree determination unit; and an extraction unit that extracts information from the newly input image based on the format definition information selected by the format definition selection unit.

Description

Image matching device, image matching method, and program
The present invention relates to an image matching device, an image matching method, and a program.
For example, Patent Literature 1 discloses an information sharing support system that discloses personal information to a plurality of users and supports information sharing, the system comprising: a storage unit that stores personal public information; an information providing unit that, in response to a user request, provides the public information stored by the storage unit together with a notification condition for notifying the information provider who provided the public information of a user's use state of that public information; and a notification unit that, when a user's use state of the public information provided by the information providing unit is detected, notifies the information provider of the detected use state, wherein only the information provider can change the notification condition.
Patent Literature 2 discloses a system comprising: a storage unit that stores coordinates of areas in a document and identification information corresponding to the areas; a creation unit that creates, from a newly received document, a plurality of areas for character recognition in the received document, the created areas including areas extracted by block selection processing on the document and arbitrary areas designated by a user; a comparison unit that compares the coordinates of the created areas with the coordinates of the areas stored by the storage unit; a determination unit that determines identification information corresponding to the coordinates of the created areas based on the comparison result and on the identification information corresponding to the stored areas; a transmission unit that transmits the determined identification information and text information based on character recognition of the created areas; and an execution unit that, based on the transmitted identification information, identifies and executes a script for inputting the text information into an application, wherein the storage unit stores, based on the result of the transmission, the coordinates of the created areas and the corresponding identification information.
Patent Literature 3 discloses a form reading device comprising: a storage unit that stores an image of a form including item names and data corresponding to the item names; a search unit that searches the form image for a predetermined item name; an input unit that receives information for selecting data on the form image; an association unit that associates the selected data with the found item name; and a character recognition unit that performs character recognition on the associated data.
JP 2009-122723 A; JP 2017-84198 A; JP 2018-37036 A
An object of the present invention is to provide an image matching system that supports image matching.
An image matching device according to the present invention includes: a comparison image storage unit that stores, in association with one another, a plurality of comparison images generated by processing mutually different areas of image data of the same document; and a matching degree determination unit that compares each of the comparison images stored in the comparison image storage unit with a newly input image to determine a degree of matching.
Preferably, the comparison image storage unit stores format definition information that defines the format of a document in association with the comparison images, and the device further includes: a format definition selection unit that selects the format definition information to apply, from the format definition information stored in the comparison image storage unit, based on the determination result of the matching degree determination unit; and an extraction unit that extracts information from the newly input image based on the format definition information selected by the format definition selection unit.
Preferably, when the degree of matching determined by the matching degree determination unit is at or below a reference for every comparison image, the device further includes a comparison image generation unit that processes mutually different areas of the input image to generate a plurality of comparison images, and a comparison image registration unit that additionally registers the plurality of comparison images generated by the comparison image generation unit in the comparison image storage unit.
Preferably, the comparison image generation unit processes the same input image such that the plurality of generated comparison images have mutually different data sizes.
Preferably, the comparison image generation unit generates the plurality of comparison images by deleting images in mutually different areas from the same input image.
Preferably, the comparison image storage unit stores at least two of: a comparison image identical to the input image; a comparison image in which an arbitrary area of the input image has been deleted; a comparison image in which the inside of the ruled-line frame has been deleted from the input image; a comparison image in which only the area outside the ruled-line frame has been extracted from the input image; and a comparison image in which only the ruled lines contained in the input image have been extracted.
Preferably, the device further includes an exclusion area change unit that changes at least one of the number, size, and position of image areas excluded from comparison, and the matching degree determination unit excludes, for at least one of the comparison images, the image areas changed by the exclusion area change unit from the comparison and compares the input image with that comparison image to determine the degree of matching.
An image matching method according to the present invention includes the steps of: generating a plurality of comparison images by processing mutually different areas of image data of the same document; registering the generated comparison images in a database in association with one another; and comparing each of the comparison images registered in the database with a newly input image to determine a degree of matching.
A program according to the present invention causes a computer to execute the steps of: generating a plurality of comparison images by processing mutually different areas of image data of the same document; registering the generated comparison images in a database in association with one another; and comparing each of the comparison images registered in the database with a newly input image to determine a degree of matching.
According to the present invention, image matching can be supported.
FIG. 1 is a diagram illustrating learning data in the image matching system 1.
FIG. 2 is a diagram illustrating an outline of OCR recognition in the image matching system 1.
FIG. 3 is a diagram illustrating the system configuration of the image matching system 1.
FIG. 4 is a diagram illustrating the hardware configuration of the image matching device 5.
FIG. 5 is a diagram illustrating the functional configuration of the image matching device 5.
FIG. 6 is a diagram illustrating an OCR recognition result confirmation screen.
FIG. 7 is a diagram illustrating a layout correction screen.
FIG. 8 is a diagram showing an example of patterned comparison images.
FIG. 9 is a flowchart illustrating the learning data generation process (S10) in the image matching system 1.
FIG. 10 is a flowchart illustrating the image matching process (S30) in the image matching system 1.
FIG. 11 is a diagram illustrating an outline of OCR recognition in a comparative example.
[Background]
The background to the present invention will be described.
Documents generated in transactions with customers are paper documents of a wide variety. To recognize these documents from image data as semi-standard documents by OCR (Optical Character Recognition), a format definition for semi-standard OCR recognition must be created for each type of document. During OCR recognition of a document, problems can occur: the OCR format definition cannot be matched, or some parts are not recognized. When such a problem occurs, the OCR recognition result and the definition of the recognition range must be corrected, which lowers work efficiency. There is also a growing need for more efficient office work, such as converting paper documents into electronic data.
A semi-standard document is a document, such as an invoice, whose format differs slightly depending on the issuing company.
FIG. 11 is a diagram illustrating an outline of the OCR recognition process in a comparative example.
As illustrated in FIG. 11, in the OCR recognition process of the comparative example, a user creates a format definition for OCR recognition for each type of document, and OCR recognition is performed based on the created format definition. When a document was recognized as a semi-standard document and a defect occurred in OCR recognition during operation, correcting the OCR recognition range did not reflect the corrected information in the format definition. As a result, the format definition could be forgotten to be corrected after OCR recognition, or the recognition range had to be reset every time a poorly recognized portion occurred. Moreover, improving the extraction rate of the OCR recognition range required a format definition for each document to be recognized, so the number of format definitions became enormous. Consequently, a matching format definition was sometimes not found when collating format definitions, or the collation took a long time, and managing the corrected OCR format definitions also became complicated.
FIG. 1 is a diagram illustrating learning data managed by the image matching device 5 of the present invention.
FIG. 2 is a diagram illustrating an outline of OCR recognition in the image matching system 1.
To address the above problem, the present invention holds, for one document, a plurality of pieces of image data (comparison images) with different amounts of information, as illustrated in FIG. 1, and the plurality of pieces of image data are associated with a single format definition. Even when none of them completely matches the document to be OCR-recognized, the image matching device 5 of the present invention identifies a document that matches one of the plurality of pieces of image data, performs character recognition based on a format definition suitable for the target document, and thereby increases the matching rate.
As illustrated in FIG. 2, when the user corrects the OCR recognition range, that is, when the character recognition layout is corrected, the image matching system of the present invention generates learning data based on the corrected contents, so the user does not need to reset the format definition.
An embodiment of the present invention will be described with reference to the drawings.
FIG. 3 is a diagram illustrating the overall configuration of the image matching system 1.
As illustrated in FIG. 3, the image matching system 1 includes a plurality of scanners 3a, 3b, and 3c and an image matching device 5, which are connected to one another via a network 7.
The scanner 3a, the scanner 3b, and the scanner 3c are collectively referred to as the scanner 3. The scanner 3 transmits image data acquired by its optical reading device (hereinafter referred to as an input image) to the image matching device 5.
The image matching device 5 is a computer terminal and performs character recognition of the image data received from the scanner 3. Specifically, the image matching device 5 identifies a format definition suitable for the input image to be used for character recognition, and performs character recognition of the input image by applying the identified format definition. More specifically, the format definition suitable for the input image is identified based on the comparison images generated by the image matching device 5.
FIG. 4 is a diagram illustrating the hardware configuration of the image matching device 5.
As illustrated in FIG. 4, the image matching device 5 includes a CPU 200, a memory 202, an HDD 204, a network interface 206 (network IF 206), a display device 208, and an input device 210, which are connected to one another via a bus 212.
The CPU 200 is, for example, a central processing unit.
The memory 202 is, for example, a volatile memory and functions as the main storage device.
The HDD 204 is, for example, a hard disk drive, and stores a computer program and other data files as a nonvolatile recording device.
The network IF 206 is an interface for wired or wireless communication.
The display device 208 is, for example, a liquid crystal display.
The input device 210 is, for example, a keyboard and a mouse.
FIG. 5 is a diagram illustrating the functional configuration of the image matching device 5.
As illustrated in FIG. 5, an image matching program 50 is installed in the image matching device 5. The image matching program 50 is stored in a recording medium such as a CD-ROM, is installed in the image matching device 5 via this recording medium, and a learning data database 600 (learning data DB 600) is configured.
The learning data DB 600 manages layout data for each document as illustrated in FIG. 2. The layout data includes a format definition for character recognition of an input image, comparison images associated with the format definition, and feature point data associated with the format definition. The comparison images and the feature point data are the elements that determine which format definition is used for character recognition of the input image.
Part or all of the image matching program 50 may be realized by hardware such as an ASIC, or may be realized by partially borrowing functions of an OS (Operating System). The entire program may be installed on a single computer terminal, or may be installed on a virtual machine in a cloud.
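
As a concrete illustration of this layout data, the following is a minimal sketch of what one entry of the learning data DB 600 could look like. The field names and types are assumptions made for illustration and do not come from the publication.

```python
from dataclasses import dataclass
from typing import Any, List

@dataclass
class LayoutData:
    """One learning-data entry: the three elements described above, kept together."""
    document_id: str
    format_definition: dict        # keyword/position rules for each item to recognize
    comparison_images: List[Any]   # up to five processed variants of the document image
    feature_point_data: Any        # descriptors used to prefilter candidate documents

learning_data_db: List[LayoutData] = []   # stands in for the learning data DB 600
```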
The image matching program 50 includes an image acquisition unit 500, a comparison image storage unit 502, a matching degree determination unit 504, a format definition selection unit 506, an extraction unit 508, a layout correction unit 510, a comparison image generation unit 512, a standard format definition generation unit 514, a feature point data extraction unit 516, and a comparison image registration unit 518.
In the image matching program 50, the image acquisition unit 500 acquires image data of a document scanned by the scanner 3 and uses it as the input image.
The comparison image storage unit 502 stores, in association with one another, a plurality of comparison images generated by processing mutually different areas of image data of the same document. Specifically, the comparison image storage unit 502 stores at least two of the five types of patterned comparison images for one document. The comparison image storage unit 502 also stores format definition information (hereinafter referred to as a format definition) that defines the format of the document, in association with the comparison images. A format definition uses one of the image data obtained by capturing a plurality of semi-standard documents of the same type, and consists of information for identifying the document type for OCR recognition and information for specifying the OCR recognition range. For example, a format definition specifies the OCR recognition range based on a keyword such as "customer name" and a position relative to the keyword (a condition consisting of above, below, left, or right). The format definition is defined by the user.
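
A minimal sketch of such a keyword-based format definition is shown below, assuming the recognition range is expressed as a fixed-size rectangle placed relative to the located keyword. The class name, fields, and geometry are illustrative assumptions, not definitions taken from the publication.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class FieldDefinition:
    item_name: str     # e.g. "date", "telephone number", "name"
    keyword: str       # anchor keyword printed on the form, e.g. "customer name"
    direction: str     # "above", "below", "left" or "right" of the keyword
    offset_px: int     # gap between the keyword box and the recognition range
    width_px: int      # size of the recognition range
    height_px: int

def recognition_rect(field: FieldDefinition,
                     keyword_box: Tuple[int, int, int, int]) -> Tuple[int, int, int, int]:
    """keyword_box is (x, y, w, h) of the located keyword; returns (x, y, w, h) to OCR."""
    x, y, w, h = keyword_box
    if field.direction == "right":
        return (x + w + field.offset_px, y, field.width_px, field.height_px)
    if field.direction == "left":
        return (x - field.offset_px - field.width_px, y, field.width_px, field.height_px)
    if field.direction == "below":
        return (x, y + h + field.offset_px, field.width_px, field.height_px)
    return (x, y - field.offset_px - field.height_px, field.width_px, field.height_px)  # "above"
```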
The matching degree determination unit 504 compares each of the comparison images stored in the comparison image storage unit 502 with a newly input image to determine the degree of matching. When the degree of matching between a comparison image and the input image exceeds a reference, the matching degree determination unit 504 determines that the two match. The matching degree determination unit 504 also extracts, based on the feature point data, candidate learning data to be used for character recognition of the input image, and then determines, from among the extracted candidates, the learning data whose degree of matching between its comparison image and the input image exceeds the reference.
The format definition selection unit 506 selects the format definition to apply, from the format definitions stored in the comparison image storage unit 502, based on the determination result of the matching degree determination unit 504. Specifically, the format definition selection unit 506 selects the format definition of the learning data determined by the matching degree determination unit 504 as the format definition used for character recognition of the input image.
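
The following sketch illustrates the kind of decision made by the matching degree determination unit 504 and the format definition selection unit 506. It assumes the candidates have already been prefiltered by feature points, approximates the degree of matching as the fraction of agreeing pixels between aligned images, and uses an arbitrary placeholder threshold; the publication does not specify a concrete measure or reference value.

```python
import numpy as np

def matching_degree(input_image: np.ndarray, comparison_image: np.ndarray) -> float:
    """Fraction of pixels that agree; assumes both arrays are aligned and the same shape."""
    return float(np.mean(input_image == comparison_image))

def select_format_definition(input_image, candidates, threshold=0.9):
    """candidates: list of (format_definition, comparison_images); comparison_images is
    assumed to be ordered from the most to the least informative level.
    Returns the format definition of the best candidate whose matching degree exceeds
    the threshold, or None if no candidate does."""
    best_definition, best_score = None, threshold
    for format_definition, comparison_images in candidates:
        for comparison_image in comparison_images:
            score = matching_degree(input_image, comparison_image)
            if score > best_score:
                best_definition, best_score = format_definition, score
                break   # this candidate already matches; move on to the next one
    return best_definition   # None means falling back to learning data generation (S10)
```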
The extraction unit 508 extracts information from the newly input image based on the format definition selected by the format definition selection unit 506. Specifically, the extraction unit 508 performs character recognition of the input image based on the format definition and displays the recognition result on an OCR recognition result confirmation screen as illustrated in FIG. 6. The OCR recognition result confirmation screen displays each item name of the document (date, telephone number, name, and so on) and the value of each item. The user confirms the character recognition result on the OCR recognition result confirmation screen and corrects any errors.
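
A minimal sketch of this extraction step is given below. It assumes the recognition range of every item has already been resolved to pixel coordinates from the selected format definition, and it uses pytesseract purely as an example OCR engine; the publication does not name a particular engine or API.

```python
from PIL import Image
import pytesseract  # example OCR engine; any engine could stand in here

def extract_items(page: Image.Image, recognition_ranges: dict) -> dict:
    """recognition_ranges maps an item name to its (x, y, w, h) range on this page,
    as resolved from the selected format definition."""
    results = {}
    for item_name, (x, y, w, h) in recognition_ranges.items():
        crop = page.crop((x, y, x + w, y + h))                       # cut out the item's range
        results[item_name] = pytesseract.image_to_string(crop, lang="jpn").strip()
    return results
```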
The layout correction unit 510 changes the character recognition range of the input image, or the meaning (date, telephone number, name, or other value) of the item described in the character recognition range. Specifically, as illustrated in FIG. 7, the layout correction screen displays an image of the input image, and when the user resets the character recognition range, the layout correction unit 510 accepts the change and changes the character recognition range.
When the degree of matching determined by the matching degree determination unit 504 is at or below the reference for every comparison image, the comparison image generation unit 512 processes mutually different regions of the input image to generate a plurality of comparison images. Specifically, the comparison image generation unit 512 processes the same input image so that the generated comparison images have mutually different data sizes, and generates the plurality of comparison images by deleting images in mutually different areas from the same input image.
 The standard format definition generation unit 514 stores, in the learning data DB 600 in association with the comparison image, a format definition in which the character recognition range has been changed by the layout correction unit 510, or in which the meaning of a document item has been changed.
 The feature point data extraction unit 516 extracts feature points of the comparison image corrected by the layout correction unit 510 and stores them in the learning data DB 600 in association with the comparison image.
 The comparison image registration unit 518 additionally registers the plurality of comparison images generated by the comparison image generation unit 512 in the comparison image storage unit 502. Specifically, the generated comparison images are stored in the learning data DB 600 in association with the format definition generated by the standard format definition generation unit 514 and the feature point data extracted by the feature point data extraction unit 516.
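As a non-limiting illustration of how one entry in the learning data DB 600 might tie the comparison images, the format definition, and the feature point data together, the following sketch uses an assumed Python data structure; none of the type names or field names are prescribed by the embodiment.

from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class FormatDefinition:
    # maps an item name (date, telephone number, name, ...) to the rectangle
    # (x, y, width, height) in which that item's characters are recognized
    items: Dict[str, Tuple[int, int, int, int]]

@dataclass
class LearningRecord:
    document_id: str
    comparison_images: List[bytes] = field(default_factory=list)   # original image plus patterns 1 to 4
    format_definition: Optional[FormatDefinition] = None
    feature_points: List[Tuple[float, float]] = field(default_factory=list)

def register(learning_data_db: Dict[str, LearningRecord], record: LearningRecord) -> None:
    # additional registration as performed by the comparison image registration unit 518
    learning_data_db[record.document_id] = record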
 Next, the comparison images will be described.
 FIG. 8 is a diagram illustrating an example of patterned comparison images.
 In this example, as illustrated in FIG. 8, the learning data DB 600 holds five levels of comparison images for one document. The five levels of comparison images are: a comparison image identical to the input image (original image data), a comparison image in which arbitrary regions of the input image are deleted (pattern 1), a comparison image in which the inside of the ruled line frames is deleted from the input image (pattern 2), a comparison image in which only the area outside the ruled line frames is extracted from the input image (pattern 3), and a comparison image in which only the ruled lines included in the input image are extracted (pattern 4).
 Because five levels of comparison images are prepared for each document, even when the input image is a document to which minor changes have been made, the format definition can be identified as long as the input image is determined to match any one of the five levels; character recognition of the input image then becomes possible, and the matching rate improves.
 The pattern 1 comparison image is image data in which regions excluded from comparison are created at random from the original image data. Specifically, the excluded regions are rectangles placed at random positions in the image data (the x and y coordinates range from (0, 0) to the maximum pixel of the document image data), with random sizes (in the range of 5% to 20% of the length of one side, in pixels, of the document image data), and a plurality of such rectangles exist (the number is random in the range of 1 to 10).
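A minimal sketch of how a pattern 1 comparison image could be produced under the numerical ranges given above (1 to 10 rectangles, each side 5% to 20% of the document side length, at random positions) follows; the use of Pillow and the whitening of the excluded regions are assumptions of this sketch, not requirements of the embodiment.

import random
from PIL import Image, ImageDraw

def make_pattern1(original: Image.Image) -> Image.Image:
    # blank out 1 to 10 random rectangles, each 5% to 20% of a side, to create a pattern 1 image
    img = original.copy()
    draw = ImageDraw.Draw(img)
    width, height = img.size
    for _ in range(random.randint(1, 10)):                # number of excluded regions
        w = int(width * random.uniform(0.05, 0.20))       # 5% to 20% of one side (pixels)
        h = int(height * random.uniform(0.05, 0.20))
        x = random.randint(0, max(width - w, 0))          # random position within the page
        y = random.randint(0, max(height - h, 0))
        draw.rectangle([x, y, x + w, y + h], fill="white")  # region excluded from comparison
    return img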
 FIG. 9 is a flowchart illustrating the learning data generation process (S10).
 As illustrated in FIG. 9, in step 100 (S100), the image acquisition unit 500 acquires image data of a document scanned by the scanner 3 and uses it as the input image.
 In step 105 (S105), the matching degree determination unit 504 compares the input image with the comparison images and searches for a comparison image whose degree of matching exceeds the reference. If there is no comparison image whose degree of matching exceeds the reference, the process proceeds to S110; if such a comparison image exists, the process proceeds to the image comparison process (S30).
 In step 110 (S110), the format definition selection unit 506 acquires the format definition associated with the semi-standard document.
 In step 115 (S115), the extraction unit 508 performs character recognition of the input image based on the format definition selected by the format definition selection unit 506.
 In step 120 (S120), the extraction unit 508 displays the character recognition result on the OCR recognition result confirmation screen, and the user confirms the result.
 In step 125 (S125), if there is a character string that has not been recognized, the process proceeds to S145; if all character strings have been recognized, the process proceeds to S130.
 In step 130 (S130), the comparison image generation unit 512 generates five levels of comparison images with different amounts of information, based on the image data of the semi-standard document used for character recognition by the extraction unit 508.
 In step 135 (S135), the standard format definition generation unit 514 generates a format definition for the standard document based on the format definition of the semi-standard document used for character recognition.
 In step 140 (S140), the feature point data extraction unit 516 extracts feature points from the image data of the semi-standard document used for character recognition by the extraction unit 508. The comparison image registration unit 518 stores the generated format definition, the comparison images generated in S130, and the feature point data in the learning data DB 600 in association with one another.
 In step 145 (S145), the layout correction unit 510 resets the range in which character strings are to be recognized, based on the user's operation on the layout correction screen.
 In step 150 (S150), the extraction unit 508 performs character recognition in the range reset by the layout correction unit 510.
 In step 155 (S155), if there is an error in the character recognition result, the process proceeds to S160; if there is no error, the process proceeds to S165.
 In step 160 (S160), the extraction unit 508 accepts and reflects the user's correction of the character recognition result.
 In step 165 (S165), the comparison image generation unit 512 generates five levels of comparison images with different amounts of information, based on the image data of the semi-standard document used for character recognition by the extraction unit 508.
 In step 170 (S170), the standard format definition generation unit 514 generates a format definition for the standard document based on the format definition of the semi-standard document used for character recognition and the correction information from the layout correction unit 510.
 In step 175 (S175), the feature point data extraction unit 516 extracts feature points from the corrected layout that has been reset. The comparison image registration unit 518 stores the generated format definition, the comparison images generated in S165, and the feature point data in the learning data DB 600 in association with one another.
 In step 180 (S180), the comparison image storage unit 502 manages the learning data stored in the learning data DB 600.
 Conventionally, the format definition of the character recognition range had to be corrected after OCR recognition. In contrast, when the user resets the character recognition range or changes the meaning of a document item, the image comparison device 5 generates learning data based on the reset information, so the user does not need to reset the format definition; the conventional effort of having the user correct the format definition is eliminated, and corrections are never forgotten. In other words, maintenance of the huge number of format definitions required for OCR recognition becomes unnecessary.
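By way of a non-limiting sketch of step S170, which the flowchart above describes only at the level of its inputs and outputs, a standard document format definition could be derived from the semi-standard document's format definition and the layout correction information as follows; the dictionary layout, the key names, and the helper name are assumptions made only for this illustration.

def generate_standard_format_definition(semi_standard_fmt: dict, corrections: dict) -> dict:
    # start from the semi-standard document's format definition and overwrite every
    # item whose recognition range or meaning was changed on the layout correction screen
    standard_fmt = {name: dict(entry) for name, entry in semi_standard_fmt.items()}
    for item_name, correction in corrections.items():
        entry = standard_fmt.setdefault(item_name, {})
        if "range" in correction:      # re-set character recognition range (x, y, width, height)
            entry["range"] = correction["range"]
        if "meaning" in correction:    # changed meaning of the item (date, telephone number, ...)
            entry["meaning"] = correction["meaning"]
    return standard_fmt

# usage sketch: the date field was moved 30 pixels down on the layout correction screen
semi = {"date": {"range": (40, 30, 200, 24), "meaning": "date"}}
standard = generate_standard_format_definition(semi, {"date": {"range": (40, 60, 200, 24)}})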
 FIG. 10 is a flowchart illustrating the image comparison process (S30).
 As illustrated in FIG. 10, in step 300 (S300), the image acquisition unit 500 acquires image data of a document scanned by the scanner 3 and uses it as the input image.
 In step 305 (S305), if there is no learning data, the process proceeds to the learning data generation process (S10); if learning data exists, the process proceeds to S310.
 In step 310 (S310), the matching degree determination unit 504 compares the input image with the feature point data held in the learning data DB 600 and extracts candidates of learning data whose degree of matching exceeds the reference.
 In step 315 (S315), the matching degree determination unit 504 compares the input image with the five levels of comparison images of the extracted candidate learning data. The matching degree determination unit 504 compares the comparison images with the input image in descending order of the amount of information. Specifically, the matching degree determination unit 504 compares the input image with the first-level comparison image, the second-level comparison image, the third-level comparison image, the fourth-level comparison image, and the fifth-level comparison image in this order. Comparing the comparison images with the input image in descending order of the amount of information enables more accurate matching.
 In step 320 (S320), when the matching degree determination unit 504 determines that there is a comparison image whose degree of matching with the input image exceeds the reference, the image comparison process (S30) proceeds to S325; when there is no comparison image whose degree of matching exceeds the reference, the image comparison process (S30) proceeds to S110 of the learning data generation process (S10).
 In step 325 (S325), the format definition selection unit 506 acquires the format definition associated with the comparison image whose degree of matching with the input image exceeds the reference.
 In step 330 (S330), the extraction unit 508 performs character recognition of the input image based on the format definition selected by the format definition selection unit 506.
 In step 335 (S335), the user confirms the recognition result on the OCR recognition result confirmation screen.
 In step 340 (S340), if there is a character string that has not been recognized, the image comparison process (S30) proceeds to S130 of the learning data generation process (S10); if all character strings have been recognized, the process ends.
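As one possible sketch of the ordering used in steps S315 and S320, and not as the disclosed implementation, the following code treats the number of non-white pixels as a stand-in for the "amount of information" of each comparison image; the similarity measure and the threshold are likewise assumptions of this sketch.

import numpy as np

MATCH_THRESHOLD = 0.9   # assumed reference

def information_amount(image: np.ndarray) -> int:
    # assumed proxy for the amount of information: count of non-white pixels
    return int(np.count_nonzero(image < 250))

def similarity(input_image: np.ndarray, comparison_image: np.ndarray) -> float:
    # assumed similarity: fraction of pixels that agree after binarization
    a = input_image < 128
    b = comparison_image < 128
    return float(np.mean(a == b))

def match_five_levels(input_image: np.ndarray, comparison_images: list):
    # compare the five levels in descending order of information amount (S315, S320)
    ordered = sorted(comparison_images, key=information_amount, reverse=True)
    for comparison_image in ordered:
        if similarity(input_image, comparison_image) > MATCH_THRESHOLD:
            return comparison_image   # matched: its format definition is selected in S325
    return None                       # no level matched: fall back to S110 of the S10 flow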
 As described above, according to the image comparison system 1 of the present embodiment, a plurality of patterns of comparison images are generated for one document, so even for an input image that differs slightly from the original image data, the format definition can be identified by matching any one of the plurality of patterns of comparison images, without the user having to correct the character recognition range each time. That is, the work efficiency of the character recognition processing, the matching performance, and the matching rate of character recognition are improved.
 In addition, when comparison images of a plurality of patterns are generated, the regions excluded from comparison are created at random, so the excluded regions differ for each document and the patterns of the comparison images do not become stereotyped.
 Furthermore, even when no learning data suitable for the input image exists, the user's correction operation on the comparison image is recognized and new learning data is generated and managed based on the correction information, so maintenance of the format definitions becomes unnecessary.
 Moreover, even when the characteristics of the scanner change because the model of the scanner 3 has been replaced and the existing format definitions can no longer be used, the image comparison device 5 generates new format definitions by learning, so the user does not need to create new format definitions.
 In the above embodiment, the input image is compared with the five levels of learning data created by the comparison image generation unit 512, but the pattern 1 comparison image associated with one document may be changed.
 Specifically, the image comparison device 5 according to this modification has an exclusion area change unit 520 in addition to the functional configuration illustrated in FIG. 5. The comparison image generation unit 512 randomly creates the regions of pattern 1 that are excluded from comparison for each document, whereas the exclusion area change unit 520 changes the excluded regions of a pattern 1 comparison image that has already been created. Specifically, for the image regions excluded from comparison, the exclusion area change unit 520 changes at least one of the number of image regions, the size of the image regions, and the position of the image regions. For example, when one pattern 1 comparison image has been generated and managed for one document by the comparison image generation unit 512, the excluded regions are fixed whenever that pattern 1 comparison image is compared with the input image, so some documents have a high matching rate and others a low one. By having the exclusion area change unit 520 change the existing excluded regions of pattern 1, this variation between documents with high and low matching rates can be reduced.
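A minimal sketch of what the exclusion area change unit 520 might do follows, assuming the excluded regions are kept as a list of (x, y, width, height) rectangles; changing only the position, as shown here, is just one of the variations permitted by the text, which allows changing the number, size, and/or position.

import random

def move_excluded_regions(regions, image_width, image_height):
    # keep the number and size of the excluded regions but move each one to a new
    # random position, so that the pattern 1 comparison image no longer always
    # ignores the same part of the document
    moved = []
    for (_, _, w, h) in regions:
        x = random.randint(0, max(image_width - w, 0))
        y = random.randint(0, max(image_height - h, 0))
        moved.append((x, y, w, h))
    return moved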
 In the present embodiment, the image scanned by the scanner 3 is transmitted to the image comparison device 5, and the image comparison device 5 compares the input image with the comparison images. However, the present invention is not limited to this; for example, the image comparison program 50 may be installed in the scanner 3, and the scanner 3 may scan the image and compare the input image with the comparison images.
DESCRIPTION OF SYMBOLS
1 … image comparison system
3 … scanner
5 … image comparison device
50 … image comparison program

Claims (9)

  1.  An image comparison device comprising:
      a comparison image storage unit that stores, in association with one another, a plurality of comparison images generated by applying processing to mutually different regions of image data of the same document; and
      a matching degree determination unit that compares each of the comparison images stored in the comparison image storage unit with a newly input image and determines a degree of matching.
  2.  The image comparison device according to claim 1, wherein the comparison image storage unit stores format definition information defining a format of a document in association with the comparison images, the image comparison device further comprising:
      a format definition selection unit that selects, from among the format definition information stored in the comparison image storage unit, format definition information to be applied, based on a determination result of the matching degree determination unit; and
      an extraction unit that extracts information from the newly input image based on the format definition information selected by the format definition selection unit.
  3.  The image comparison device according to claim 1, further comprising:
      a comparison image generation unit that, when the degree of matching determined by the matching degree determination unit is equal to or less than a reference for every comparison image, applies processing to mutually different regions of the input image to generate a plurality of comparison images; and
      a comparison image registration unit that additionally registers the plurality of comparison images generated by the comparison image generation unit in the comparison image storage unit.
  4.  The image comparison device according to claim 3, wherein the comparison image generation unit processes the same input image such that the plurality of generated comparison images have mutually different data sizes.
  5.  The image comparison device according to claim 3, wherein the comparison image generation unit generates the plurality of comparison images by deleting images of mutually different regions from the same input image.
  6.  The image comparison device according to claim 1, wherein the comparison image storage unit stores at least two of: a comparison image identical to the input image; a comparison image in which arbitrary regions of the input image are deleted; a comparison image in which the inside of ruled line frames is deleted from the input image; a comparison image in which only the area outside the ruled line frames is extracted from the input image; and a comparison image in which only the ruled lines included in the input image are extracted.
  7.  The image comparison device according to claim 1, further comprising:
      an exclusion area change unit that changes, for image regions excluded from comparison, at least one of the number of image regions, the size of the image regions, and the position of the image regions,
      wherein, for at least one of the comparison images, the matching degree determination unit excludes the image region changed by the exclusion area change unit from the comparison, and compares the input image with the comparison image to determine the degree of matching.
  8.  An image comparison method comprising:
      generating a plurality of comparison images by applying processing to mutually different regions of image data of the same document;
      registering the plurality of generated comparison images in a database in association with one another; and
      comparing each of the comparison images registered in the database with a newly input image to determine a degree of matching.
  9.  A program causing a computer to execute:
      generating a plurality of comparison images by applying processing to mutually different regions of image data of the same document;
      registering the plurality of generated comparison images in a database in association with one another; and
      comparing each of the comparison images registered in the database with a newly input image to determine a degree of matching.
PCT/JP2018/032358 2018-08-31 2018-08-31 Image comparison device, image comparison method, and program WO2020044537A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2018/032358 WO2020044537A1 (en) 2018-08-31 2018-08-31 Image comparison device, image comparison method, and program
JP2020539985A JPWO2020044537A1 (en) 2018-08-31 2018-08-31 Image matching device, image matching method, and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2018/032358 WO2020044537A1 (en) 2018-08-31 2018-08-31 Image comparison device, image comparison method, and program

Publications (1)

Publication Number Publication Date
WO2020044537A1 true WO2020044537A1 (en) 2020-03-05

Family

ID=69643204

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2018/032358 WO2020044537A1 (en) 2018-08-31 2018-08-31 Image comparison device, image comparison method, and program

Country Status (2)

Country Link
JP (1) JPWO2020044537A1 (en)
WO (1) WO2020044537A1 (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10222587A (en) * 1997-02-07 1998-08-21 Glory Ltd Method and device for automatically discriminating slip or the like
JPH11296676A (en) * 1998-04-08 1999-10-29 Oki Electric Ind Co Ltd Image data classification method and image data registration method
JP2002358521A (en) * 2001-05-31 2002-12-13 Oki Electric Ind Co Ltd Device, method and program for registering and identifying document format
JP2005242786A (en) * 2004-02-27 2005-09-08 Oki Electric Ind Co Ltd Form identification apparatus and form identification method
JP2006127451A (en) * 2004-09-30 2006-05-18 Oki Electric Ind Co Ltd Form processor
JP2009011874A (en) * 2007-06-29 2009-01-22 Hitachi Computer Peripherals Co Ltd Business form sorting method, and optical character reading system using the same
JP2010003155A (en) * 2008-06-20 2010-01-07 Fujitsu Frontech Ltd Form recognition apparatus, method, database generation apparatus, method, and program

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2021103581A (en) * 2020-06-12 2021-07-15 北京百度網訊科技有限公司 Method for extracting form in image, device for extracting form in image, electronic apparatus, computer readable storage medium, method for training form extraction module, and computer program
JP7278321B2 (en) 2020-06-12 2023-05-19 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド Method for extracting tables in images, apparatus for extracting tables in images, electronic devices, computer readable storage media, methods and computer programs for training table extraction modules

Also Published As

Publication number Publication date
JPWO2020044537A1 (en) 2021-03-18

Similar Documents

Publication Publication Date Title
US20210192202A1 (en) Recognizing text in image data
JP7013182B2 (en) Information processing equipment, information processing methods and programs
US8108764B2 (en) Document recognition using static and variable strings to create a document signature
JP6357621B1 (en) Accounting processing apparatus, accounting processing system, accounting processing method and program
JP2016048444A (en) Document identification program, document identification device, document identification system, and document identification method
CN111310426A (en) Form format recovery method and device based on OCR and storage medium
US11023720B1 (en) Document parsing using multistage machine learning
EP2884425B1 (en) Method and system of extracting structured data from a document
JP6435934B2 (en) Document image processing program, image processing apparatus and character recognition apparatus using the program
JP6579456B1 (en) Search target information narrowing system
WO2020044537A1 (en) Image comparison device, image comparison method, and program
JP5623574B2 (en) Form identification device and form identification method
JP2004013813A (en) Information management system and method
CN113591657B (en) OCR layout recognition method and device, electronic equipment and medium
JP2020087112A (en) Document processing apparatus and document processing method
US11972208B2 (en) Information processing device and information processing method
CN114529933A (en) Contract data difference comparison method, device, equipment and medium
JP5169648B2 (en) Original image search device and original image search program
JP2010237909A (en) Knowledge correction program, knowledge correcting device and knowledge correction method
JP2021064123A (en) Data input support system, data input support method, and program
JP2020047031A (en) Document retrieval device, document retrieval system and program
JP5272664B2 (en) Information processing apparatus, image search method, and program
JP2012146147A (en) Document processing device and document processing program
CN114120016B (en) Character string extraction method, device, equipment and storage medium
US20230273952A1 (en) Image processing apparatus, image processing method, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18932142

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2020539985

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18932142

Country of ref document: EP

Kind code of ref document: A1