JP2020053891A

JP2020053891A - Information processing apparatus, information processing method, and program

Info

Publication number: JP2020053891A
Application number: JP2018182651A
Authority: JP
Inventors: Yugo Nishikawa (西川 侑吾); Naoyuki Ito (伊藤 直之); Satoshi Tabata (田端 聡); Takuya Ikoma (生駒 拓也)
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 2018-09-27
Filing date: 2018-09-27
Publication date: 2020-04-02
Anticipated expiration: 2038-09-27
Also published as: JP7107138B2

Abstract

To provide an information processing apparatus, or the like, capable of properly managing documents. SOLUTION: An information processing apparatus 1 includes: an acquisition unit that acquires a sheet image obtained by scanning a filled-in sheet having printed content and a handwritten object; a content specifying unit that specifies the content from the sheet image; an extraction unit that extracts the handwritten object from the sheet image; and a generation unit that specifies a masking area by excluding the handwritten object from a content area, which is a peripheral area including the content, and generates the sheet image processed so that the masking area is obscured. SELECTED DRAWING: Figure 1

Description

The present invention relates to an information processing apparatus, an information processing method, and a program.

In document management, it is common practice to make the portions of a document that should be kept confidential invisible, for example by masking, before managing the document. For example, Patent Document 1 discloses a document management system that masks a scanned image of a document before storing it: a disclosure level is set for each area in the document, and the system generates a plurality of images in which different areas are masked according to the disclosure level.

Patent Document 1: JP 2007-140958 A

However, the invention according to Patent Document 1 masks predetermined areas in a document. Therefore, when characters or the like are handwritten in the document, the handwritten characters may also be masked, depending on where they are written.

An object of one aspect is to provide an information processing apparatus or the like capable of performing document management appropriately.

In one aspect, an information processing apparatus includes: an acquisition unit that acquires a sheet image obtained by scanning an entry sheet having pre-printed content and a handwritten object entered by hand; a content specifying unit that specifies the content from the sheet image; an extraction unit that extracts the handwritten object from the sheet image; and a generation unit that specifies a masking area by excluding the handwritten object from a content area, which is a peripheral area including the content, and generates the sheet image subjected to image processing so that the masking area cannot be visually recognized.

In one aspect, document management can be performed appropriately.

FIG. 1 is a schematic diagram illustrating a configuration example of a document management system.
FIG. 2 is a block diagram illustrating a configuration example of a PC.
FIG. 3 is an explanatory diagram of the process of specifying a content area.
FIG. 4 is an explanatory diagram of the process of specifying a handwritten entry area.
FIG. 5 is an explanatory diagram of the masking process.
FIG. 6 is a flowchart illustrating an example of a processing procedure executed by the PC.
FIG. 7 is an explanatory diagram of the content area specifying process according to Embodiment 2.
FIG. 8 is a flowchart illustrating an example of a processing procedure executed by the PC according to Embodiment 2.
FIG. 9 is a functional block diagram illustrating the operation of the PC of the embodiments described above.

Hereinafter, the present invention will be described in detail with reference to the drawings illustrating embodiments thereof.
(Embodiment 1)
FIG. 1 is a schematic diagram illustrating a configuration example of a document management system. In the present embodiment, the document to be managed is assumed to be a test sheet (entry sheet) on which a respondent (student) writes answers to questions in a school test by hand, and a document management system that masks a sheet image obtained by scanning the test sheet before storing it will be described. The document management system includes an information processing apparatus 1, a server 2, and a scanner 3. The information processing apparatus 1 and the server 2 are communicatively connected via a network N such as the Internet.

The information processing apparatus 1 is an apparatus capable of various types of information processing and of transmitting and receiving information, for example a personal computer, a multifunction peripheral, or a multifunction terminal. In the present embodiment, the information processing apparatus 1 is assumed to be a personal computer and, for brevity, is referred to as PC 1 below. The PC 1 according to the present embodiment is a terminal device operated by a school teacher and is connected to the scanner 3, which optically reads paper media. The PC 1 uploads the image of a test sheet scanned by the scanner 3 to the server 2 for storage. In doing so, the PC 1 masks the content area, that is, the area where content printed in advance on the test sheet (text, pictures, photographs, and the like) that may constitute a copyrighted work is printed, and uploads the masked image.

The server 2 is a so-called cloud server and stores the images of test sheets uploaded from the PC 1 in a database. Although a detailed description is omitted because it is unrelated to the present embodiment, the server 2 scores, aggregates, and analyzes the answers written on the test sheets, notifies the students and teachers of the analysis results, and provides teaching materials suited to each student.

The storage location of the image is not limited to the server 2 on the cloud; for example, the PC 1 may store the image locally.

In the following description, the PC 1, which is the local terminal, performs the masking process; however, the server 2, to which the image is uploaded, may perform the masking process instead.

When the test sheet is scanned and its image stored as described above, if the test sheet contains a copyrighted work, storing the scanned image as it is may constitute unauthorized reproduction of that work. To avoid this, the portions that may constitute a copyrighted work can be masked.

On the other hand, depending on where an answer is written, for example when the respondent writes the answer so that it overlaps the question text, masking may make the answer unreadable (see FIGS. 4 and 5). Therefore, in the present embodiment, the PC 1 identifies from the image the area in which a handwritten answer (object) is written and excludes that area from the masking target, thereby preventing the answer from becoming unreadable.

FIG. 2 is a block diagram illustrating a configuration example of the PC 1. The PC 1 includes a control unit 11, a main storage unit 12, a communication unit 13, a display unit 14, an input unit 15, and an auxiliary storage unit 16.
The control unit 11 has one or more arithmetic processing units such as a CPU (Central Processing Unit) or an MPU (Micro-Processing Unit), and performs various information processing, control processing, and the like for the PC 1 by reading and executing a program P stored in the auxiliary storage unit 16. The main storage unit 12 is a temporary storage area such as RAM and temporarily stores data that the control unit 11 needs to execute arithmetic processing. The communication unit 13 is a communication module that transmits and receives information to and from external devices. The display unit 14 is a display device such as a liquid crystal display or an organic EL (Electro Luminescence) display, and displays images supplied by the control unit 11. The input unit 15 is an operation interface such as a keyboard and mouse, and inputs operation details to the control unit 11. The auxiliary storage unit 16 is a non-volatile storage area such as a hard disk or large-capacity memory, and stores the program P and other data that the control unit 11 needs to execute processing.

FIG. 3 is an explanatory diagram of the process of specifying the content area. FIG. 3 conceptually illustrates how the area in which content is printed is specified from an image obtained by scanning a test sheet. An outline of the present embodiment is described below.

The PC 1 acquires from the scanner 3 a sheet image obtained by scanning a test sheet (answer sheet) on which the respondent writes handwritten answers (handwritten objects) and on which content that may constitute a copyrighted work is printed. The test sheet is an entry sheet for writing answers to questions; as shown in FIG. 3, for example, question sentences, question content, answer fields, and the like are printed using characters, symbols, and so on. The printed contents of the test sheet may also include, for example, figures, photographs, and pictures in addition to characters and symbols, and are not particularly limited.

The respondent writes answers on the test sheet by hand, using characters or the like. The entries may also be strokes, symbols, figures, and the like instead of characters; any object entered by hand is acceptable.

The PC 1 specifies, from the sheet image, the portions corresponding to the question sentences, question content, answer fields, and so on, that is, the content, and specifies a content area, which is a peripheral area including the content. In FIG. 3, the hatched portions correspond to content areas. For example, the PC 1 specifies the content area from the sheet image of a test sheet on which no answers have been entered. The PC 1 thus specifies the content area that may constitute a copyrighted work and masks that area.

Various methods are conceivable for specifying the content area. FIG. 3 conceptually illustrates three methods as examples.

As a first method, the PC 1 performs character recognition on the sheet image to extract the text printed on the test sheet, specifies the extracted text as content, and specifies the area surrounding the text as the content area. The upper right of FIG. 3 illustrates specifying a content area by character recognition. The PC 1 performs character recognition on the sheet image and extracts the individual text items printed on the test sheet. The PC 1 specifies each text item as content and sets (specifies) the hatched rectangular areas as content areas so that each text item is covered.
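The rectangle-setting step of the first method can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: it assumes some OCR engine (not shown) has already returned each recognized text item with a pixel bounding box, and the `Box` type, the `pad` margin, and the function name are hypothetical. The sketch only enlarges and clips those boxes so that each text item is fully covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in pixel coordinates; right/bottom are exclusive."""
    left: int
    top: int
    right: int
    bottom: int


def content_areas_from_ocr(words, width, height, pad=2):
    """Turn OCR results into padded rectangular content areas.

    `words` is assumed to be a list of (text, Box) pairs produced by an OCR
    engine. Each non-empty text item gets a rectangle enlarged by `pad`
    pixels and clipped to the sheet image bounds.
    """
    areas = []
    for text, box in words:
        if not text.strip():
            continue  # skip empty or whitespace-only recognitions
        areas.append(Box(
            left=max(0, box.left - pad),
            top=max(0, box.top - pad),
            right=min(width, box.right + pad),
            bottom=min(height, box.bottom + pad),
        ))
    return areas
```

For example, an OCR box `Box(10, 10, 30, 20)` with the default `pad=2` on a 100 × 100 image becomes `Box(8, 8, 32, 22)`.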

As a second method, the PC 1 specifies the content area according to the distribution of pixel values (pixel density) in the sheet image. The center right of FIG. 3 illustrates specifying a content area based on the pixel-value distribution. The PC 1 refers to the value of each pixel in the sheet image, determines how the pixel values are distributed, and identifies portions that differ from the background of the test sheet, such as the characters, symbols, and strokes printed on it. For example, when examining a given pixel in the sheet image, the PC 1 counts, within that pixel and its surrounding pixels (for example, a 3 × 3 pixel window), the number of pixels whose RGB difference from the background color is equal to or greater than a threshold. When the count is equal to or greater than a predetermined number (for example, three pixels), the PC 1 specifies the portion including that pixel as content.

The PC 1 sets (specifies) the content area so as to cover the identified portions. This makes it possible to mask even symbols, strokes, and the like that are difficult to identify by character recognition.
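A minimal Python sketch of the second method, assuming the sheet image is a list of rows of `(R, G, B)` tuples and the background is white. The 3 × 3 window and the count of three pixels are the example values given above; the threshold of 64 and the reading of "RGB difference" as the largest per-channel difference are assumptions. The function returns a boolean mask of content pixels, around which the content area can then be set.

```python
def differs_from_background(pixel, background, diff_threshold):
    """True if any RGB channel differs from the background by >= diff_threshold.

    Using the maximum per-channel difference is an assumption; a sum or a
    Euclidean distance would also be reasonable readings of "RGB difference".
    """
    return max(abs(p - b) for p, b in zip(pixel, background)) >= diff_threshold


def detect_content_pixels(image, background=(255, 255, 255),
                          diff_threshold=64, min_count=3):
    """Mark pixels whose 3x3 neighbourhood holds at least min_count ink pixels."""
    height, width = len(image), len(image[0])
    ink = [[differs_from_background(image[y][x], background, diff_threshold)
            for x in range(width)] for y in range(height)]
    content = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            count = 0
            # Count ink pixels in the window centred on (x, y), clipped at edges.
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width and ink[ny][nx]:
                        count += 1
            content[y][x] = count >= min_count
    return content
```

With these defaults an isolated dark speck (a single ink pixel) is never marked, which is how the count threshold suppresses scanner noise, whereas a three-pixel stroke is.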

The pixel value referred to by the PC 1 may be a color-related value such as an RGB value, or a brightness-related value such as luminance.
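If brightness is used instead of color, one common way to reduce an RGB pixel to a single brightness value is the ITU-R BT.601 luma weighting. The choice of this formula, the white background level, and the threshold below are assumptions for illustration; the text does not specify them.

```python
def luma(rgb):
    """Brightness of an (R, G, B) pixel using ITU-R BT.601 luma weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def differs_in_brightness(pixel, background_luma=255.0, diff_threshold=64.0):
    """Brightness-based counterpart of an RGB background-difference test."""
    return abs(luma(pixel) - background_luma) >= diff_threshold
```

For example, a pure blue pixel `(0, 0, 255)` has a luma of about 29 and is clearly detected, whereas a light gray `(250, 250, 250)` is not. A brightness test cannot, however, separate two colors of equal luma, which an RGB comparison can.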

As a third method, the PC 1 specifies content from the sheet image using a classifier (trained model) built in advance by machine learning such as deep learning. The lower part of FIG. 3 illustrates inputting a sheet image to the classifier and obtaining the content-area identification result as output. For example, the PC 1 uses a classifier based on a CNN (Convolutional Neural Network) or the like that has learned the features of content from training data consisting of sheet images labeled with the correct values (correct coordinate ranges) of the content areas. The machine-learning method is not limited to a CNN and may be another neural network, an SVM (Support Vector Machine), a Bayesian network, a decision tree, or the like. The PC 1 inputs the sheet image to the classifier, which extracts image features, and obtains the content identification result as the output value. The PC 1 sets (specifies) the content area according to the identification result output from the classifier.

The PC 1 specifies the content area using any one of the methods exemplified above. Alternatively, the PC 1 may specify the content area by combining a plurality of methods.
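The text does not say how multiple methods are combined. One simple option, sketched here as an assumption, is to take the pixel-wise union (logical OR) of the boolean content masks produced by each method, so that anything any method flags is masked; a pixel-wise intersection (logical AND) would be the more conservative alternative.

```python
def combine_masks_union(*masks):
    """Pixel-wise OR of equally sized boolean masks (lists of rows of bool)."""
    if not masks:
        raise ValueError("at least one mask is required")
    height, width = len(masks[0]), len(masks[0][0])
    return [[any(mask[y][x] for mask in masks) for x in range(width)]
            for y in range(height)]
```

A union errs toward over-masking, which suits the copyright-protection goal, since the handwritten entry area is carved out of the result afterwards.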

FIG. 4 is an explanatory diagram of the process of specifying a handwritten entry area. FIG. 4 conceptually illustrates extracting a handwritten answer (object) from the sheet image of a test sheet on which answers have been written, and specifying the handwritten entry area that surrounds the extracted answer.

In the example image in the upper part of FIG. 4, the answer written by the respondent extends beyond the space where it should have been written, and part of it overlaps the question sentence (content). As already described, if the content area were masked as it is in this state, part of the answer would also become invisible. Therefore, the PC 1 extracts the handwritten answer from the sheet image and specifies a handwritten entry area, surrounding the extracted answer, to be excluded from the masking target.

For example, the PC 1 extracts the handwritten answer by comparing a sheet image with answers entered against a sheet image without answers and extracting the difference between them. Specifically, the PC 1 compares each pixel value in the filled-in sheet image with the value of the same pixel in the blank sheet image. The PC 1 performs this comparison for every pixel and extracts the differences.
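The pixel-by-pixel comparison can be sketched as follows, assuming both scans are grayscale (0 to 255), have the same size, and are already aligned; real scans would typically need registration first. The threshold value and function name are assumptions.

```python
def extract_handwriting(filled, blank, diff_threshold=64):
    """Boolean mask of pixels that differ between the filled and blank scans.

    `filled` and `blank` are equally sized lists of rows of grayscale values,
    assumed to be aligned so that (x, y) refers to the same sheet position.
    """
    height, width = len(filled), len(filled[0])
    return [[abs(filled[y][x] - blank[y][x]) >= diff_threshold
             for x in range(width)] for y in range(height)]
```

Ink written exactly on top of already-printed dark pixels produces no difference at those pixels; surrounding the extracted strokes with a rectangle, as described next, helps cover such gaps.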

The middle part of FIG. 4 illustrates an example image showing the extracted answer. As shown in FIG. 4, the PC 1 extracts the object corresponding to the handwritten answer. The PC 1 then sets (specifies), for example, a rectangular area surrounding the object as the handwritten entry area.
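Setting the rectangle around the extracted object can be sketched as a bounding-box computation over the difference mask. This version, an illustrative assumption, puts one rectangle around all extracted pixels and optionally enlarges it by a margin; giving several separate answers their own rectangles would additionally require grouping pixels, for example by connected-component labeling.

```python
def bounding_box(mask, margin=0):
    """Return (left, top, right, bottom) around True pixels, or None if none.

    right and bottom are exclusive; the box is enlarged by `margin` pixels
    and clipped to the mask bounds.
    """
    height, width = len(mask), len(mask[0])
    ys = [y for y in range(height) if any(mask[y])]
    if not ys:
        return None
    xs = [x for y in ys for x in range(width) if mask[y][x]]
    return (max(0, min(xs) - margin), max(0, min(ys) - margin),
            min(width, max(xs) + 1 + margin), min(height, max(ys) + 1 + margin))
```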

FIG. 5 is an explanatory diagram of the masking process. FIG. 5 illustrates masking the sheet image of a test sheet on which answers have been entered.

The PC 1 performs image processing on the filled-in sheet image, namely a masking process, so that the area obtained by excluding the handwritten entry area from the content area specified above cannot be visually recognized. The right side of FIG. 5 illustrates this process conceptually. On the right side of FIG. 5, the black rectangles represent content areas to be masked, and the white rectangles represent handwritten entry areas. The PC 1 compares the content areas shown in the upper right of FIG. 5 with the handwritten entry area shown in the middle right, and specifies the area shown in the lower right, that is, the content area minus the handwritten entry area. As a result, as indicated by the dotted rectangle in the lower right of FIG. 5, the overlap between the content area and the handwritten entry area, that is, the portion where the question sentence and the answer overlap, is excluded from the masking target.
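The area computation of FIG. 5 (content area minus handwritten entry area) can be sketched with boolean pixel masks. Rectangles are `(left, top, right, bottom)` tuples with exclusive right and bottom edges; this representation and the function names are assumptions for illustration.

```python
def rects_to_mask(rects, width, height):
    """Rasterize (left, top, right, bottom) rectangles into a boolean mask."""
    mask = [[False] * width for _ in range(height)]
    for left, top, right, bottom in rects:
        for y in range(max(0, top), min(height, bottom)):
            for x in range(max(0, left), min(width, right)):
                mask[y][x] = True
    return mask


def masking_area(content_rects, handwriting_rects, width, height):
    """Pixels inside some content area but outside every handwritten entry area."""
    content = rects_to_mask(content_rects, width, height)
    handwriting = rects_to_mask(handwriting_rects, width, height)
    return [[content[y][x] and not handwriting[y][x] for x in range(width)]
            for y in range(height)]
```

Working on pixel masks rather than rectangle geometry keeps the set difference trivial even when the result is not itself a rectangle, as with the region around the dotted rectangle in the lower right of FIG. 5.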

As shown on the left side of FIG. 5, the PC 1 masks the area specified above so that it cannot be visually recognized, and generates a masked image for storage. The PC 1 transmits the generated masked image to the server 2, which stores it in the database.
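Applying the mask then amounts to overwriting the selected pixels. A minimal sketch for a grayscale image, assuming black fill (value 0) as the masking style; the fill value and function name are illustrative, and the input image is left unchanged.

```python
def apply_mask(image, mask, fill=0):
    """Return a copy of a grayscale image with masked pixels set to `fill`."""
    return [[fill if mask[y][x] else image[y][x]
             for x in range(len(image[0]))]
            for y in range(len(image))]
```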

FIG. 6 is a flowchart illustrating an example of a processing procedure executed by the PC 1. The processing executed by the PC 1 is described with reference to FIG. 6.
The control unit 11 of the PC 1 acquires from the scanner 3 a sheet image obtained by scanning a test sheet having pre-printed content and handwritten objects (answers) (step S11). The content is text, pictures, photographs, or the like that may constitute a copyrighted work, but its specific nature is not particularly limited.

The control unit 11 specifies the content from the sheet image (step S12). Specifically, the control unit 11 specifies the content from a sheet image with no answers entered. For example, the control unit 11 may extract the text printed on the test sheet by character recognition of the sheet image and specify the extracted text as content. Alternatively, the control unit 11 may specify the content according to the distribution of pixel values (pixel density) in the sheet image. As another alternative, the control unit 11 may specify the content using a classifier trained with sheet images having no objects entered as content training data.

The control unit 11 extracts, from the sheet image, the objects (answers) handwritten on the test sheet (step S13). Specifically, the control unit 11 compares the sheet image of a test sheet with answers entered against the sheet image of a test sheet without answers, and extracts the handwritten characters and the like. The control unit 11 then specifies a handwritten entry area so as to surround the extracted objects (step S14).

For the sheet image, the control unit 11 specifies a masking area obtained by excluding the handwritten entry area specified in step S14 from the content area, which is the peripheral area including the content specified in step S12, and generates a masked image processed so that the specified masking area cannot be visually recognized (step S15). The control unit 11 transmits the generated masked image to the server 2 (step S16) and ends the series of processes.

Although a test sheet has been described above as an example of an entry sheet to be masked, the entry sheet may instead be, for example, a questionnaire on which answers to predetermined questions are written.

Also, although the content area is made invisible by masking in the above description, the present embodiment is not limited to this; for example, the content area may be cut out (removed) from the sheet image. In this case as well, the portion corresponding to the content (copyrighted work) is handled appropriately, and the same effect as above is obtained. That is, the PC 1 only needs to be able to generate a sheet image in which the content area cannot be visually recognized, and the image processing method is not limited to masking.

Also, although the content is specified from a sheet image without answers in the above description, the present embodiment is not limited to this, and the content may be specified from sheet images with answers entered. For example, the PC 1 may specify the content printed on the test sheet by extracting, from the sheet images of many respondents, the characters, symbols, figures, and the like that appear in common across the images. Alternatively, the PC 1 may specify the content from the sheet image by referring to a database storing image data predetermined as content constituting copyrighted works. That is, the PC 1 only needs to be able to specify the content from the sheet image obtained by scanning the test sheet, regardless of whether answers have been entered.
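The idea of finding content that appears in common across many respondents' sheets can be sketched as a per-pixel vote over their ink masks (for example, outputs of a background-difference test). The 90% agreement threshold, the function name, and the assumption that the scans are aligned are illustrative choices, not taken from the text.

```python
def common_content(ink_masks, min_fraction=0.9):
    """Pixels that are ink on at least `min_fraction` of the aligned sheets.

    Printed content sits at the same pixels on every sheet, whereas the exact
    pixel positions of handwriting vary from sheet to sheet, so a high
    agreement threshold tends to keep the former and drop the latter.
    """
    count = len(ink_masks)
    height, width = len(ink_masks[0]), len(ink_masks[0][0])
    return [[sum(mask[y][x] for mask in ink_masks) >= min_fraction * count
             for x in range(width)] for y in range(height)]
```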

Also, although the handwritten answer (object) is extracted above by taking the difference between a blank sheet image and a filled-in sheet image, the present embodiment is not limited to this. For example, the PC 1 may use a trained model (classifier) that has learned the features of human handwriting to evaluate how likely each character in the sheet image is to be handwritten, and extract the handwritten characters (objects). That is, the PC 1 only needs to be able to extract handwritten objects from the sheet image, and the method is not limited to comparing blank and filled-in sheet images.

As described above, according to Embodiment 1, masking can be performed so that handwritten objects such as characters are not covered, and document management can be performed appropriately.

Further, according to Embodiment 1, the handwritten entry area can be specified appropriately by comparing a filled-in sheet image with a blank sheet image.

Further, according to Embodiment 1, the content area can be specified appropriately by character recognition of the sheet image.

Further, according to Embodiment 1, the content area can be specified appropriately by determining the distribution of pixel values (pixel density) in the sheet image.

Further, according to Embodiment 1, the content area can be specified with higher accuracy by using a classifier that has learned the features of content in sheet images.

(Embodiment 2)
In the present embodiment, the content area to be masked is specified according to the text content recognized from the test sheet. Components that are the same as in Embodiment 1 are denoted by the same reference numerals, and their description is omitted.
FIG. 7 is an explanatory diagram of the content area specifying process according to Embodiment 2. An outline of the present embodiment is described with reference to FIG. 7.

The upper part of FIG. 7 shows an example sheet image (test sheet). In the present embodiment, the PC 1 performs character recognition on the sheet image to extract the text printed on the test sheet, and determines the content area to be masked according to the content of the extracted text.

For example, the PC 1 specifies the question sentences in the text of the test sheet and excludes the areas where they are printed from the content area to be masked. A question sentence instructs the respondent how to answer and is unlikely to itself contain a copyrighted work. By excluding question sentences from the masking target, the PC 1 avoids masking unnecessary portions.

Specifically, the PC 1 recognizes in the text of the test sheet a specific keyword indicating a question, and specifies the sentence following the keyword as a question sentence. The keyword is, for example, a character string indicating a question number; in FIG. 7, "第４問" (Question 4) and "問１" (Question 1) correspond to question numbers. For example, the PC 1 refers to a table storing templates (rules) for character strings that represent question numbers, and identifies the keywords representing question numbers in the text of the test sheet. The PC 1 specifies the identified keyword (question number) together with the sentence following it as a question sentence. The PC 1 then excludes the area where the question sentence is printed from the content area so that the question sentence is not masked.
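The rule-based lookup can be sketched with regular expressions standing in for the template table. The two patterns below are hypothetical templates matching question-number keywords like those in FIG. 7 ("第４問", "問１") with either half-width or full-width digits; treating the rest of an OCR text line as the question sentence is a simplifying assumption.

```python
import re

# Hypothetical template table: patterns for question-number keywords.
QUESTION_NUMBER_TEMPLATES = [
    re.compile(r"第[0-9０-９]+問"),  # e.g. "第４問" (Question 4)
    re.compile(r"問[0-9０-９]+"),    # e.g. "問１" (Question 1)
]


def find_question_sentences(lines):
    """Return text lines that begin with a question-number keyword.

    Each returned line is the keyword plus the text that follows it, i.e. a
    candidate question sentence to be excluded from the content area.
    """
    found = []
    for line in lines:
        text = line.strip()
        # re.match anchors at the start of the string only.
        if any(pattern.match(text) for pattern in QUESTION_NUMBER_TEMPLATES):
            found.append(text)
    return found
```

Because the patterns require a digit right after "問", a word such as "問題" (question, problem) on its own does not match.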

In the above description, question sentences are identified with rules. Alternatively, the PC 1 may identify (recognize) question sentences directly from the text of the test sheet using a technique such as semantic analysis. For example, the PC 1 parses the text of the test sheet and identifies predetermined syntactic patterns used in question sentences, such as imperative sentences and request sentences. In the test sheet of FIG. 7, for example, the PC 1 identifies imperative sentences such as “answer the questions” and “select the correct one”. The PC 1 identifies the series of sentences containing such an imperative sentence as a question sentence and excludes it from the content area.
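A rough stand-in for the syntax-analysis step is sketched below. It does not actually parse the text; it only checks each sentence for a sentence-final imperative or request ending, and the ending list `IMPERATIVE_ENDINGS` is a hypothetical example. Extending the match to the surrounding series of sentences, as the description does, is left out.

```python
import re
from typing import List

# Hypothetical list of sentence-final imperative/request endings seen in
# Japanese test instructions. A real system would use a morphological or
# syntactic analyzer; this suffix check only approximates that step.
IMPERATIVE_ENDINGS = ("答えよ", "選べ", "述べよ", "記せ", "書け", "答えなさい", "選びなさい")


def split_sentences(text: str) -> List[str]:
    """Split after each Japanese full stop 「。」, keeping the stop."""
    return [s for s in re.split(r"(?<=。)", text) if s.strip()]


def is_instruction(sentence: str) -> bool:
    """True if the sentence ends with one of the imperative endings."""
    return sentence.strip().rstrip("。").endswith(IMPERATIVE_ENDINGS)


def instruction_sentences(text: str) -> List[str]:
    """Return the sentences treated as question (instruction) sentences."""
    return [s for s in split_sentences(text) if is_instruction(s)]
```

Splitting on a zero-width lookbehind keeps each 「。」 attached to its sentence; this relies on `re.split` accepting empty matches, which Python supports from 3.7 onward.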

As described above, the PC 1 identifies question sentences in the text extracted by character recognition and excludes the areas where they are printed from the content area. As a result, portions unlikely to be copyrighted works are excluded from the masking target, and only portions likely to be copyrighted works are masked.

The approach above identifies the areas likely to contain copyrighted works indirectly, by identifying the areas to be excluded from the masking target. Alternatively, the PC 1 may directly identify the areas of content likely to be copyrighted works according to the text of the test sheet.

For example, the PC 1 identifies, in the text, a sentence with a predetermined expression indicating where the content (a copyrighted work) is printed, and identifies the content area to be masked according to that sentence. Such a sentence suggests where the content appears and allows the question format to be determined, for example whether or not the question is a reading-comprehension question. In the example of FIG. 7, the PC 1 detects the sentence “read the following passages A and B” at the beginning of the question sentence following “Question 4”, and determines from it that the question is a reading-comprehension question. In this case, the PC 1 identifies the paragraph following that sentence as the content, and sets the content area so as to cover that paragraph. In this way, the PC 1 identifies the position of the content according to the text extracted from the sheet image. For example, the PC 1 may hold expressions suggesting the position of content in a table, check the text for matching sentences by referring to the table, and identify the content area accordingly.
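The table lookup described above might look like the following sketch. It assumes the OCR text has already been grouped into paragraphs; the phrase table `POSITION_PHRASES`, its format labels, and the function `locate_content` are hypothetical names chosen for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical table of expressions that suggest where the content (a
# copyrighted work) is printed, mapped to the question format they indicate.
POSITION_PHRASES: Dict[str, str] = {
    "次の文章": "reading_comprehension",
    "次の詩": "reading_comprehension",
}


def locate_content(paragraphs: List[str]) -> List[Tuple[int, str]]:
    """Return (paragraph index, question format) pairs for content to mask.

    When a paragraph contains a position-suggesting phrase, the paragraph
    immediately after it is taken as the content, as in the FIG. 7 example.
    """
    found = []
    for i, para in enumerate(paragraphs[:-1]):  # the last paragraph has no successor
        for phrase, fmt in POSITION_PHRASES.items():
            if phrase in para:
                found.append((i + 1, fmt))
                break  # one hit per paragraph is enough
    return found
```

With the paragraphs `["第４問　次の文章Ａ・Ｂを読み，後の問いに答えよ。", "Ａ　吾輩は猫である。", "問１　正しいものを一つ選べ。"]`, the result is `[(1, "reading_comprehension")]`: only the passage after the instruction is marked as content.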

Alternatively, for example, the PC 1 may identify the content area directly by detecting keywords of sentences that correspond to the content (a copyrighted work) in the text extracted from the sheet image. For example, the PC 1 holds in advance a database of keywords from sentences that are copyrighted works, searches the text for sentences containing those keywords, and identifies only those sentences as the content.
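A minimal sketch of the keyword search follows. The `work_keywords` argument is a hypothetical stand-in for the keyword database; how that database is built and stored is not specified here, so an in-memory collection is assumed.

```python
import re
from typing import Iterable, List


def content_sentences(text: str, work_keywords: Iterable[str]) -> List[str]:
    """Return only the sentences that contain at least one stored keyword.

    `work_keywords` stands in for the keyword database described above.
    Sentences are split after each Japanese full stop 「。」.
    """
    keywords = list(work_keywords)
    sentences = [s for s in re.split(r"(?<=。)", text) if s.strip()]
    return [s for s in sentences if any(k in s for k in keywords)]
```

Calling `content_sentences("吾輩は猫である。名前はまだ無い。正しいものを一つ選べ。", {"吾輩", "名前"})` keeps the two passage sentences and drops the instruction sentence, so only the former would be masked.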

As described above, the PC 1 distinguishes the areas to be masked from the areas not to be masked using the text extracted by character recognition, and thereby identifies the content area to be masked. The lower part of FIG. 7 shows the identified content area hatched. The PC 1 masks the content area as in the first embodiment to generate a masked image. As a result, only the portions likely to contain copyrighted works are masked, and unnecessary portions are not masked.

FIG. 8 is a flowchart illustrating an example of the processing procedure executed by the PC 1 according to the second embodiment.
After acquiring the sheet image obtained by scanning the test sheet (step S11), the control unit 11 of the PC 1 executes the following processing. The control unit 11 performs character recognition on the sheet image and extracts the text printed on the test sheet (step S201). The control unit 11 then identifies the content according to the extracted text (step S202). For example, the control unit 11 identifies a specific keyword representing a question in the text extracted in step S201, identifies the sentence containing that keyword as a question sentence, and excludes it from the content. As another example, the control unit 11 performs semantic analysis on the text extracted in step S201, identifies syntactic patterns such as imperative and request sentences, identifies the sentences containing them as question sentences, and excludes them from the content. As another example, the control unit 11 identifies, in the text extracted in step S201, a sentence with a predetermined expression indicating where the content is printed, and identifies the content to be masked according to that sentence. As yet another example, the control unit 11 identifies keywords of sentences corresponding to the content in the text extracted in step S201, and identifies the sentences containing those keywords as the content. The control unit 11 then advances the processing to step S13.
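Steps S201 and S202 can be sketched as one function that takes the OCR step and the identification rules as parameters. This is a sketch under assumptions: the interfaces `OcrFn` and `RuleFn`, the function `specify_content`, and the way the rules are combined (direct hits first, then exclusions) are hypothetical, since the description presents the rules as alternative examples rather than a fixed combination.

```python
from typing import Callable, Iterable, List, Sequence, Set

# Hypothetical interfaces: the OCR step returns recognized text lines, and each
# rule maps those lines to a set of line indices.
OcrFn = Callable[[object], List[str]]
RuleFn = Callable[[Sequence[str]], Set[int]]


def specify_content(image: object, ocr: OcrFn,
                    include_rules: Iterable[RuleFn],
                    exclude_rules: Iterable[RuleFn]) -> List[int]:
    """Sketch of steps S201-S202: return indices of text lines to mask."""
    lines = ocr(image)                        # S201: character recognition
    included: Set[int] = set()
    for rule in include_rules:                # rules that pick out content directly
        included |= rule(lines)
    # Assumption: with no direct hits, every line starts as a candidate.
    candidates = included or set(range(len(lines)))
    excluded: Set[int] = set()
    for rule in exclude_rules:                # e.g. question sentences
        excluded |= rule(lines)
    return sorted(candidates - excluded)      # passed on to step S13
```

Injecting the OCR function keeps the sketch independent of any particular recognition engine; in a test it can simply return a fixed list of lines.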

As described above, according to the second embodiment, question sentences are identified in the text printed on the test sheet and excluded from the content area, so portions unlikely to contain content can be excluded from the masking target.

Further, according to the second embodiment, the content area is estimated (identified) from the text printed on the test sheet, so masking can be applied only to portions likely to contain content.

(Embodiment 3)
FIG. 9 is a functional block diagram showing the operation of the PC 1 of the embodiments described above. When the control unit 11 executes the program P, the PC 1 operates as follows.
The acquisition unit 91 acquires a sheet image obtained by scanning an entry sheet having pre-printed content and a handwritten object entered by hand. The content specifying unit 92 identifies the content from the sheet image. The extraction unit 93 extracts the handwritten object from the sheet image. The generation unit 94 identifies a masking area by excluding the handwritten object from the content area, which is the peripheral area containing the content, and generates a sheet image processed so that the masking area cannot be visually recognized.
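The functional units above can be sketched on small grayscale images held as lists of rows. The sketch rests on several assumptions: the acquisition unit is not modelled (the filled and blank images are passed in, aligned and of equal size); handwriting is extracted as ink present on the filled sheet but absent from a blank one, following the claim on extraction from filled and unfilled images; the content area is taken as the bounding box of ink on the blank sheet, a simplification; and `INK_THRESHOLD`, `MASK_VALUE`, and all function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Image = List[List[int]]          # grayscale rows: 0 = black, 255 = white
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive

INK_THRESHOLD = 128  # hypothetical: pixels darker than this count as ink
MASK_VALUE = 128     # hypothetical fill value that hides the printed content


def bounding_box(points: Sequence[Tuple[int, int]]) -> Box:
    ys = [y for y, _ in points]
    xs = [x for _, x in points]
    return (min(ys), min(xs), max(ys) + 1, max(xs) + 1)


def specify_content_area(blank: Image) -> Box:
    """Content specifying unit (stand-in): box around all ink on the blank sheet."""
    ink = [(y, x) for y, row in enumerate(blank)
           for x, v in enumerate(row) if v < INK_THRESHOLD]
    return bounding_box(ink)


def extract_handwriting(filled: Image, blank: Image) -> List[Tuple[int, int]]:
    """Extraction unit: ink on the filled sheet that is absent from the blank sheet."""
    return [(y, x) for y, row in enumerate(filled)
            for x, v in enumerate(row)
            if v < INK_THRESHOLD and blank[y][x] >= INK_THRESHOLD]


def inside(box: Optional[Box], y: int, x: int) -> bool:
    return box is not None and box[0] <= y < box[2] and box[1] <= x < box[3]


def generate_masked(filled: Image, blank: Image) -> Image:
    """Generation unit: obscure the content area minus the handwritten entry area."""
    top, left, bottom, right = specify_content_area(blank)
    strokes = extract_handwriting(filled, blank)
    entry = bounding_box(strokes) if strokes else None   # handwritten entry area
    out = [row[:] for row in filled]                     # leave the input untouched
    for y in range(top, bottom):
        for x in range(left, right):
            if not inside(entry, y, x):                  # masking area only
                out[y][x] = MASK_VALUE
    return out
```

Excluding the bounding box around the strokes, rather than only the stroke pixels, keeps the handwritten answers legible against their original background; this corresponds to the handwritten entry area described in the claims. Pixels outside the content area are left unchanged.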

The third embodiment is as described above; the rest is the same as in the first and second embodiments, so corresponding parts are denoted by the same reference numerals and their detailed description is omitted.

The embodiments disclosed herein are illustrative in all respects and should not be considered restrictive. The scope of the present invention is defined by the claims rather than by the description above, and is intended to include all modifications within the meaning and scope equivalent to the claims.

Reference Signs List
1 PC (information processing device)
11 Control unit
12 Main storage unit
13 Communication unit
14 Display unit
15 Input unit
16 Auxiliary storage unit
P Program
2 Server
3 Scanner

Claims

An information processing apparatus comprising:
an acquisition unit that acquires a paper image obtained by scanning an entry sheet having pre-printed content and a handwritten object filled in by hand;
a content specifying unit that specifies the content from the paper image;
an extraction unit that extracts the handwritten object from the paper image; and
a generation unit that specifies a masking area obtained by excluding the handwritten object from a content area, which is a peripheral area including the content, and generates the paper image subjected to image processing so that the masking area cannot be visually recognized.

The information processing apparatus according to claim 1, further comprising an entry specifying unit that specifies a handwritten entry area surrounding the extracted handwritten object,
wherein the generation unit specifies a masking area obtained by removing the handwritten entry area from the content area.

The information processing apparatus according to claim 1 or 2, wherein the extraction unit extracts the handwritten object based on the paper image in which the handwritten object has been filled in and a paper image in which the handwritten object has not been filled in.

The information processing apparatus according to claim 1, wherein the content specifying unit specifies the content in accordance with a text extracted by character recognition on the paper image.

The information processing apparatus according to claim 4, wherein the content specifying unit
specifies a specific keyword from the extracted text, and
excludes a sentence including the keyword from the content.

The information processing apparatus according to claim 4, wherein the content specifying unit
specifies a specific keyword from the extracted text, and
uses only a sentence including the keyword as the content.

The information processing apparatus according to any one of claims 4 to 6, wherein the content specifying unit specifies the position where the content is printed according to the content of the extracted text.

The information processing apparatus according to any one of claims 1 to 7, wherein the content specifying unit specifies the content from a pixel density in the paper image or from predetermined image data.

The information processing apparatus according to claim 1, wherein the content specifying unit specifies the content from the paper image acquired by the acquisition unit by using a trained classifier that has learned the content using, as training data, a plurality of paper images in which the handwritten object has not been entered.

An information processing method comprising:
acquiring a paper image obtained by scanning an entry sheet having pre-printed content and a handwritten object filled in by hand;
specifying the content from the paper image;
extracting the handwritten object from the paper image; and
specifying a masking area obtained by excluding the handwritten object from a content area, which is a peripheral area including the content, and generating the paper image subjected to image processing so that the masking area cannot be visually recognized.

A program causing a computer to execute processing comprising:
acquiring a paper image obtained by scanning an entry sheet having pre-printed content and a handwritten object filled in by hand;
specifying the content from the paper image;
extracting the handwritten object from the paper image; and
specifying a masking area obtained by excluding the handwritten object from a content area, which is a peripheral area including the content, and generating the paper image subjected to image processing so that the masking area cannot be visually recognized.