JP6489041B2

JP6489041B2 - Information processing apparatus and program

Info

Publication number: JP6489041B2
Application number: JP2016038682A
Authority: JP
Inventors: 友博三浦
Original assignee: Kyocera Document Solutions Inc
Current assignee: Kyocera Document Solutions Inc
Priority date: 2016-03-01
Filing date: 2016-03-01
Publication date: 2019-03-27
Anticipated expiration: 2036-03-01
Also published as: JP2017157992A

Description

The present invention relates to an information processing apparatus and a program capable of creating, from a document image that includes a plurality of markings superimposed on a plurality of character strings, a fill-in-the-blank question whose answers are the marked character strings.

A technique for creating fill-in-the-blank questions is known in which an image of a document, where the character strings that are to be the answers have been designated by handwritten markings, is read; the marking images are extracted from the read image data; the character strings at the marking positions are extracted and erased; and answer fields are set in the erased portions (Patent Document 1, Abstract). Optical character recognition (OCR) is known as a technique for extracting character strings (Patent Document 2).

Patent Document 1: JP 2007-4523 A
Patent Document 2: JP 11-328309 A

An information processing apparatus capable of creating fill-in-the-blank questions is desired to be ever more user-friendly for both the question creator and the respondent.

In view of the above circumstances, an object of the present invention is to improve user convenience in an information processing apparatus and a program capable of creating, from a document image that includes a plurality of markings superimposed on a plurality of character strings, a fill-in-the-blank question whose answers are the marked character strings.

To achieve the above object, an information processing apparatus according to one aspect of the present invention includes:
a marking extraction unit that extracts, from a document image including a plurality of markings respectively superimposed on a plurality of character strings, the plurality of markings;
an identical character string identification unit that identifies identical character strings among the plurality of character strings on which the markings are respectively superimposed; and
a code determination unit that assigns the same code to identical character strings and different codes to different character strings.

According to this aspect, the same code is assigned to identical character strings and different codes are assigned to different character strings. This eliminates the risk that a respondent, seeing different codes on several blanks that all call for the same character string, mistakenly believes that different words are expected.

The information processing apparatus further includes:
a blank creation unit that creates blank images to be superimposed on the respective character strings on which the markings are superimposed; and
an image composition unit that creates a composite image by combining the document image, the plurality of blank images, and code images, which are images of the assigned codes.

According to this aspect, a composite image combining the document image, the blank images, and the code images is created. The question creator is thus spared the effort of manually assigning the same code to the several blanks that call for the same character string, and the risk of assigning a wrong code is also eliminated.

The information processing apparatus further includes:
a marking determination unit that determines the position and shape, within the document image, of each of the plurality of markings extracted by the marking extraction unit; and
a character string extraction unit that extracts, based on the determined positions and shapes of the plurality of markings, the plurality of character strings on which the respective markings are superimposed,
wherein the identical character string identification unit identifies identical character strings among the plurality of character strings extracted by the character string extraction unit.

The identical character string identification unit includes:
a character string identification unit that identifies the plurality of character strings extracted by the character string extraction unit; and
a character string comparison unit that identifies identical character strings by comparing the character strings identified by the character string identification unit with one another.

The character string identification unit:
extracts a plurality of characters from each of the plurality of character strings extracted by the character string extraction unit;
identifies each of the extracted characters; and
identifies each character string by combining the characters it contains.

The character string identification unit identifies the plurality of character strings by optical character recognition (OCR).

The identical character string identification unit includes a character string similarity determination unit that determines the similarity between the character strings extracted by the character string extraction unit and, when the similarity is equal to or greater than a threshold, determines that those character strings are identical.

According to this aspect, which characters a character string actually contains does not matter; it suffices to know that the character strings are identical. The processing load is therefore smaller than with optical character recognition, and the database that optical character recognition requires is unnecessary.

To achieve the above object, a program according to one aspect of the present invention causes an information processing apparatus to function as:
a marking extraction unit that extracts, from a document image including a plurality of markings respectively superimposed on a plurality of character strings, the plurality of markings;
an identical character string identification unit that identifies identical character strings among the plurality of character strings on which the markings are respectively superimposed; and
a code determination unit that assigns the same code to identical character strings and different codes to different character strings.

According to the present invention, user convenience is improved in an information processing apparatus and a program capable of creating, from a document image that includes a plurality of markings superimposed on a plurality of character strings, a fill-in-the-blank question whose answers are the marked character strings.

FIG. 1 is a block diagram showing the hardware configuration of an image forming apparatus according to a first embodiment.
FIG. 2 is a block diagram showing the functional configuration of the image forming apparatus.
FIG. 3 is a flowchart showing the operation of the image forming apparatus.
FIG. 4 is a diagram for explaining the operation of the image forming apparatus.
FIG. 5 is a block diagram showing the functional configuration of an image forming apparatus according to a second embodiment.
FIG. 6 is a flowchart showing the operation of the image forming apparatus.

Embodiments of the present invention are described below with reference to the drawings.

(1. First Embodiment)
(1-1. Hardware Configuration of Image Forming Apparatus)
FIG. 1 is a block diagram showing the hardware configuration of an image forming apparatus according to the first embodiment of the present invention.

The information processing apparatus according to each embodiment of the present invention is an image forming apparatus (for example, an MFP, Multifunction Peripheral) and is hereinafter referred to as the MFP.

The MFP 1 includes a control unit 11. The control unit 11 is composed of a CPU (Central Processing Unit), a RAM (Random Access Memory), a ROM (Read Only Memory), dedicated hardware circuits, and the like, and governs the overall operation of the MFP 1. A computer program that causes the MFP 1 to function as each of the functional units described later is stored in a non-transitory storage medium such as the ROM.

The control unit 11 is connected to an image reading unit 12, an image processing unit 14, an image memory 15, an image forming unit 16, an operation unit 17, a storage unit 18, a network communication unit 13, and the like. The control unit 11 controls the operation of each connected unit and exchanges signals and data with each of them.

In accordance with a job execution instruction entered by the user through the operation unit 17, a network-connected personal computer (not shown), or the like, the control unit 11 controls the driving and processing of the mechanisms needed to carry out each function, such as the scanner function, the print function, and the copy function.

The image reading unit 12 reads an image from a document.

The image processing unit 14 applies image processing, as needed, to the image data of the image read by the image reading unit 12. For example, the image processing unit 14 performs image processing such as shading correction in order to improve the quality of the image read by the image reading unit 12 once it has been formed on paper.

The image memory 15 has an area for temporarily storing document image data obtained by the image reading unit 12 and for temporarily storing data to be printed by the image forming unit 16.

The image forming unit 16 forms images from the image data read by the image reading unit 12 and the like.

The operation unit 17 includes a touch panel unit and an operation key unit that accept user instructions for the various operations and processes the MFP 1 can execute. The touch panel unit includes a display unit 17a, such as an LCD (Liquid Crystal Display), provided with a touch panel.

The network communication unit 13 is an interface for connecting to a network.

The storage unit 18 is a large-capacity storage device, such as an HDD (Hard Disk Drive), that stores document images read by the image reading unit 12 and the like.

(1-2. Functional Configuration of Image Forming Apparatus)
FIG. 2 is a block diagram showing the functional configuration of the image forming apparatus.

By executing an information processing program, the MFP 1 functions as a marking extraction unit 101, a marking determination unit 102, a character string extraction unit 103, an identical character string identification unit 110, a code determination unit 104, a blank creation unit 105, and an image composition unit 106.

The marking extraction unit 101 extracts, from the document image read by the image reading unit 12, the plurality of markings respectively superimposed on a plurality of character strings.

The marking determination unit 102 determines the position and shape, within the document image, of each of the markings extracted by the marking extraction unit 101.

The character string extraction unit 103 extracts the character strings on which the respective markings are superimposed, based on the positions and shapes of the markings determined by the marking determination unit 102.

The identical character string identification unit 110 has a character string identification unit 111, an OCR database 112, and a character string comparison unit 113, and identifies identical character strings among the character strings on which markings are superimposed.

The code determination unit 104 assigns the same code to the identical character strings identified by the character string comparison unit 113 and assigns different codes to different character strings.

The blank creation unit 105 creates blank images to be superimposed on the respective character strings on which markings are superimposed.

The image composition unit 106 creates a composite image by combining the document image read by the image reading unit 12, the blank images created by the blank creation unit 105, and code images, which are images of the codes assigned by the code determination unit 104.

(1-3. Operation of Image Forming Apparatus)
FIG. 3 is a flowchart showing the operation of the image forming apparatus. FIG. 4 is a diagram for explaining the operation of the image forming apparatus.

As a premise, character strings are printed on a document (typically paper) as text data. Alternatively, images of character strings are formed on the document as image data (that is, the document is a copy of a document on which the character strings were printed). Some of the character strings in the text (compound words, phrases, numerical values, and the like) have been marked by hand, with a fluorescent marker pen or the like, by the question creator, who is the user. The marked character strings are the answers in the fill-in-the-blank question.

The image reading unit 12 optically scans the document and reads the document image (step S101; see FIG. 4). The "document image" is image data of a text document that, taken as a whole, contains many character strings, and it includes a plurality of markings (the hatched portions in FIG. 4) superimposed on a plurality of character strings. A "character string" is a word or phrase (word, clause, sentence, etc.), a numerical value, or the like; strictly speaking, it is the image thereof.

The marking extraction unit 101 extracts, from the document image read by the image reading unit 12, the plurality of markings respectively superimposed on a plurality of character strings (step S102). Specifically, the marking extraction unit 101 extracts as a marking any region that differs from the background (white or the like) in brightness and/or saturation and that has a particular shape and size (for example, a band-shaped rectangle of a particular width).
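The colour test in step S102 can be illustrated with a minimal sketch. It assumes the page is a list of rows of RGB tuples and treats a pixel as marker ink when both its saturation and its brightness exceed a threshold. The function name `marking_mask` and both threshold values are assumptions made for illustration and do not come from the patent; the shape and size check described above is not covered here.

```python
import colorsys

def marking_mask(image, min_saturation=0.5, min_value=0.5):
    """Return a 2D list of booleans, True where a pixel looks like marker ink.

    image: list of rows, each row a list of (r, g, b) tuples in 0..255.
    The thresholds are illustrative assumptions, not values from the patent.
    """
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            # Convert to HSV so saturation and brightness can be tested directly.
            _, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            mask_row.append(s >= min_saturation and v >= min_value)
        mask.append(mask_row)
    return mask

WHITE, BLACK, YELLOW = (255, 255, 255), (0, 0, 0), (255, 255, 0)
page = [
    [WHITE, WHITE, WHITE, WHITE],
    [WHITE, YELLOW, YELLOW, WHITE],  # two pixels of fluorescent marker
    [WHITE, BLACK, WHITE, WHITE],    # one pixel of printed text
]
mask = marking_mask(page)
```

White paper has zero saturation and black text has zero brightness, so in this toy page only the yellow pixels are flagged.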

The marking determination unit 102 determines the position and shape, within the document image, of each of the markings extracted by the marking extraction unit 101 (step S103). Specifically, the marking determination unit 102 treats the entire document image as a coordinate system and calculates the position and shape of each marking as coordinates.
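One way to express a marking's position and shape as coordinates, as in step S103, is a bounding box per connected region of marker pixels. The sketch below assumes a boolean mask as input and uses a breadth-first flood fill with 4-connectivity. The function name `marking_boxes` and the inclusive `(left, top, right, bottom)` convention are illustrative choices; the patent does not specify how the coordinates are computed.

```python
from collections import deque

def marking_boxes(mask):
    """Return bounding boxes (left, top, right, bottom), inclusive,
    of the 4-connected True regions in a 2D boolean mask."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Flood-fill one region, tracking its extent as we go.
            left = right = x
            top = bottom = y
            queue = deque([(x, y)])
            seen[y][x] = True
            while queue:
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < width and 0 <= ny < height and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    return boxes

mask = [
    [False, True, True, False, False],
    [False, True, True, False, True],
    [False, False, False, False, True],
]
boxes = marking_boxes(mask)
```

Boxes are reported in scan order, top to bottom and left to right, which is also a natural reading order for numbering the blanks later.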

The character string extraction unit 103 extracts the character strings on which the respective markings are superimposed, based on the positions and shapes (coordinates) of the markings determined by the marking determination unit 102 (step S104). Specifically, the character string extraction unit 103 uses edge detection to extract the character string underlying each marking defined by the position and shape (coordinates) determined by the marking determination unit 102. Note that the character string extraction unit 103 does not extract the individual characters of a character string one by one; it extracts the whole character string under one marking as a single character string.
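The patent names edge detection for step S104 but gives no details. As a deliberately simpler stand-in, the sketch below locates the text under a marking by taking the bounding box of all dark pixels inside the marking's box on a grayscale page. It is a plain darkness threshold rather than edge detection, and the function name `text_box_in_marking` and the `dark_threshold` value are assumptions.

```python
def text_box_in_marking(gray, marking_box, dark_threshold=100):
    """Return the inclusive bounding box of dark (text) pixels inside a marking.

    gray: 2D list of grayscale values in 0..255.
    marking_box: (left, top, right, bottom), inclusive.
    Returns None when the marking covers no dark pixels.
    """
    left, top, right, bottom = marking_box
    xs, ys = [], []
    for y in range(top, bottom + 1):
        for x in range(left, right + 1):
            if gray[y][x] < dark_threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    # The whole marked string is treated as one unit, not split into characters.
    return (min(xs), min(ys), max(xs), max(ys))

PAPER, MARKER, INK = 255, 200, 0
gray = [
    [PAPER, PAPER, PAPER, PAPER, PAPER, PAPER],
    [PAPER, MARKER, INK, MARKER, MARKER, PAPER],
    [PAPER, MARKER, MARKER, INK, MARKER, PAPER],
    [PAPER, PAPER, PAPER, PAPER, PAPER, PAPER],
]
text_box = text_box_in_marking(gray, (1, 1, 4, 2))
```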

The character string identification unit 111 identifies each of the character strings extracted by the character string extraction unit 103. Specifically, the character string identification unit 111 extracts a plurality of characters from each of the extracted character strings. A "character" is an individual character contained in a character string (or, depending on the language, an individual word); strictly speaking, it is the image thereof. The character string identification unit 111 refers to the OCR database 112 and identifies each of the extracted characters (step S105). Specifically, the OCR database 112 holds, character by character, image patterns of characters registered in association with character codes. The character string identification unit 111 searches the OCR database 112 for the image pattern representing an extracted character and obtains the character code associated with the image pattern found. The character string identification unit 111 obtains character codes for all the characters in the character string and identifies the character string by combining those character codes (step S106). The character string identification unit 111 does this for every character string on which a marking is superimposed: it extracts the characters, obtains the character code each character represents, combines the character codes, and identifies the character string by the combined character codes.
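Steps S105 and S106 amount to a per-character lookup followed by concatenation. The following sketch assumes the OCR database is a dictionary from an image pattern (here a tuple of bit-strings) to a character code, and that lookup is by exact match. A practical OCR engine would use approximate matching; the glyph patterns, `identify_string`, and `ocr_database` are all invented for illustration.

```python
def identify_string(glyphs, ocr_database):
    """Look up each glyph pattern and join the resulting characters.

    glyphs: sequence of hashable image patterns, one per character.
    ocr_database: dict mapping image pattern -> character code (int).
    """
    chars = []
    for glyph in glyphs:
        code = ocr_database.get(glyph)
        if code is None:
            raise KeyError("glyph pattern not registered in the OCR database")
        chars.append(chr(code))
    # Combining the per-character codes identifies the whole string (step S106).
    return "".join(chars)

# Toy 3x3 patterns; real glyph images would be far larger.
GLYPH_A = ("010", "111", "101")
GLYPH_B = ("110", "111", "110")
ocr_database = {GLYPH_A: ord("A"), GLYPH_B: ord("B")}

result = identify_string([GLYPH_A, GLYPH_B, GLYPH_A], ocr_database)
```

Here `result` is the string `"ABA"`, which can then be compared with other identified strings by ordinary string equality.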

The character string comparison unit 113 identifies identical character strings by comparing the character strings identified by the character string identification unit 111 with one another (step S107). Specifically, for all character strings on which markings are superimposed, the character string comparison unit 113 compares the combined character codes with one another.

The code determination unit 104 assigns the same code (a number, letter, symbol, or the like) to the identical character strings identified by the character string comparison unit 113 and assigns different codes to different character strings (step S108).
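The comparison and code assignment of steps S107 and S108 can be sketched together. The example assumes the marked strings arrive in reading order and that codes are consecutive integers starting at 1; the patent allows numbers, letters, or symbols and does not fix the numbering scheme. The function name `assign_codes` is hypothetical.

```python
def assign_codes(strings):
    """Assign the same code to identical strings and a new code to each new string.

    strings: marked strings in document order.
    Returns a list of integer codes, one per marked string.
    """
    codes = {}      # string -> code already assigned
    assigned = []
    for s in strings:
        if s not in codes:
            codes[s] = len(codes) + 1  # next unused code
        assigned.append(codes[s])
    return assigned

marked = ["Edo", "1603", "Edo", "Kyoto", "1603"]
codes = assign_codes(marked)
```

In this example both occurrences of "Edo" receive code 1 and both occurrences of "1603" receive code 2, so a respondent can see that those blanks share an answer.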

The blank creation unit 105 creates blank images to be superimposed on the respective character strings on which markings are superimposed (step S109; see FIG. 4). Specifically, the blank creation unit 105 creates a blank image whose shape and position (coordinates) are such that it covers both the character string extracted by edge detection by the character string extraction unit 103 (step S104) and the edge portions of the marking that extend beyond that character string. The blank image may be simply blank, or it may include a predetermined style (an underline, a rectangular frame, parentheses, or the like). In the example shown in FIG. 4, the blank image includes an underline.
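A blank that hides both the text and any marker ink sticking out beyond it can be sized as the union of the two bounding boxes. The sketch below assumes inclusive `(left, top, right, bottom)` boxes and a grayscale patch with an optional one-pixel underline on its bottom row. The names `union_box` and `make_blank` and the pixel values are illustrative assumptions.

```python
def union_box(a, b):
    """Smallest inclusive box (left, top, right, bottom) containing boxes a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def make_blank(box, underline=True):
    """Create a white grayscale patch sized to box, optionally underlined."""
    left, top, right, bottom = box
    width, height = right - left + 1, bottom - top + 1
    white, black = 255, 0
    pixels = [[white] * width for _ in range(height)]
    if underline:
        # Draw the underline on the bottom row of the patch.
        pixels[-1] = [black] * width
    return pixels

text_box = (2, 1, 5, 3)      # extent of the marked string
marking_box = (1, 1, 6, 4)   # extent of the marker ink
blank_box = union_box(text_box, marking_box)
blank = make_blank(blank_box)
```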

The image composition unit 106 creates a composite image by combining the document image read by the image reading unit 12 (step S101), the blank images created by the blank creation unit 105 (step S109), and code images, which are images of the codes assigned by the code determination unit 104 (step S108) (step S110; see FIG. 4). Specifically, the image composition unit 106 places each blank image, whose shape and position are defined by coordinates, in the coordinate system of the document image. The image composition unit 106 then places the code, in a predetermined style (font, size, etc.), at a predetermined position (center, left edge, etc.) of each blank image placed on the document image, thereby creating the composite image. In the example shown in FIG. 4, the document image, blank images including underlines, and code images are combined.
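The composition in step S110 is essentially two paste operations per blank: the blank patch at the marking's coordinates, then the code patch at a fixed position inside it. The sketch below assumes grayscale images as lists of rows and centres the code patch in the blank. The names `paste` and `compose` and the tiny placeholder used for the code image are assumptions; rendering a real digit in a chosen font is outside this sketch.

```python
def paste(page, patch, left, top):
    """Copy patch onto page in place, with its top-left corner at (left, top)."""
    for dy, row in enumerate(patch):
        for dx, value in enumerate(row):
            page[top + dy][left + dx] = value

def compose(page, blank, box, code_patch):
    """Return a copy of page with blank placed at box and code_patch centred in it."""
    out = [row[:] for row in page]  # leave the scanned page untouched
    left, top = box[0], box[1]
    paste(out, blank, left, top)
    # Centre the code image inside the blank (the patent also allows e.g. the left edge).
    cx = left + (len(blank[0]) - len(code_patch[0])) // 2
    cy = top + (len(blank) - len(code_patch)) // 2
    paste(out, code_patch, cx, cy)
    return out

page = [[128] * 6 for _ in range(4)]   # stand-in for the scanned document
blank = [[255] * 4 for _ in range(2)]  # 4x2 white blank
code_patch = [[0, 0]]                  # placeholder for a rendered code image
composite = compose(page, blank, (1, 1, 4, 2), code_patch)
```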

The image forming unit 16 forms (prints out) the composite image created by the image composition unit 106 on paper (step S111).

(2. Second Embodiment)
In the following description, configurations and operations that are the same as in the first embodiment are omitted, and the description focuses on the differences.

In the first embodiment, the identical character string identification unit 110 identified the character strings (step S106) by optical character recognition (OCR) (step S105) and then identified identical character strings (step S107). In the second embodiment, by contrast, the identical character string identification unit identifies identical character strings by a method other than optical character recognition (OCR).

(2-1. Functional Configuration of Image Forming Apparatus)
FIG. 5 is a block diagram showing the functional configuration of an image forming apparatus according to the second embodiment.

By executing an information processing program, the MFP 2 functions as the marking extraction unit 101, the marking determination unit 102, the character string extraction unit 103, a character string similarity determination unit 201, the code determination unit 104, the blank creation unit 105, and the image composition unit 106. In place of the character string identification unit 111, the OCR database 112, and the character string comparison unit 113 that make up the identical character string identification unit 110 of the first embodiment, the MFP 2 has the character string similarity determination unit 201. In all other respects it is the same as the MFP 1 of the first embodiment.

The character string similarity determination unit 201 determines the similarity between the character strings extracted by the character string extraction unit 103 and, when the similarity is equal to or greater than a threshold, determines that those character strings are identical.

(2-2. Operation of Image Forming Apparatus)
FIG. 6 is a flowchart showing the operation of the image forming apparatus.

Steps S101 to S104 are the same as in the first embodiment.

After step S104, the character string similarity determination unit 201 determines the similarity between the character strings extracted by the character string extraction unit 103 (step S104) and, when the similarity is equal to or greater than a threshold, determines that those character strings are identical (step S201). Specifically, the character string similarity determination unit 201 compares the image patterns (pixels) of the character strings with one another and determines that they are the same character string when the degree of overlap of the image patterns is equal to or greater than the threshold. The threshold may be set to a value such that character strings made up of the same characters are judged identical even when their fonts differ. For example, the threshold is 90%, and the user may be allowed to set it freely as an identification level.
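The patent does not define how the "degree of overlap" in step S201 is computed. One plausible reading, used in the sketch below, is intersection over union of the ink pixels of two equally sized binary patterns, compared against the 90% example threshold. The metric, the equal-size requirement, and the names `overlap_ratio` and `is_same_string` are assumptions; patterns of different sizes would first need to be normalised.

```python
def overlap_ratio(a, b):
    """Intersection over union of ink pixels in two equally sized binary patterns."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("patterns must have the same size")
    both = either = 0
    for ra, rb in zip(a, b):
        for pa, pb in zip(ra, rb):
            if pa and pb:
                both += 1
            if pa or pb:
                either += 1
    # Two empty patterns are treated as identical.
    return both / either if either else 1.0

def is_same_string(a, b, threshold=0.9):
    """Judge two string images identical when their overlap reaches the threshold."""
    return overlap_ratio(a, b) >= threshold

pattern_1 = [[1, 1, 1, 1, 1], [1, 1, 1, 1, 1]]  # 10 ink pixels
pattern_2 = [[1, 1, 1, 1, 1], [1, 1, 1, 1, 0]]  # 9 of the same 10 pixels
pattern_3 = [[1, 0, 1, 0, 1], [0, 1, 0, 1, 0]]  # a clearly different shape

same = is_same_string(pattern_1, pattern_2)
different = is_same_string(pattern_1, pattern_3)
```

No character codes are produced at any point, which reflects the stated advantage of this embodiment: it only decides whether two marked regions match, without an OCR database.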

After step S201, steps S108 to S111 are the same as in the first embodiment.

(3. Modifications)
In each embodiment, the image reading unit 12 read the document image (step S101). Alternatively, the MFPs 1 and 2 may receive the document image, through the network communication unit 13, from an information processing apparatus (not shown) connected to the network.

(4. Summary)
When a fill-in-the-blank question is created from an existing document, a character string that is an answer may appear many times in a single passage. If that character string were left visible anywhere in the passage it would serve as a hint, so every occurrence of the same answer string must be blanked out. In other words, there will be several blanks that call for the same character string. If different codes are assigned to those blanks, however, the respondent may mistakenly believe that different words are expected. Preventing this by having the question creator manually assign the same code to every blank that calls for the same character string is laborious. In particular, when the total number of blanks is large, or when there are several groups of identical character strings, a question creator assigning codes by hand may also assign a wrong code.

By contrast, according to each embodiment, the MFP 1 assigns the same code to identical character strings and different codes to different character strings (step S108). This eliminates the risk that a respondent, seeing different codes on several blanks that all call for the same character string, mistakenly believes that different words are expected. It also spares the question creator the effort of manually assigning the same code to the blanks that call for the same character string, and eliminates the risk of assigning a wrong code.

In the second embodiment, the MFP 2 determines the similarity between the character strings and, when the similarity is equal to or greater than a threshold, determines that the character strings are the same character string (step S201). In other words, which characters a string actually contains does not matter; it is enough to know that the strings are identical. Compared with the OCR of the first embodiment, the second embodiment has the advantages of requiring less processing and needing no database.
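The threshold test of step S201 might be sketched as follows. This passage does not state how the similarity is computed, so the sketch assumes, purely for illustration, that each marked string is a small binary bitmap of identical size and that similarity is the fraction of matching pixels. The names `Bitmap`, `pixel_similarity`, and `group_by_similarity`, and the default threshold of 0.9, are hypothetical.

```python
from typing import List

# Rows of 0/1 pixels; a hypothetical stand-in for a cropped string image.
Bitmap = List[List[int]]


def pixel_similarity(a: Bitmap, b: Bitmap) -> float:
    """Fraction of matching pixels; 0.0 if the bitmaps differ in size."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return 0.0
    total = sum(len(row) for row in a)
    if total == 0:
        return 0.0
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return same / total


def group_by_similarity(images: List[Bitmap], threshold: float = 0.9) -> List[int]:
    """Return a group index per image; images at or above the threshold share a group."""
    representatives: List[Bitmap] = []  # first image seen for each group
    groups: List[int] = []
    for image in images:
        for index, rep in enumerate(representatives):
            if pixel_similarity(image, rep) >= threshold:
                groups.append(index)
                break
        else:
            # No existing group is similar enough: start a new one.
            representatives.append(image)
            groups.append(len(representatives) - 1)
    return groups
```

No character is ever recognized here, which reflects the point made above: the comparison only decides whether two marked regions are the same string. A practical implementation on scanned images would additionally need size normalization and alignment, which this sketch omits.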

Each embodiment has been described for the case where a character string of several characters is marked, but a marked character string consisting of a single character can also be processed in accordance with FIG. 3 and FIG. 6. In this case, in step S106 shown in FIG. 3, the character string identification unit 111 does not need to combine the character codes of all the characters in a character string; it only needs to identify the character from the character code of the single marked character. Likewise, in step S107 shown in FIG. 3, the character string identification unit 111 only needs to compare the character codes of all the single characters on which a marking is superimposed in order to identify identical characters.
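The relationship between the multi-character and single-character cases can be sketched in Python as follows. The sketch assumes that character recognition has already produced a sequence of integer character codes for each marked string; the names `string_key` and `same_string_groups` are hypothetical, and the mapping to steps S106 and S107 is only an analogy.

```python
from typing import Dict, List, Sequence, Tuple


def string_key(char_codes: Sequence[int]) -> Tuple[int, ...]:
    """Combine per-character codes into one comparable key (analogous to step S106)."""
    return tuple(char_codes)


def same_string_groups(recognized: List[Sequence[int]]) -> List[int]:
    """Group index per marked string, found by comparing keys (analogous to step S107)."""
    index_of: Dict[Tuple[int, ...], int] = {}
    groups: List[int] = []
    for codes in recognized:
        key = string_key(codes)
        # Reuse the index of an identical key, or issue the next index.
        groups.append(index_of.setdefault(key, len(index_of)))
    return groups


if __name__ == "__main__":
    marked = ["水", "酸素", "水", "素"]
    recognized = [[ord(ch) for ch in text] for text in marked]
    print(same_string_groups(recognized))
```

A single marked character simply yields a one-element key, so no combining is needed and the comparison reduces to comparing one character code with another. In the demo, the two occurrences of "水" fall into the same group, while "素" is kept distinct from "酸素" because the whole key is compared rather than a part of it.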

DESCRIPTION OF SYMBOLS
1, 2 ... MFP
12 ... Image reading unit
16 ... Image forming unit
101 ... Marking extraction unit
102 ... Marking determination unit
103 ... Character string extraction unit
104 ... Code determination unit
105 ... Blank creation unit
106 ... Image composition unit
110 ... Same character string identification unit
111 ... Character string identification unit
112 ... OCR database
113 ... Character string comparison unit
201 ... Character string similarity determination unit

Claims

An information processing apparatus comprising:
a marking extraction unit that extracts a plurality of markings from a document image including the plurality of markings respectively superimposed on a plurality of character strings;
a same character string identification unit that identifies identical character strings among the plurality of character strings on which the markings are superimposed; and
a code determination unit that assigns the same code to identical character strings and assigns different codes to different character strings.

The information processing apparatus according to claim 1, further comprising:
a blank creation unit that creates a plurality of blank images to be respectively superimposed on the plurality of character strings on which the markings are superimposed; and
an image composition unit that creates a composite image by combining the document image, the plurality of blank images, and a code image that is an image of the assigned code.

The information processing apparatus according to claim 1 or 2, further comprising:
a marking determination unit that determines the position and shape, in the document image, of each of the plurality of markings extracted by the marking extraction unit; and
a character string extraction unit that extracts, based on the determined positions and shapes of the plurality of markings, the plurality of character strings on which the respective markings are superimposed,
wherein the same character string identification unit identifies the identical character strings among the plurality of character strings extracted by the character string extraction unit.

The information processing apparatus according to claim 3,
wherein the same character string identification unit includes:
a character string identification unit that identifies each of the plurality of character strings extracted by the character string extraction unit; and
a character string comparison unit that identifies the identical character strings by comparing the plurality of character strings identified by the character string identification unit.

The information processing apparatus according to claim 4,
wherein the character string identification unit:
extracts a plurality of characters from each of the plurality of character strings extracted by the character string extraction unit;
identifies each of the extracted characters; and
identifies each of the plurality of character strings by combining the characters included in that character string.

The information processing apparatus according to claim 4 or 5,
wherein the character string identification unit identifies the plurality of character strings by optical character recognition.

The information processing apparatus according to claim 3,
wherein the same character string identification unit includes a character string similarity determination unit that determines the similarity between the plurality of character strings extracted by the character string extraction unit and, when the similarity is equal to or greater than a threshold, determines that the plurality of character strings are the same character string.

A program that causes an information processing apparatus to function as:
a marking extraction unit that extracts a plurality of markings from a document image including the plurality of markings respectively superimposed on a plurality of character strings;
a same character string identification unit that identifies identical character strings among the plurality of character strings on which the markings are superimposed; and
a code determination unit that assigns the same code to identical character strings and assigns different codes to different character strings.