JP2023076051A

JP2023076051A - Information processing apparatus, program, and method for controlling information processing apparatus

Info

Publication number: JP2023076051A
Application number: JP2021189214A
Authority: JP
Inventors: 啓太小笠原; Keita Ogasawara
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2021-11-22
Filing date: 2021-11-22
Publication date: 2023-06-01

Abstract

To use layout analysis or natural language analysis according to the characteristics of each page, and automatically determine page division positions in scan data. SOLUTION: An information processing apparatus has determination means that determines whether a first page satisfies a predetermined condition. When the determination means determines that the page satisfies the predetermined condition, the apparatus decides whether to divide document data at a position between the first page and a second page by using information on a layout. When the determination means does not determine that the page satisfies the predetermined condition, the apparatus decides whether to divide the document data at the position between the first page and the second page by using text. SELECTED DRAWING: Figure 5

Description

The present invention relates to an information processing apparatus, a program, and a control method therefor that automatically determine division positions in a multi-page scanned document.

In recent years, it has become common to convert scan data, obtained by reading a paper document with a scanner, into a file of a predetermined format and to transmit the file to a storage server on a network for storage. In one use case of this approach, a multi-page paper document is scanned and digitized in a single batch, and the resulting multi-page scan data is divided and stored on the storage server.

Patent Document 1 describes a technique for automatically dividing multi-page scan data into multiple documents. In Patent Document 1, layout analysis only, natural language analysis only, or both are applied to a group of scan data to extract a feature amount from each piece of scan data. Patent Document 1 discloses a method that uses the extracted feature amounts to automatically determine the division positions of the scan data (for example, dividing between the scan data of the 10th page and that of the 11th page).

Patent Document 1: Japanese Patent Application Laid-Open No. 2006-59066 (JP-A-2006-59066)

However, for a document in which scan data with different characteristics are mixed, the division positions may be determined incorrectly if, as in Patent Document 1, the determination relies on only one of the two analysis methods.

Automatic document division by natural language analysis is more accurate than division by layout analysis. However, when the amount of text per page available for natural language analysis is small, the accuracy of natural language analysis also drops. In other words, for a document that mixes scanned pages suited to automatic division by natural language analysis with scanned pages suited to automatic division by layout analysis, using only one analysis method can lead to erroneous determination of the division positions.

On the other hand, applying both layout analysis and natural language analysis degrades processing performance.

The present invention has been made in view of the above circumstances. Its object is to automatically determine the page division positions of scan data by using layout analysis or natural language analysis selectively according to the characteristics of each page.

An information processing apparatus comprising:
extraction means for extracting text included in a first page, the first page and a second page being included in a document composed of a plurality of pages;
acquisition means for acquiring information about the layout of the first page; and
determination means for determining whether the first page satisfies a predetermined condition,
wherein, when the determination means determines that the predetermined condition is satisfied, the apparatus decides whether to divide the document between the first page and the second page using the layout information acquired by the acquisition means, and
when the determination means does not determine that the predetermined condition is satisfied, the apparatus decides whether to divide the document between the first page and the second page using the text extracted by the extraction means.
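As a rough illustration of the selection logic stated above, the following Python sketch dispatches between a layout-based decision and a text-based decision depending on whether the first page satisfies the predetermined condition. The names `Page` and `decide_split` and the three callables are hypothetical; the concrete condition and the two decision procedures are left as parameters and are not prescribed by this sketch.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Page:
    """Hypothetical container for one scanned page."""
    text: str     # text extracted from the page (extraction means)
    layout: dict  # layout information for the page (acquisition means)


def decide_split(
    first: Page,
    second: Page,
    satisfies_condition: Callable[[Page], bool],
    split_by_layout: Callable[[Page, Page], bool],
    split_by_text: Callable[[Page, Page], bool],
) -> bool:
    """Return True if the document should be divided between first and second."""
    if satisfies_condition(first):
        # Condition met: decide using the layout information.
        return split_by_layout(first, second)
    # Condition not met: decide using the extracted text.
    return split_by_text(first, second)
```

Passing the condition and the two decision procedures as callables keeps the dispatch itself independent of how each analysis is implemented.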

According to the present invention, the page division positions of scan data can be determined automatically by using layout analysis or natural language analysis selectively according to the characteristics of each page.

FIG. 1 is a diagram showing a configuration example of an image processing system.
FIG. 2 is a diagram showing a hardware configuration example of the processing terminals constituting this system.
FIG. 3 is a diagram showing a software configuration example of the processing terminals constituting this system.
FIG. 4 is a flowchart explaining the overall processing in the first embodiment.
FIG. 5 is a flowchart showing details of the document division position determination processing performed by the image processing unit in the first embodiment.
FIG. 6 is a diagram showing an example of a screen displayed by the image processing apparatus or the information processing terminal.
FIG. 7 is a flowchart explaining the overall processing in the second embodiment.
FIG. 8 is a flowchart showing details of the document division position determination processing performed by the image processing unit in the second embodiment.

Embodiments of the present invention will be described below with reference to the drawings. The following embodiments do not limit the invention according to the claims, and not all combinations of features described in the embodiments are essential to the solution provided by the invention.

[First embodiment]
<Overall configuration>
FIG. 1 is a diagram showing an example of the configuration of an image processing system according to this embodiment. The image processing system has an image forming apparatus 110, an image processing server 120, an information processing terminal 130, and a storage server 140. These apparatuses and servers are connected to one another via a network 150 and can communicate with each other.

The image forming apparatus 110 of this embodiment can, for example, request that scan data be transmitted to the storage server 140 via the image processing server 120.

In this embodiment, the image forming apparatus 110 is described using the example of a multifunction peripheral (MFP) having a scan function, a print function, a copy function, and the like, but it is not limited to an MFP. For example, any apparatus having a scan function can execute the processing of the present invention described later. Here, the scan function is a function that, for example, transmits image data (scan data) generated by reading a document with the scanner of the image forming apparatus 110 to an external destination. The print function is a function of printing print data received from the information processing terminal 130 or the like. The copy function is a function of obtaining a copy of a document by printing image data of the document read by the scanner.

The image processing system of this embodiment consists of the image forming apparatus 110, the image processing server 120, the information processing terminal 130, and the storage server 140, but is not limited to this configuration. For example, the image forming apparatus 110 may also serve as the information processing terminal 130 or the image processing server 120. The image processing server 120 may be deployed on a server on a LAN rather than on the Internet. The storage server 140 may be replaced with a mail server or the like, so that scanned images are attached to e-mail and sent.

<Hardware configuration of MFP>
FIG. 2 is a diagram showing an example of the hardware configuration of the image processing system according to this embodiment. The image forming apparatus 110 has a printer 201, a scanner 202, an operation unit 203, a CPU 211, a RAM 212, an HDD 213, a network I/F 214, a printer I/F 215, a scanner I/F 216, an operation unit I/F 217, and an expansion I/F 218. The CPU 211 can exchange data with the RAM 212, the HDD 213, the network I/F 214, the printer I/F 215, the scanner I/F 216, the operation unit I/F 217, and the expansion I/F 218. The CPU 211 also loads instructions (a computer program) read from the HDD 213 into the RAM 212 and executes the loaded instructions, thereby controlling the execution of each process described later. In this embodiment, a single CPU 211 is assumed to execute each process shown in the flowcharts described later using a single memory (the RAM 212 or the HDD 213), but the configuration is not limited to this. For example, multiple CPUs and multiple RAMs or HDDs may cooperate to execute each process.

The HDD 213 can store instructions executable by the CPU 211, setting values used by the image forming apparatus 110, data related to processing requested by the user, and the like. The RAM 212 is an area for temporarily storing instructions that the CPU 211 has read from the HDD 213. The RAM 212 can also store various data needed to execute the instructions. In image processing, for example, input data can be processed by loading it into the RAM 212.

The network I/F 214 is an interface for network communication with the apparatuses in the system. The network I/F 214 can notify the CPU 211 that data has been received, and can transmit data in the RAM 212 to the network 150 in accordance with instructions from the CPU 211. The printer I/F 215 can transmit print data to the printer 201 in accordance with instructions from the CPU 211, and can convey the printer status received from the printer 201 to the CPU 211.

The scanner I/F 216 transmits an image reading instruction issued by the CPU 211 to the scanner 202, and conveys image data received from the scanner 202 to the CPU 211. It can also convey scanner status information received from the scanner 202 to the CPU 211. The operation unit I/F 217 can convey user instructions entered via the operation unit 203 to the CPU 211, and can cause the operation unit 203 to display screen information for user operation.

The expansion I/F 218 is an interface that allows an external device to be connected to the image forming apparatus 110. The expansion I/F 218 includes, for example, a USB (Universal Serial Bus) interface. When an external storage device such as a USB memory is connected to the expansion I/F 218, the image forming apparatus 110 can read data stored in the external storage device and write data to it.

The printer 201 can print image data received via the printer I/F 215 on paper, and can also convey the status of the printer 201 to the printer I/F 215. The scanner 202 can read a document (paper) placed on the scanner in accordance with an image reading instruction received via the scanner I/F 216, and convey the resulting image data to the scanner I/F 216. The scanner 202 can also convey the scanner status to the scanner I/F 216. The operation unit 203 is an interface for giving various instructions to the image forming apparatus 110 based on user operations. For example, the operation unit 203 has a touch panel liquid crystal display that displays an operation screen and accepts operations from the user.

The image processing server 120 is composed of a CPU 221, a RAM 222, an HDD 223, and a network I/F 224. The CPU 221 controls the entire apparatus and can control the exchange of data among the RAM 222, the HDD 223, and the network I/F 224. The CPU 221 also loads a control program (instructions) read from the HDD 223 into the RAM 222 and executes it. The HDD 223 of the image processing server 120 is a large-capacity storage unit that stores image data and various programs.

The information processing terminal 130 is composed of a CPU 231, a RAM 232, an HDD 233, a network I/F 234, an operation unit I/F 235, and an operation unit 236.

The CPU 231 controls the entire apparatus and can control the exchange of data among the RAM 232, the HDD 233, the network I/F 234, the operation unit I/F 235, and the operation unit 236. The CPU 231 also loads a control program (instructions) read from the HDD 233 into the RAM 232 and executes it. The operation unit I/F 235 is an interface that conveys user instructions input from the operation unit 236 to the CPU 231 and, under display control by the CPU 231, conveys information on the operation screen to be displayed to the operation unit 236. An application for checking image data is installed on the information processing terminal 130. By executing this application, the terminal can use functions for displaying image data held by the image processing server and for requesting that the image data be saved. If the application for checking image data is provided as a web application, the information processing terminal 130 may instead execute the web application through a web browser to display the image data and request that it be saved.

The storage server 140 is composed of a CPU 241, a RAM 242, an HDD 243, and a network I/F 244. The CPU 241 controls the entire apparatus and can control the exchange of data among the RAM 242, the HDD 243, and the network I/F 244. The CPU 241 also loads a control program (instructions) read from the HDD 243 into the RAM 242 and executes it. The HDD 243 of the storage server 140 can store image data received from the image processing server 120.

<Software configuration of image processing system>
FIG. 3 is a diagram showing an example of the software configuration of the image processing system according to this embodiment.

The image forming apparatus 110 has a controller 310. The controller 310 is composed of a control unit 311, a storage unit 312, a communication unit 313, a display unit 314, and a scan unit 315.

The control unit 311 handles overall processing related to the functions of the controller 310.

The storage unit 312 has a function of saving scan-related settings and a function of saving scanned images.

The communication unit 313 transmits scanned image data and scan-related setting information to the image processing server 120 via the network 150.

The display unit 314 displays a UI screen for accepting user operations. The scan unit 315 receives a scan request from the control unit 311 and causes the scanner 202 to execute scan processing via the scanner I/F 216. It also sends the image data obtained by the scan to the control unit 311.

The image processing server 120 has an image processing service 320. The image processing service 320 is composed of a control unit 321, a storage unit 322, a communication unit 323, and an image processing unit 324. The control unit 321 handles overall processing related to the functions of the image processing service 320. The storage unit 322 has a function of saving scan-related settings, a function of saving image data received from the image forming apparatus 110, and a function of saving learned documents used for matching against image data.

The communication unit 323 has a function of receiving image data from the image forming apparatus 110, a function of transmitting processing results to the information processing terminal 130, a function of receiving scanned-image save requests from the information processing terminal 130, and a function of transmitting image data to the storage server 140.

The image processing unit 324 performs image analysis processing on images, such as character area analysis, OCR (Optical Character Recognition), and image rotation and skew correction, as well as image manipulation processing such as binarization. The image analysis processing also includes document division position determination processing, which takes the OCR results of the preceding and following pages and other data as input to determine document division positions, as well as learning based on feature amounts and matching against learned documents.

<Overall processing flow>
FIG. 4 is a sequence diagram showing the flow of processing in which images scanned by the image forming apparatus 110 are converted into files and transmitted to the storage server 140. The description here focuses on the exchanges between the apparatuses.

Although FIG. 4 depicts the image forming apparatus 110 as the apparatus that acquires the analysis results from the image processing server 120 and displays the screens, the information processing terminal 130 may perform these operations instead of the image forming apparatus 110.

The control unit 311 of the image forming apparatus 110 receives a scan request from the user via the display unit 314 and executes a scan via the scan unit 315 (S401). The control unit 311 of the image forming apparatus 110 transmits the scan data generated in S401 to the image processing server 120 via the communication unit 313 (S402).

The control unit 321 of the image processing server 120 receives the scanned images (scan data) transmitted in S402 via the communication unit 323, and performs document division position determination processing via the image processing unit 324 (S403). Details of this processing will be described later.

Upon detecting, via the communication unit 313, that the image processing server 120 has completed the processing, the control unit 311 of the image forming apparatus 110 requests the determination result from the image processing server 120 and acquires the determination result information on the document division positions from S403 (S404).

Using the determination result information acquired in S404, the control unit 311 of the image forming apparatus 110 displays, via the display unit 314, a document division position confirmation screen for finalizing the division positions of the documents within the scanned image group, which consists of a plurality of scanned images (S405).

The control unit 311 of the image forming apparatus 110 receives, via the display unit 314, corrections to the document division positions made by the user on the document division position confirmation screen displayed in S405, and updates the document division positions on that screen (S406). Details of the processing on the document division position confirmation screen will be described later.

When the "Send" button is pressed on the document division position confirmation screen, the control unit 311 of the image forming apparatus 110 transmits, via the communication unit 313, the determination result information updated in S406 together with information such as the file name and file format of each document to the image processing server 120 (S407).

Upon receiving the data transmitted in S407 via the communication unit 323, the control unit 321 of the image processing server 120 generates a plurality of files from the scanned image group based on the determination result information, the file names, and other information (S408). The control unit 321 of the image processing server 120 then transmits the plurality of files generated in S408 to the storage server 140 via the communication unit 323 (S409).
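The grouping step in S408 can be pictured with a minimal Python sketch that partitions an ordered list of pages into documents at the confirmed division positions. The function name `split_pages` and the representation of a division position as "the 1-based page number after which the document is divided" are assumptions made for illustration; the actual format of the determination result information is not specified here.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_pages(pages: Sequence[T], split_after: Sequence[int]) -> List[List[T]]:
    """Group pages into documents.

    `split_after` holds 1-based page numbers; a value of n means the
    document is divided between page n and page n + 1.
    """
    documents: List[List[T]] = []
    start = 0
    for n in sorted(set(split_after)):
        if 0 < n < len(pages):  # ignore positions outside the page range
            documents.append(list(pages[start:n]))
            start = n
    documents.append(list(pages[start:]))
    return documents
```

For example, five pages with division positions after pages 2 and 4 yield three groups of pages, each of which would then be written out as one file.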

FIG. 5 is a flowchart showing details of the document division position determination processing performed by the image processing unit 324 of the image processing server 120 in this system. This flow corresponds to S403 in FIG. 4.

The image processing unit 324 acquires the scanned image group and the total number of pages in the scanned image group from the image forming apparatus 110 (S501). Page numbers (page 1, page 2, page 3, and so on) are assigned to each scanned side in scan order, and the scanned image group and its total number of pages are saved in the storage unit 322.

The image processing unit 324 determines whether the scanned image group acquired in S501 contains two or more pages (S502). If it does not, no document division position can exist, so the document division position determination is skipped and the process proceeds to S515. Otherwise, the process proceeds to S503.

The image processing unit 324 acquires the first page of the scanned image group (S503). On proceeding from S503 or S514 to S504, the image processing unit 324 acquires the page that follows the most recently processed page. If the most recently processed page is the first page, the second page is acquired (S504).
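The page traversal in S502 to S504 amounts to visiting every pair of adjacent pages once. A minimal Python sketch, in which the generator name `adjacent_page_pairs` is hypothetical, could look like this:

```python
from typing import Iterator, Tuple


def adjacent_page_pairs(total_pages: int) -> Iterator[Tuple[int, int]]:
    """Yield (preceding, following) 1-based page numbers for each candidate split.

    With fewer than two pages there is no candidate position, so nothing
    is yielded (corresponding to the skip at S502).
    """
    for prev in range(1, total_pages):
        yield prev, prev + 1
```

Each yielded pair corresponds to one candidate division position between two consecutive pages.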

The image processing unit 324 acquires text information from the scanned images of the preceding and following pages (S505). Here, the preceding and following pages correspond to the scanned image acquired in S503 and the scanned image acquired in S504. The image processing unit 324 executes character area analysis and OCR (Optical Character Recognition) processing on the entire area of the preceding and following pages. It then extracts the character areas and text, and saves the area number, coordinates (X coordinate, Y coordinate, width, height), text, and OCR confidence of each character area in the storage unit 322 in association with the page number.

Table 1 shows an example of the character area extraction result. The character area extraction result consists of the area number, X coordinate, Y coordinate, width, and height of each character area, and the OCR processing result consists of the text data and OCR confidence of each character area. The OCR confidence is a statistical measure of how likely the text data produced by OCR is to be correct. The OCR confidence may be the average of the per-character confidence values, or may be expressed as a label (high, medium, low, and so on). The OCR confidence may take any range of values; in this embodiment it takes a value from 0 to 1.
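One way to hold a record of this kind in code is sketched below in Python. The class name `TextRegion`, its field names, and the helper `mean_confidence` are illustrative assumptions; returning 0.0 for an empty list of per-character values is likewise a choice made for this sketch, not something the description specifies.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextRegion:
    """One character area found on a page (field names are illustrative)."""
    region_no: int
    x: int
    y: int
    width: int
    height: int
    text: str
    ocr_confidence: float  # 0.0 to 1.0 in this embodiment


def mean_confidence(char_confidences: List[float]) -> float:
    """Average per-character confidences into one region-level value."""
    if not char_confidences:
        return 0.0  # assumption: no characters means no confidence
    return sum(char_confidences) / len(char_confidences)
```

A list of such records per page number would correspond to what is saved in the storage unit 322 at S505.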

For any page other than the first page, the scanned image that serves as the preceding page has already been processed as the following page in the immediately preceding execution of S505, so it does not need to be processed again. Specifically, when S505 is performed with page 1 as the preceding page and page 2 as the following page, S505 is executed for both page 1 and page 2. When page 3 is then newly acquired in S504 and S505 is executed, page 2 has already been processed, so S505 is executed only for the newly acquired page 3.
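The "process each page only once" behaviour can be sketched in Python as a small cache keyed by page number. The class name `PageCache` and the `process` callable are hypothetical stand-ins for the per-page OCR step.

```python
from typing import Callable, Dict


class PageCache:
    """Runs a per-page step at most once per page number."""

    def __init__(self, process: Callable[[int], object]) -> None:
        self._process = process
        self._results: Dict[int, object] = {}

    def get(self, page_no: int) -> object:
        # Only run the expensive step the first time a page is requested.
        if page_no not in self._results:
            self._results[page_no] = self._process(page_no)
        return self._results[page_no]
```

When the pairs (1, 2) and (2, 3) are visited in turn, page 2 is requested twice but the underlying step runs for it only once.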

The image processing unit 324 calculates a text reliability for the scanned images of the preceding and following pages (S506). The text reliability is a measure used to judge whether the target scanned image is suitable for the natural language analysis in S508, described later. Specifically, it is calculated from the amount of text in the scanned image and the OCR confidence acquired in S505.

For example, the text reliability is lower when the amount of text is below a certain value, or when the amount of text is sufficient but the average OCR confidence is low. In calculating the text reliability, weighting based on the coordinates of the character areas may be applied, on the premise that the importance of text varies with the position of its character area. The image processing unit 324 saves the text reliability in the storage unit 322 in association with the page number.

Although the text reliability here is calculated from the amount of text and the OCR confidence, the calculation is not limited to these; it may also incorporate information related to image quality, such as the resolution of the scanned image. As with S505, S506 needs to be executed only for the first page and for the page newly acquired in S504.
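
The calculation in S506 can be sketched as follows. This is only an illustration under stated assumptions, not the formula used in the embodiment: the names TextRegion, text_reliability, and min_chars are hypothetical, the OCR confidence is assumed to lie in the range 0.0 to 1.0, and the average is weighted by character count. The optional position-based weighting and image-quality terms are omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextRegion:
    """One character region from the OCR step (hypothetical structure)."""
    text: str
    ocr_confidence: float  # assumed to be in the range 0.0-1.0


def text_reliability(regions: List[TextRegion], min_chars: int = 200) -> float:
    """Return a score in [0, 1] that drops when text is scarce or OCR confidence is low.

    min_chars is an assumed amount of text at which the amount factor saturates.
    """
    total_chars = sum(len(r.text) for r in regions)
    if total_chars == 0:
        return 0.0
    # Character-weighted mean of the per-region OCR confidence.
    mean_conf = sum(len(r.text) * r.ocr_confidence for r in regions) / total_chars
    # Amount factor: grows linearly with the text amount and is capped at 1.0.
    amount_factor = min(1.0, total_chars / min_chars)
    return amount_factor * mean_conf
```

With these assumptions, a page holding half of min_chars at an OCR confidence of 0.9 scores 0.45, and a page with no recognized text scores 0.0.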

The image processing unit 324 determines whether the text reliability calculated in S506 is higher than a preset threshold (S507). If the text reliability of both the preceding and succeeding pages is higher than the threshold, the process proceeds to S508; if that of either page is below the threshold, the process proceeds to S510. Because the text reliability rises as the amount of text per page increases, most pages other than cover pages and end pages normally exceed the threshold, and the process proceeds to S508.

If it is determined in S507 that the text reliability of both the preceding and succeeding pages is higher than the threshold, the image processing unit 324 performs natural language analysis (S508). In the natural language analysis, feature values to be input to the division confidence calculation model are generated from the text data obtained in S505 as the OCR results for the preceding and succeeding pages. The division confidence calculated by the division confidence calculation model is used as an index in the document division position determination processing of S513 to judge whether a document division position exists between the preceding and succeeding pages.

The feature values extracted from the OCR text data may be changed according to the type of model that uses them. For example, if the division confidence calculation model is based on the well-known BERT (Bidirectional Encoder Representations from Transformers), tokens that can be input to BERT must be generated from the text data. Various known methods may be used; one conceivable method is to concatenate the character strings of the preceding and succeeding pages and then tokenize the concatenated string by splitting it into words through bigram processing or morphological analysis.
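
The bigram variant of this tokenization can be sketched as follows. This is a minimal illustration, assuming character bigrams and a "[SEP]" marker between the two pages in the style of BERT inputs; the function names are hypothetical, and a real BERT-based model would additionally map tokens to its own vocabulary.

```python
from typing import List


def char_bigrams(text: str) -> List[str]:
    """Split text into overlapping character bigrams, ignoring whitespace."""
    chars = "".join(text.split())
    return [chars[i:i + 2] for i in range(len(chars) - 1)]


def build_tokens(prev_text: str, next_text: str, sep: str = "[SEP]") -> List[str]:
    """Combine the OCR text of the preceding and succeeding pages into one token list.

    The separator token is an assumption made for this sketch.
    """
    return char_bigrams(prev_text) + [sep] + char_bigrams(next_text)
```

For instance, build_tokens("abc", "de") yields the tokens "ab", "bc", "[SEP]", and "de".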

After the feature values are generated by the natural language analysis, the image processing unit 324 calculates the division confidence (S509). Specifically, the division confidence is obtained by inputting the feature values generated in S508 into the division confidence calculation model. In general, even a determination model that makes a binary judgment, here whether or not a position is a document division position (the division confidence calculation model in this embodiment), expresses its result as a continuous-valued confidence (the division confidence in this embodiment). The final judgment (whether or not the position is a document division position) is typically made by applying threshold processing to this confidence.

Threshold processing is explained here. For example, for the question of whether a given photograph shows a cat, machine learning outputs a confidence such as 0.87 (= 87%). Setting a threshold on that value yields a binary YES or NO judgment. One way to set the threshold is to judge "YES" (it is a cat) when the confidence is 0.7 or more. In other words, a threshold is set for the calculated confidence, and the judgment result is binarized according to whether the confidence is above or below the threshold.
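
The threshold processing described above amounts to a single comparison. The sketch below uses the 0.7 threshold from the cat example; the function name is hypothetical, and the threshold actually used for the division confidence is not given in the text.

```python
def is_positive(confidence: float, threshold: float = 0.7) -> bool:
    """Binarize a continuous confidence: YES when it is at or above the threshold."""
    return confidence >= threshold
```

A confidence of 0.87 is therefore judged YES, while 0.5 is judged NO.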

If it is determined in S507 that the text reliability of either the preceding or the succeeding page is lower than the threshold, the image processing unit 324 determines whether the text reliability of the succeeding page is higher than the threshold (S510). If the text reliability of the succeeding page is higher than the threshold, the process proceeds to S513, and neither the layout analysis of S511 nor the document classification of S512 is performed.

S511 and S512 are skipped when the text reliability of the succeeding page is higher than the threshold (and that of the preceding page is lower) for the following reason. A page whose text reliability is judged to be higher than the threshold contains a relatively large amount of text, so the document classification of S512, described later, has no need to classify it as a "cover" or "end page", both of which contain relatively little text. Even if S511 and S512 were performed, the page would most likely be classified as "other" rather than "cover" or "end page". Because the user checks the document division position in S406 for pages classified as "other", an increase in such pages increases the user's workload. For this reason, a page whose text reliability is higher than the threshold proceeds to S513 without undergoing S511 and S512.

On the other hand, if it is determined in S510 that the text reliability of the succeeding page is lower than the threshold, the process proceeds to S511.
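
The branching of S507 and S510 can be summarized in a short sketch. The enum and function names are hypothetical; a reliability exactly equal to the threshold is treated here as "not higher", which is an assumption because the text does not specify that case.

```python
from enum import Enum


class Route(Enum):
    NATURAL_LANGUAGE = "S508"  # both pages have high text reliability
    SKIP_LAYOUT = "S513"       # only the succeeding page has high text reliability
    LAYOUT = "S511"            # the succeeding page has low text reliability


def route_pages(prev_reliability: float, next_reliability: float, threshold: float) -> Route:
    """Choose the next step from the text reliability of the two pages."""
    # S507: both pages above the threshold -> natural language analysis.
    if prev_reliability > threshold and next_reliability > threshold:
        return Route.NATURAL_LANGUAGE
    # S510: succeeding page above the threshold -> skip S511 and S512.
    if next_reliability > threshold:
        return Route.SKIP_LAYOUT
    # Otherwise the succeeding page goes through layout analysis.
    return Route.LAYOUT
```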

The image processing unit 324 performs layout analysis on the scanned image of the succeeding page (S511). The feature values obtained from the layout analysis are used in the document classification of S512, described later. Examples of these feature values are so-called document format features such as character size, character spacing, line spacing, margins, and the coordinates of character regions. A new layout analysis is not necessarily required in S511, however; the X coordinate, Y coordinate, width, and height of the character regions extracted in S505 may be reused as the analysis result of S511. The layout analysis is also not limited to the above method, and any other generally known method of extracting feature values that enable document classification may be used.

The image processing unit 324 performs document classification on the scanned image that underwent layout analysis in S511 (S512). Specifically, the scanned image is classified as "cover", "end page", or "other". In general, the layout of a cover is characterized by "a heading in large characters", "a wide blank area", "character regions gathered toward the center", "wide line spacing", and so on. The layout of an end page, by contrast, is characterized by "characters concentrated in the upper part of the page", "a wide blank area in the lower part of the page", "characters that are small and uniform compared with a cover", "constant, narrow line spacing", and so on. Based on these general characteristics of cover and end-page layouts, the scanned image is classified as "cover", "end page", or "other".

As a concrete classification method, one conceivable approach is to add to a score each time the scanned image matches one of the above characteristics and to assign the class when the score exceeds a threshold. Another is to express the feature values as a vector and classify by comparing its degree of match with preset vectors representing typical cover and end pages. The classification result is stored in the storage unit 322 in association with the page number.
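
The first of these methods, counting matched characteristics, might look like the following sketch. Every field name and every numeric cutoff is a hypothetical choice made for illustration; the text specifies neither how the characteristics are measured nor which thresholds are used.

```python
from dataclasses import dataclass


@dataclass
class LayoutFeatures:
    """Hypothetical per-page layout measurements."""
    max_font_ratio: float      # largest character height / typical character height
    blank_ratio: float         # fraction of the page area without text
    center_offset: float       # horizontal offset of text from the page center (0 = centered)
    line_spacing_ratio: float  # line pitch / character height
    top_text_share: float      # fraction of the text located in the upper half
    bottom_blank_ratio: float  # fraction of the lower half that is blank


def classify_page(f: LayoutFeatures, min_score: int = 3) -> str:
    """Count matched cover and end-page traits and pick a class when enough match."""
    cover_score = sum([
        f.max_font_ratio >= 1.5,      # heading in large characters
        f.blank_ratio >= 0.6,         # wide blank area
        f.center_offset <= 0.1,       # character regions near the center
        f.line_spacing_ratio >= 1.5,  # wide line spacing
    ])
    end_score = sum([
        f.top_text_share >= 0.8,      # characters concentrated at the top
        f.bottom_blank_ratio >= 0.5,  # wide blank area at the bottom
        f.max_font_ratio < 1.2,       # small, uniform characters
        f.line_spacing_ratio < 1.2,   # constant, narrow line spacing
    ])
    if cover_score >= min_score and cover_score > end_score:
        return "cover"
    if end_score >= min_score and end_score > cover_score:
        return "end page"
    return "other"
```

The vector-based alternative would replace the two counters with a similarity measure against stored reference vectors.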

The image processing unit 324 performs document division position determination processing (S513). For the division confidence obtained in S509, threshold processing is applied to determine the document division position. If the division confidence exceeds a preset threshold, the position is judged to be a document division position; if it is below the threshold, the position is judged not to be a document division position.

For the classification result obtained in S512, the document division position is determined from that classification result and the text reliability calculated in S506 (S513). When the succeeding page is classified as "cover", it is divided from the preceding page. When the succeeding page is classified as "end page", it is not divided from the preceding page. However, the division judgment is held only when the preceding page is also classified as "end page". When the succeeding page is classified as "other", the division judgment is likewise held.

Table 2 shows, for each combination of the classification results of the preceding and succeeding pages, whether the two pages are divided. Note that the combination of text reliability "high" and text reliability "high" is judged by the document division position determination processing that uses the division confidence obtained in S509.

Table 2 is an example of the relationship between the preceding and succeeding pages as determined by the combination of their classification results and text reliability. In Table 2, "divide" indicates that the position between the two pages is a division position, "-" indicates that it is not a division position, and "hold" indicates that it cannot be judged whether it is a division position, in which case the user is prompted to confirm on the document division position confirmation screen of S405. As shown in Table 2, pages judged in S507 to have "low" text reliability are classified as "cover", "end page", or "other". The combination of the classification results of the preceding and succeeding pages yields a judgment of "divide", "- (do not divide)", or "hold". These judgment results are stored in the storage unit 322 as intermediate information.
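
The layout-based rules stated in prose above can be sketched as a small function. Table 2 itself is not reproduced in this section, so the sketch follows only the prose; in particular, applying the "hold" exception only when both pages are end pages is one reading of the text, and the function and label names are hypothetical.

```python
from typing import Optional


def decide_by_layout(prev_class: Optional[str], next_class: str) -> str:
    """Return "divide", "-" (do not divide), or "hold" for the gap between two pages.

    prev_class may be None when the preceding page was not classified by layout.
    """
    if next_class == "cover":
        # A cover starts a new document, so it is divided from the preceding page.
        return "divide"
    if next_class == "end page":
        # An end page belongs with what precedes it, unless the preceding page
        # is itself an end page, in which case the judgment is held.
        return "hold" if prev_class == "end page" else "-"
    # "other": the position cannot be judged automatically.
    return "hold"
```

Positions returned as "hold" are the ones that would be surfaced to the user for confirmation.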

The image processing unit 324 determines whether any unprocessed scanned image remains (S514). If an unprocessed scanned image remains, the process returns to S504. If all scanned images have been processed, the process proceeds to S515.

The image processing unit 324 generates determination result information from the division judgment results of S513 and the document classification results of S512 stored in the storage unit 322 (S515). The determination result information holds, among other things, an ID number assigned to each divided image group, the number of pages constituting each image group, and the document classification result of each individual document. The generated determination result information is stored in the storage unit 322. This completes the detailed processing of S403.
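
As an illustration of S515, the per-gap division judgments can be turned into image groups as sketched below. The dictionary keys and the representation of division positions as a set of page numbers are assumptions made for this sketch; the classification results that the determination result information also holds are omitted.

```python
from typing import Dict, List, Set


def build_groups(num_pages: int, divide_before: Set[int]) -> List[Dict]:
    """Group pages 1..num_pages into documents.

    divide_before contains each page number p (1-based) for which a division
    position lies between page p - 1 and page p.
    """
    if num_pages <= 0:
        return []
    groups: List[List[int]] = []
    start = 1
    for page in range(2, num_pages + 1):
        if page in divide_before:
            groups.append(list(range(start, page)))
            start = page
    groups.append(list(range(start, num_pages + 1)))
    # Assign a sequential ID and record the page count for each group.
    return [
        {"id": i + 1, "pages": pages, "page_count": len(pages)}
        for i, pages in enumerate(groups)
    ]
```

For five pages with division positions before pages 3 and 5, this produces three groups covering pages 1-2, 3-4, and 5.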

FIG. 6(a) shows an example of the document division position finalization screen displayed by the image forming apparatus 110 or the information processing terminal 130. The screen 610 is displayed in S405. The screen 610 lists the scanned images for which scanning and document division position determination have been completed and which have not yet been transmitted to the storage server 140. The screen 610 also allows the user to check and correct the document division position determination results for the scanned image group (the determination results of S403).

On the screen 610, the scanned images are displayed in the divided state based on the determination results of S403. The screen 610 shows a file name 611 and a total page count 612 for each document group, thumbnail images 613 of the document group, and page numbers 614. In addition, a document dividing line 615 appears to the right of the last page of each document group, and the document division position can be corrected by, for example, dragging the document dividing line 615.

The document division position may also be made adjustable by dragging a thumbnail image 613. A thumbnail image 616 judged as "hold" in S403 is displayed so as to be distinguished from the other thumbnail images; for example, the thumbnail image 616 may be surrounded by a thick frame or displayed in a different color. The "Send" button 617 is a button for finalizing the document division positions. When the user selects it, the determination result information reflecting the document division positions displayed on the screen 610 is transmitted to the image processing server 120 together with information such as the file name and file format of each document.

The "Batch confirm" button 618 is a button for transitioning to a screen on which document division positions are checked and corrected in a batch. When the user presses it, only the held thumbnail images 616 judged as "hold" are displayed in a list so that the user can select whether or not each one is a cover.

FIG. 6(b) shows an example of the batch confirmation screen 620 displayed by the image forming apparatus 110 or the information processing terminal 130. The thumbnail images shown on the batch confirmation screen 620 are images of the pages judged as "hold" when the document division positions were determined.

The batch confirmation screen 620 is displayed when the "Batch confirm" button 618 on the screen 610 is selected. The batch confirmation screen 620 lists held thumbnail images 621 together with tags 622 indicating the document group to which each held thumbnail image belongs. When the user selects a held thumbnail image 621, it is displayed as a held thumbnail image 623 judged to be a cover, in a form that distinguishes it from the surrounding documents. The "Back" button 624 is a button for transitioning to the document division position finalization screen 610; when the user presses it, the transition occurs without saving the changes made on the batch confirmation screen 620. The "Confirm" button 625 is a button for saving the changes made on the batch confirmation screen 620. When the user presses it, the display transitions to the screen 610 and shows new image groups reconstructed with the selected images as covers.

By carrying out the present invention through the above processing procedure, the accuracy of document division can be improved in processing that automatically determines document division positions from a group of scanned images containing multiple documents, even for an image group that mixes documents with characteristics that cannot be captured by layout analysis or by natural language analysis. Changing the analysis method according to the characteristics of each document also improves processing performance. Furthermore, the improved accuracy reduces the effort needed to correct positions that the automatic determination got wrong, which lightens the user's workload.

[Second Embodiment]
In the first embodiment, document classification was performed from the feature values extracted by layout analysis. This embodiment describes how accurate classification becomes possible by using matching against learned documents during document classification. Document learning is performed mainly on documents that the user corrected on the document division position finalization screen. In describing this embodiment, parts whose configuration and processing procedure are the same as in the first embodiment are omitted, and only the differences are described.

The differences from the first embodiment are the flowchart of FIG. 7 and the determination results of the flowchart of FIG. 8. FIG. 7 is a sequence diagram showing the flow of processing in which an image scanned by the image forming apparatus 110 is converted into a file and transmitted to the storage server 140. The difference from FIG. 4 is S701.

In S701, the image processing unit 324 learns feature values for the documents corrected by the user, based on the determination result information transmitted in S407. The documents mainly learned are of two kinds: documents that were judged as "cover" or "end page" in S512 but, after the user's correction in S406, were placed at neither the cover nor the end of a document group; and documents that were judged as "other" in S512 but, after the user's correction in S406, were placed at the cover or the end of a document group. A document that was judged as "cover" or "end page" but placed at neither the cover nor the end of a document group is reclassified as "body".

Conversely, a document that was judged as "other" but placed at the cover or the end of a document group is reclassified as "cover" or "end page". "Body" is a class assigned only by the document learning of S701. The documents to be learned need not be limited to those corrected by the user. One example of a method for learning the feature values is to store the coordinate information of the character regions extracted in S505 in association with the classification result. Text information from the OCR results may also be stored. A learned document is stored in the storage unit 322 as a learned document. Storing the feature values in association with the reclassified result makes it possible to determine the classification of a document whose feature values match without performing the document classification of S512.

FIG. 8 is a flowchart showing the details of the document division position determination processing performed by the image processing unit 324 of the image processing server 120 in this system. This flow corresponds to S403 in FIG. 4. The differences from FIG. 5 are S801, S802, and S803.

After layout analysis is performed on the scanned image in S511, the image processing unit 324 performs document matching on that scanned image (S801). Document matching compares the feature values learned in S701 with the feature values of the scanned image being processed and determines whether the two are similar documents. For example, when the feature values are the coordinate information of character regions, the determination is made according to whether the ratio of the area where the character regions overlap to the total area of the learned character regions is at or above a threshold.
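
The area-overlap comparison can be sketched as follows, with each character region given as (x, y, width, height). The function names, the 0.8 threshold, and the structure of the learned-document list are hypothetical, and the regions within one page are assumed not to overlap each other so that overlapping area is not counted twice.

```python
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height)


def intersection_area(a: Rect, b: Rect) -> float:
    """Area of the overlap between two axis-aligned rectangles (0 if disjoint)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    w = min(ax + aw, bx + bw) - max(ax, bx)
    h = min(ay + ah, by + bh) - max(ay, by)
    return max(0.0, w) * max(0.0, h)


def overlap_ratio(learned: List[Rect], scanned: List[Rect]) -> float:
    """Overlapping area divided by the total area of the learned character regions."""
    learned_area = sum(w * h for _, _, w, h in learned)
    if learned_area == 0:
        return 0.0
    overlap = sum(intersection_area(l, s) for l in learned for s in scanned)
    return overlap / learned_area


def match_learned(scanned: List[Rect],
                  learned_docs: List[Tuple[List[Rect], str]],
                  threshold: float = 0.8) -> Optional[str]:
    """Return the class of the best-matching learned document, or None if none matches."""
    best_label, best_ratio = None, 0.0
    for regions, label in learned_docs:
        ratio = overlap_ratio(regions, scanned)
        if ratio >= threshold and ratio > best_ratio:
            best_label, best_ratio = label, ratio
    return best_label
```

A result of None corresponds to the case in which no learned document is found and ordinary document classification is needed.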

If text information is also included in the feature values, whether the text within the character regions matches may be added to the determination. In the first embodiment, even if the user corrects a misclassified document, the same judgment is made again when a similar document is scanned. By learning the documents corrected by the user in S701, the image processing unit 324 can, when a similar document is scanned, obtain from the storage unit 322 the classification result associated with the matching learned document and use it as the classification result of the scanned image being processed. The possible classification results are "cover", "end page", and "body". The classification result is stored in the storage unit 322 in association with the page number.

Based on the result of the document matching, the image processing unit 324 determines whether the scanned image is a learned document (S802). If it is determined that a matching learned document exists, the process proceeds to S803, and the document classification of S512 is not performed. If it is determined that no matching learned document exists, the process proceeds to S512.

Next, the processing of S803 is described. In this embodiment, "body" was added in S701 as a document class in addition to "cover", "end page", and "other". The number of combinations representing the relationship between the preceding and succeeding pages therefore increases to 20 in total. Table 3 shows Table 2 with the differences added. As in Table 2, "divide" indicates that the position between the two pages is a division position, "-" indicates that it is not a division position, and "hold" indicates that it cannot be judged whether it is a division position, in which case the user is prompted to confirm on the document division position confirmation screen of S405.

By carrying out the present invention through the above processing procedure, accurate classification becomes possible through matching against learned documents, in addition to the effects of the first embodiment. This further improves the accuracy of document division and reduces the burden on the user.

This concludes the description of the present embodiment.

[Other Embodiments]
The object of the present invention is also achieved by executing the following processing. That is, a storage medium on which the program code of software implementing the functions of the above-described embodiments is recorded is supplied to a system or apparatus, and a computer (or a CPU, MPU, or the like) of that system or apparatus reads out the program code stored in the storage medium. In this case, the program code itself read from the storage medium implements the functions of the above-described embodiments, and the program code and the storage medium storing it constitute the present invention.

110 Image forming apparatus
120 Image processing server
130 Information processing terminal
140 Storage server
311, 321 Control unit
312, 322 Storage unit
313, 323 Communication unit
314 Display unit
315 Scanning unit
324 Image processing unit

Claims

1. An information processing apparatus comprising:
extracting means for extracting, of a first page and a second page included in a document composed of a plurality of pages, text included in the first page; and
acquiring means for acquiring information about the layout of the first page,
the information processing apparatus further comprising determining means for determining whether the first page satisfies a predetermined condition,
wherein, if the determining means determines that the predetermined condition is satisfied, whether to divide the document between the first page and the second page is decided using the layout information acquired by the acquiring means, and
if the determining means does not determine that the predetermined condition is satisfied, whether to divide the document between the first page and the second page is decided using the text extracted by the extracting means.

2. The information processing apparatus according to claim 1, wherein the determining means determines whether either the OCR confidence of the first page or the amount of text of the first page is below a predetermined threshold.

3. The information processing apparatus according to claim 2, wherein the first page is included in a document composed of a plurality of pages, and
if the determining means determines that the predetermined condition is satisfied, the first page is classified as a cover page, an end page, or another page using the layout information, and whether to divide the first page from the second page is decided based on a text reliability calculated from the OCR confidence and the amount of text, and on the classification result of the first page.

4. The information processing apparatus according to claim 3, wherein, when the text reliability of the first page is low and the first page is determined from its layout to be a cover page, the document is divided between the first page and the page located before the first page.

5. The information processing apparatus according to claim 3, wherein, when the text reliability of the first page is low and the first page is determined from its layout to be an end page, the document is divided between the first page and the page located after the first page.

6. The information processing apparatus according to any one of claims 3 to 5, wherein, if the first page cannot be classified as either a cover page or an end page using the layout information, the judgment as to whether the position between the first page and a page located before or after the first page is a division position is held.

7. The information processing apparatus according to claim 6, wherein, if the judgment is held, the division position is determined by a user.

8. A control method for an information processing apparatus, the control method comprising:
an extracting step of extracting, of a first page and a second page included in a document composed of a plurality of pages, text included in the first page; and
an acquiring step of acquiring information about the layout of the first page,
the control method further comprising a determining step of determining whether the first page satisfies a predetermined condition,
wherein, if it is determined in the determining step that the predetermined condition is satisfied, whether to divide the document between the first page and the second page is decided using the layout information acquired in the acquiring step, and
if it is not determined in the determining step that the predetermined condition is satisfied, whether to divide the document between the first page and the second page is decided using the text extracted in the extracting step.

9. A program executed by an information processing apparatus comprising:
extracting means for extracting, of a first page and a second page included in a document composed of a plurality of pages, text included in the first page; and
acquiring means for acquiring information about the layout of the first page,
the information processing apparatus further comprising determining means for determining whether the first page satisfies a predetermined condition,
wherein, if the determining means determines that the predetermined condition is satisfied, whether to divide the document between the first page and the second page is decided using the layout information acquired by the acquiring means, and
if the determining means does not determine that the predetermined condition is satisfied, whether to divide the document between the first page and the second page is decided using the text extracted by the extracting means.