JP2021056871A

JP2021056871A - Information processor, computer program, and information processing method

Info

Publication number: JP2021056871A
Application number: JP2019180605A
Authority: JP
Inventors: 荘介下山; Sosuke Shimoyama
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 2019-09-30
Filing date: 2019-09-30
Publication date: 2021-04-08
Anticipated expiration: 2039-09-30
Also published as: JP7395915B2

Abstract

To provide an information processor, a computer program, and an information processing method capable of improving the operability for contents in a document. SOLUTION: An information processor includes: a determination part which uses a learned model obtained by learning a set of related contents as teacher data to determine whether or not there is relation in the set of contents inputted; and an extraction part which extracts in association with each other, as cluster contents, the set of the contents determined to be related to each other by the determination part. SELECTED DRAWING: Figure 1

Description

The present invention relates to an information processing device, a computer program, and an information processing method.

Various methods have been proposed to support layout creation for magazines, books, newspapers, and the like. Patent Document 1 discloses an information processing device that extracts a plurality of contents from a document, determines the position of each content on the document based on the degree of semantic relevance between the extracted contents, and generates a new document in which the contents are arranged at the determined positions.

Patent Document 1: JP-A-2009-169536

However, with the information processing device of Patent Document 1, a required operation on contents in a document (for example, copying or moving) must be performed individually for each content. For related contents in particular, the same operation is likely to be repeated, which is cumbersome.

The present invention has been made in view of these circumstances, and its object is to provide an information processing device, a computer program, and an information processing method capable of improving the operability for contents in a document.

An information processing device according to an embodiment of the present invention includes: an identification unit that uses a trained model, trained with sets of related contents as training data, to identify whether or not an input set of contents is related; and an extraction unit that associates the sets of related contents identified by the identification unit and extracts them as cluster content.

A computer program according to an embodiment of the present invention causes a computer to execute: a process of identifying whether or not an input set of contents is related, using a trained model trained with sets of related contents as training data; and a process of associating the identified sets of related contents and extracting them as cluster content.

An information processing method according to an embodiment of the present invention identifies whether or not an input set of contents is related, using a trained model trained with sets of related contents as training data, and associates the identified sets of related contents and extracts them as cluster content.

According to the present invention, the operability for contents in a document is improved.

FIG. 1 is a block diagram showing an example of the configuration of the information processing device of the present embodiment.
FIG. 2 is a schematic diagram showing an example of layout data.
FIG. 3 is a schematic diagram showing an example of a relation graph.
FIG. 4 is a schematic diagram showing an example of a method of calculating the feature amount of an image.
FIG. 5 is a schematic diagram showing an example of a method of calculating the feature amount of a caption.
FIG. 6 is a schematic diagram showing a first example of determining the relevance of an image and a caption.
FIG. 7 is a schematic diagram showing a second example of determining the relevance of an image and a caption.
FIG. 8 is a schematic diagram showing a first example of a method of training the neural network.
FIG. 9 is a schematic diagram showing a second example of a method of training the neural network.
FIG. 10 is a schematic diagram showing a first example of an operation on cluster content.
FIG. 11 is a schematic diagram showing a second example of an operation on cluster content.
FIG. 12 is a flowchart showing an example of the cluster content extraction process of the information processing device.

Hereinafter, embodiments of the present disclosure will be described with reference to the drawings. FIG. 1 is a block diagram showing an example of the configuration of the information processing device 50 of the present embodiment. The information processing device 50 can be connected to a server 10 via a communication network 1. A scanner 20 can also be connected to the information processing device 50. The server 10 can be a data server that stores layout data, but is not limited to this.

The information processing device 50 includes a control unit 51 that controls the entire device, a communication unit 52, a storage unit 53, a layout data estimation unit 54, an identification unit 55, an extraction unit 56, a display panel 57, a display unit 58, and an operation unit 59.

The control unit 51 can be composed of a CPU, a ROM, a RAM, and the like.

The communication unit 52 has a function of communicating with the server 10 via the communication network 1, and can transmit and receive required information. More specifically, the communication unit 52 can acquire layout data from the server 10.

FIG. 2 is a schematic diagram showing an example of layout data. The layout data includes, for example, information representing a state in which a plurality of contents are arranged in a layout frame, that is, the area corresponding to one page of a document excluding the margins. The contents include, for example, a title, body text, an image (figure), and a caption (a description of an image). The layout data is the data needed to arrange the contents, and includes, for example, the size of each content, the coordinates of each content, and the relative coordinates between contents. In the example of FIG. 2, a title, body texts A and B, images A, B, and C, and captions A, B, and C are arranged on one page of the document. The layout data is not limited to the example of FIG. 2.
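As an illustration of the kind of record such layout data could hold, the following is a minimal sketch. The `Content` class, its field names, and the center-to-center definition of relative coordinates are assumptions made for this example; the source does not specify a schema.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Content:
    """One content item placed in the layout frame (hypothetical schema)."""
    name: str      # e.g. "image A"
    kind: str      # "title", "body", "image", or "caption"
    x: float       # left edge in layout-frame coordinates
    y: float       # top edge in layout-frame coordinates
    width: float
    height: float


def relative_coords(a: Content, b: Content) -> Tuple[float, float]:
    """Offset of b's center from a's center (one possible relative coordinate)."""
    ax, ay = a.x + a.width / 2, a.y + a.height / 2
    bx, by = b.x + b.width / 2, b.y + b.height / 2
    return (bx - ax, by - ay)


image_a = Content("image A", "image", x=10, y=20, width=80, height=60)
caption_a = Content("caption A", "caption", x=10, y=85, width=80, height=10)
offset = relative_coords(image_a, caption_a)  # caption sits directly below the image
```

Other definitions of relative coordinates (for example, corner-to-corner offsets) would serve equally well, as long as the same definition is used for training and inference.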

The communication unit 52 also has an interface function with the scanner 20, and can acquire, for example, a PDF file from the scanner 20.

The layout data estimation unit 54 includes an object detection neural network, and can convert a PDF file obtained from the scanner 20 into an image and estimate layout data from it.

The storage unit 53 can be composed of a hard disk, a flash memory, or the like, and can store the layout data acquired via the communication unit 52 and the layout data estimated by the layout data estimation unit 54.

The identification unit 55 uses a trained model, trained with sets of related contents as training data, to identify whether or not an input set of contents is related. The trained models are the neural networks 551, 552, and 553 described later.

The identification unit 55 identifies sets of related contents based on the layout data. "Related contents" can be, for example, contents on which the user is likely to repeat the same operation (for example, copying, moving, deleting, or scaling) within the document displayed on the display panel 57. Related contents can be represented schematically by, for example, a relation graph.

FIG. 3 is a schematic diagram showing an example of a relation graph. As shown in FIG. 3, assume that one layout contains images G1, G2, and G3 and captions C1, C2, and C3 as a plurality of contents. Assume also that image G1 is related to captions C1 and C2, and that caption C3 is related to images G2 and G3. In this case, the relation graph can be represented as a graph in which image G1 is connected to caption C1 and image G1 is connected to caption C2. Likewise, the relation graph can be represented as a graph in which caption C3 is connected to image G2 and caption C3 is connected to image G3.
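The relation graph of FIG. 3 can be written down as an undirected adjacency map. This is only an illustrative sketch; the `build_graph` helper and the use of plain strings as node identifiers are assumptions of this example.

```python
from collections import defaultdict

# Edges of the relation graph described for FIG. 3.
edges = [("G1", "C1"), ("G1", "C2"), ("C3", "G2"), ("C3", "G3")]


def build_graph(pairs):
    """Build an undirected adjacency map from related pairs."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


graph = build_graph(edges)
```

Looking up `graph["G1"]` then yields the captions connected to image G1, and `graph["C3"]` yields the images connected to caption C3.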

Next, a method of determining the relevance between contents will be described. The following description uses images and captions as the contents, as shown in FIG. 3, but the same applies to other contents. The relevance determination uses the feature amount of the image, the feature amount of the caption, and the relative position information (relative coordinates) between the contents (image and caption). First, a method of calculating the feature amount of an image will be described.

FIG. 4 is a schematic diagram showing an example of a method of calculating the feature amount of an image. The identification unit 55 has a neural network 551. The neural network 551 is, for example, a convolutional neural network in which an input layer 551a, a convolutional layer 551b, a pooling layer 551c, a convolutional layer 551d, a pooling layer 551e, and a fully connected layer 551f are connected in this order. The numbers of convolutional layers, pooling layers, and fully connected layers are chosen for convenience and are not limited to those shown in FIG. 4. The activation function layers and the output layer are also omitted for convenience. Image G1 is input to the input layer 551a. Because the fully connected layer 551f combines the features of the input image G1, the feature amount g1 (a vector) can be calculated from the fully connected layer 551f. The feature amounts g2 and g3 can be calculated in the same way for the other images G2 and G3. The calculation of the image feature amount is not limited to a method using a neural network; general image processing such as edge detection, line detection, region segmentation, or texture analysis may be used.
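To make the convolution, pooling, and fully connected stages concrete, the following is a toy sketch in plain Python. It is not the network of FIG. 4: it uses a single convolution and pooling stage instead of two, and the kernel and dense weights are fixed, hand-picked values rather than trained parameters. All function names are assumptions of this example.

```python
def conv2d(image, kernel):
    """Valid 2-D cross-correlation (as used in CNN layers) followed by ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[max(0.0, sum(image[i + di][j + dj] * kernel[di][dj]
                          for di in range(kh) for dj in range(kw)))
             for j in range(out_w)] for i in range(out_h)]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling with a size x size window."""
    return [[max(fmap[i + di][j + dj] for di in range(size) for dj in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]


def fully_connected(fmap, weights):
    """Flatten the feature map and apply a dense layer; the output is the feature vector."""
    flat = [v for row in fmap for v in row]
    return [sum(w * v for w, v in zip(row, flat)) for row in weights]


# 5x5 toy image with a vertical edge between columns 1 and 2.
image = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
edge_kernel = [[-1.0, 1.0], [-1.0, 1.0]]             # hand-picked, not trained
dense = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]  # hand-picked, not trained

g1 = fully_connected(max_pool(conv2d(image, edge_kernel)), dense)
```

In practice the feature vector would come from a trained convolutional network in a deep learning framework; the sketch only shows how the data flows through the layer types named in the text.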

Next, a method of calculating the feature amount of a caption will be described.

FIG. 5 is a schematic diagram showing an example of a method of calculating the feature amount of a caption. The identification unit 55 has a neural network 552. The neural network 552 is, for example, word2vec, in which an input layer 552a, a hidden layer 552b, and an output layer 552c are connected in this order. The input layer 552a and the hidden layer 552b are fully connected with weights W, and the hidden layer 552b and the output layer 552c are fully connected with weights W'. A word (or sequence of words) extracted from caption C1 by a language processing unit of the identification unit 55 (for example, morphological analysis) is input to the input layer 552a. Specifically, caption C1 is divided into a plurality of words by morphological analysis, and the divided words are converted into vectors and input to the input layer 552a. In this case, a vector obtained by averaging the vectors of the individual words may be input to the input layer 552a. When a vector is input to the input layer 552a, the meaning of caption C1 is expressed as a vector, and the feature amount c1 (a vector) can be calculated. The feature amounts c2 and c3 can be calculated in the same way for the other captions C2 and C3.
The calculation of the caption feature amount is not limited to a method using a neural network; general language processing may be used. For example, dictionary data may be used to extract words, the smallest meaningful units, from a caption, and the extracted words may be converted into vectors with the required number of dimensions. The feature amounts of the title and the body text can also be calculated using the neural network 552.
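The word-splitting and vector-averaging idea can be sketched as follows. This is a simplification under stated assumptions: the embedding table `EMBEDDINGS` is a made-up three-dimensional lookup rather than trained word2vec weights, `tokenize` is a whitespace split standing in for morphological analysis, and the averaged vector is used directly as the caption feature.

```python
# Hypothetical 3-dimensional word vectors; a real system would use trained embeddings.
EMBEDDINGS = {
    "red": [1.0, 0.0, 0.0],
    "car": [0.0, 1.0, 0.0],
    "road": [0.0, 0.0, 1.0],
}


def tokenize(caption):
    """Stand-in for morphological analysis: lowercase whitespace split."""
    return caption.lower().split()


def caption_feature(caption, embeddings=EMBEDDINGS):
    """Average the vectors of known words; unknown words are skipped."""
    vectors = [embeddings[w] for w in tokenize(caption) if w in embeddings]
    if not vectors:
        # No known word: fall back to a zero vector of the embedding dimension.
        return [0.0] * len(next(iter(embeddings.values())))
    return [sum(col) / len(vectors) for col in zip(*vectors)]


c1 = caption_feature("Red car")
```

For Japanese captions, the whitespace split would need to be replaced by an actual morphological analyzer, since words are not separated by spaces.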

Next, a method of determining the relevance between contents (an image and a caption) will be described.

FIG. 6 is a schematic diagram showing a first example of determining the relevance of an image and a caption. The identification unit 55 has a neural network 553, to which input data is supplied. The input data is a vector whose components are the feature amount of an image, the feature amount of a caption, and the relative position information of that image and caption. The relative position information includes the relative coordinates between the coordinates of image G1 and the coordinates of caption C1 on the layout, the size of image G1 on the layout, the size of caption C1 on the layout, and the like. In the example of FIG. 6, the feature amount g1 (a vector) of image G1, the feature amount c1 (a vector) of caption C1, and the relative position information (a vector) of image G1 and caption C1 are combined into a single vector.
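Combining the three parts into one input vector amounts to a concatenation, as in the sketch below. The box format `(x, y, width, height)`, the ordering of the components, and the function names are assumptions of this example; the source only states which quantities the vector contains.

```python
def relative_position(image_box, caption_box):
    """Boxes are (x, y, width, height); returns the offset plus both sizes."""
    ix, iy, iw, ih = image_box
    cx, cy, cw, ch = caption_box
    return [cx - ix, cy - iy, iw, ih, cw, ch]


def build_input(image_feature, caption_feature, image_box, caption_box):
    """Concatenate g, c, and the relative position information into one vector."""
    return (list(image_feature) + list(caption_feature)
            + relative_position(image_box, caption_box))


# Toy feature vectors and boxes (made-up values).
x = build_input([4.0, 0.0], [0.5, 0.5, 0.0], (10, 20, 80, 60), (10, 85, 80, 10))
```

In a real system the position components would usually be normalized (for example, by page size) so that they are on a scale comparable to the feature amounts.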

In this example, the score output by the neural network 553 is equal to or higher than the threshold, so image G1 and caption C1 can be determined to be related, consistent with the relation graph shown in FIG. 3. Instead of the neural network 553, another machine learning method such as an SVM (Support Vector Machine) or a Bayesian network may be used.

FIG. 7 is a schematic diagram showing a second example of determining the relevance of an image and a caption. In the example of FIG. 7, the feature amount g2 (a vector) of image G2, the feature amount c1 (a vector) of caption C1, and the relative position information (a vector) of image G2 and caption C1 are combined into a single vector and input to the neural network 553.

In this example, the score output by the neural network 553 is less than the threshold, so image G2 and caption C1 can be determined to be unrelated, consistent with the relation graph shown in FIG. 3.

As described above, for any pair of an image and a caption on the layout, the feature amounts and the relative position information are input to the neural network 553, and the presence or absence of relevance is determined for each pair.

Next, a method of training the neural network 553 will be described.

FIG. 8 is a schematic diagram showing a first example of a method of training the neural network 553. A vector serving as training input data is input to the input layer of the neural network 553. The training vector is a vector whose components are the feature amount of an image, the feature amount of a caption, and the relative position information of that image and caption. In the example of FIG. 8, the feature amount g1 (a vector) of image G1, the feature amount c2 (a vector) of caption C2, and the relative position information (a vector) of image G1 and caption C2 are combined into a single vector. When image G1 and caption C2 are related, as shown in FIG. 3, the training label "1" is given to the output layer and the neural network 553 is trained. The training label "1" indicates a correct (related) pair.

FIG. 9 is a schematic diagram showing a second example of a method of training the neural network 553. A vector serving as training input data is input to the input layer of the neural network 553. The training vector is a vector whose components are the feature amount of an image, the feature amount of a caption, and the relative position information of that image and caption. In the example of FIG. 9, the feature amount g2 (a vector) of image G2, the feature amount c1 (a vector) of caption C1, and the relative position information (a vector) of image G2 and caption C1 are combined into a single vector. When image G2 and caption C1 are not related, as shown in FIG. 3, the training label "0" is given to the output layer and the neural network 553 is trained. The training label "0" indicates an incorrect (unrelated) pair.

The neural network 553 can be trained using a large number of training input data and training labels such as those shown in FIGS. 8 and 9.
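Training on labeled pairs can be sketched with a much simpler model than the neural network 553. The example below swaps in logistic regression trained by gradient descent, purely to show how vectors labeled "1" (related) and "0" (unrelated) drive the fit. The two-component toy vectors, the learning rate, and the epoch count are all made-up values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, labels, epochs=500, lr=0.5):
    """Fit a logistic-regression scorer by gradient descent on log loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            pred = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)
            error = pred - y  # gradient of log loss w.r.t. the pre-activation
            weights = [w - lr * error * v for w, v in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def score(weights, bias, x):
    """Relevance score in (0, 1) for one input vector."""
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


# Toy training vectors: [feature similarity, normalized distance] (made-up values).
samples = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.9], [0.1, 0.8]]
labels = [1, 1, 0, 0]  # 1 = related pair, 0 = unrelated pair
weights, bias = train(samples, labels)
```

A pair that resembles the positive examples then scores above 0.5, and one that resembles the negative examples scores below it.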

As described above, sets of related contents can be identified by selecting any two contents from the plurality of contents in a document, calculating a score (index) indicating the relevance between the selected contents, determining that they are related if the calculated score is equal to or higher than a predetermined threshold, and determining that they are not related if the calculated score is less than the threshold.
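The pairwise scoring and thresholding just described can be sketched as follows. The score table `TOY_SCORES` holds made-up numbers standing in for the output of the trained model, and the threshold value 0.5 is an assumption; the source only says "a predetermined threshold".

```python
from itertools import combinations


def related_pairs(contents, score_fn, threshold=0.5):
    """Return every pair whose relevance score is at or above the threshold."""
    return [(a, b) for a, b in combinations(contents, 2)
            if score_fn(a, b) >= threshold]


# Made-up scores standing in for the output of the trained model.
TOY_SCORES = {("G1", "C1"): 0.92, ("G1", "C2"): 0.81, ("G2", "C1"): 0.07}


def toy_score(a, b):
    """Look up a pair in either order; unknown pairs score 0."""
    return TOY_SCORES.get((a, b), TOY_SCORES.get((b, a), 0.0))


pairs = related_pairs(["G1", "G2", "C1", "C2"], toy_score)
```

Scoring every pair is quadratic in the number of contents, which is unlikely to matter for a single page but could be pruned (for example, by distance) for larger documents.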

In addition, sets of related contents in a document can be identified simply by reading a paper document with the scanner 20 or the like.

The extraction unit 56 associates the sets of related contents identified by the identification unit 55 and extracts them as cluster content. Cluster content gathers related contents together as a single content, so that the related contents can be handled as one content.
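One way to gather related pairs into cluster content is to take the connected components of the relation graph. The sketch below does this with a small union-find structure. Treating clusters as connected components is an assumption of this example; the source does not state how pairs are merged.

```python
def extract_clusters(pairs):
    """Group contents linked by related pairs into clusters (connected components)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)  # union the two components

    clusters = {}
    for item in parent:
        clusters.setdefault(find(item), set()).add(item)
    return sorted(sorted(c) for c in clusters.values())


# Related pairs taken from the relation graph of FIG. 3.
clusters = extract_clusters([("G1", "C1"), ("G1", "C2"), ("C3", "G2"), ("C3", "G3")])
```

With the pairs of FIG. 3, this yields two clusters: one containing G1, C1, and C2, and one containing C3, G2, and G3.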

With the above configuration, when a required operation is performed on cluster content, it is treated as having been performed on each of the related contents, which improves the operability for contents in the document.

Next, operations on cluster content will be described.

The display unit 58 can display a document in which contents are arranged on the display panel 57. The display panel 57 can be composed of a liquid crystal panel, an organic EL (Electro Luminescence) display, or the like. Instead of the display panel 57, a display device separate from the information processing device 50 may be provided.

The operation unit 59 is composed of, for example, a hardware keyboard and a mouse, and can be used to operate icons and the like displayed on the display panel 57 and to input characters and the like. The operation unit 59 may also be configured as a touch panel.

FIG. 10 is a schematic diagram showing a first example of an operation on cluster content. As shown in FIG. 10, a document in which a plurality of contents are arranged (for example, one page, or a two-page spread) is displayed on the display panel 57. In the example of FIG. 10, a title, body text A, body text B, image A, caption A, and caption B are displayed as contents. Assume that image A is related to captions A and B.

As shown on the left of FIG. 10, when the icon 100 is brought close to image A (or the periphery of image A, or caption A or B) and a touch operation and a drag operation are performed, captions A and B move together with image A in the same way, as shown on the right. Image A, caption A, and caption B constitute one cluster content 101.

In this way, when the display unit 58 accepts an operation that selects cluster content displayed on the display panel 57, it displays each of the contents associated by the cluster content in a selected display state. For example, when an operation selects one content in the cluster content displayed on the display panel 57, or the periphery of that content, and drags it across the display panel 57, all contents in the cluster content are displayed as selected and the entire cluster content can be moved. As a result, the same operation need not be repeated for related contents, which improves the operability for contents in the document.

FIG. 11 is a schematic diagram showing a second example of an operation on cluster content. As shown in FIG. 11, a document in which a plurality of contents are arranged (for example, one page, or a two-page spread) is displayed on the display panel 57. In the example of FIG. 11, body text A, body text B, body text C, image A, and caption A are displayed as contents. Assume that body text B and body text C are related to each other.

As shown on the left of FIG. 11, when the icon 100 is brought close to body text B (or the periphery of body text B, or body text C) and a touch operation and a drag operation are performed, body text C moves together with body text B in the same way, as shown on the right. Body text B and body text C constitute one cluster content 102. As a result, the same operation need not be repeated for related contents, which improves the operability for contents in the document.

The control unit 51 functions as a grouping processing unit and groups cluster content as a single content. Specifically, when the control unit 51 performs a predetermined process on one content of a cluster content, it can perform the same process on the other contents of that cluster content. For example, when the predetermined process is an editing process on contents in the document (for example, copying, moving, scaling, or deleting), performing the editing process on one content also applies the same editing process to the other contents related to it, which improves the operability for contents in the document.
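The grouping behavior described above, using a move as the editing process, can be sketched as follows. The position map, the `move_cluster` function, and the representation of clusters as sets of names are assumptions of this example.

```python
def move_cluster(positions, clusters, selected, dx, dy):
    """Move the selected content and every other member of its cluster by (dx, dy)."""
    # A content that belongs to no cluster is moved on its own.
    members = next((c for c in clusters if selected in c), {selected})
    return {name: (x + dx, y + dy) if name in members else (x, y)
            for name, (x, y) in positions.items()}


positions = {
    "image A": (10, 20),
    "caption A": (10, 85),
    "caption B": (50, 85),
    "title": (10, 0),
}
clusters = [{"image A", "caption A", "caption B"}]

# Dragging caption A moves the whole cluster; the title stays in place.
moved = move_cluster(positions, clusters, "caption A", dx=100, dy=0)
```

Copying, scaling, or deleting could follow the same pattern: resolve the cluster of the selected content, then apply the operation to each member.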

In addition, when the control unit 51 performs a predetermined process on cluster content, it can perform the same process on each of the contents in the cluster content. For example, when the predetermined process is a search for contents in the document, the search can be performed in units of cluster content, which improves the operability for contents in the document.

FIG. 12 is a flowchart showing an example of the cluster content extraction process of the information processing device 50. For convenience, the following description treats the control unit 51 as the entity performing the process. The control unit 51 acquires layout data (S11) and acquires the structural information of the layout data (for example, the sizes and coordinates of the contents) (S12).

The control unit 51 selects a set of contents (S13) and extracts the feature amounts of the selected contents (S14). The control unit 51 determines the relevance of the selected contents based on the extracted feature amounts and the relative position information of the selected contents (S15).

The control unit 51 determines whether there are unprocessed contents (S16). If there are unprocessed contents (YES in S16), the control unit 51 continues the processing from step S13. If there are no unprocessed contents (NO in S16), the control unit 51 gathers the related contents together, extracts them as cluster content (S17), and ends the process.
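The flow of steps S11 to S17 can be sketched as a single function. This is an illustrative outline under stated assumptions: the layout is a map from content name to `(x, y)` coordinates, `feature_fn` and `score_fn` are placeholders for the feature extraction and the trained model, and the toy functions and the threshold 0.5 are made up for the example.

```python
from itertools import combinations


def extract_cluster_contents(layout, feature_fn, score_fn, threshold=0.5):
    """Follow steps S11-S17: score every pair, then merge related pairs into clusters."""
    names = list(layout)                              # S11/S12: layout data and structure
    cluster_of = {name: {name} for name in names}
    for a, b in combinations(names, 2):               # S13, repeated via the S16 loop
        fa, fb = feature_fn(a), feature_fn(b)         # S14: feature amounts
        offset = (layout[b][0] - layout[a][0], layout[b][1] - layout[a][1])
        if score_fn(fa, fb, offset) >= threshold:     # S15: relevance determination
            merged = cluster_of[a] | cluster_of[b]
            for member in merged:
                cluster_of[member] = merged
    # S17: keep each multi-member cluster once
    unique = {frozenset(c) for c in cluster_of.values() if len(c) > 1}
    return sorted(sorted(c) for c in unique)


def toy_feature(name):
    """Placeholder feature: the trailing letter of the content name."""
    return name.split()[-1]


def toy_score(fa, fb, offset):
    """Placeholder model: related if the letters match and the contents are close."""
    near = abs(offset[0]) + abs(offset[1]) < 100
    return 1.0 if fa == fb and near else 0.0


layout = {"image A": (10, 20), "caption A": (10, 85), "body B": (200, 20)}
result = extract_cluster_contents(layout, toy_feature, toy_score)
```

Here image A and caption A end up in one cluster, while body B, which is related to nothing, is left out of the result.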

The information processing device 50 can also be realized using a computer equipped with, for example, a CPU (such as a multiprocessor with a plurality of processor cores), a GPU (Graphics Processing Unit), and a RAM. The information processing device 50 can be realized on the computer by loading a computer program that defines the processing procedure shown in FIG. 12 (and that can be recorded on a recording medium) into the RAM of the computer and executing the program on the CPU (processor).

The information processing device of the present embodiment includes: an identification unit that uses a trained model, trained with sets of related contents as training data, to identify whether or not an input set of contents is related; and an extraction unit that associates the sets of related contents identified by the identification unit and extracts them as cluster content.

The computer program of the present embodiment causes a computer to execute a process of identifying, using a trained model trained with sets of related contents as teacher data, whether an input set of contents is related, and a process of associating the sets of contents identified as related and extracting them as cluster content.

The information processing method of the present embodiment uses a trained model, trained with sets of related contents as teacher data, to identify whether an input set of contents is related, and associates the sets of contents identified as related and extracts them as cluster content.

The identification unit uses a trained model, trained with sets of related contents as teacher data, to identify whether an input set of contents is related.

The contents include, for example, a title, body text, an image (figure), and a caption (a description of an image). Layout data is the data needed to arrange the contents and includes, for example, the size of each content, the coordinates of each content, and the relative coordinates between contents. "Related contents" can be, for example, contents on which a user is likely to repeat the same operation (for example, copying or moving).

Sets of related contents can be identified as follows: any two contents are selected from the plurality of contents in a document, an index indicating the relevance between the selected contents is calculated, and the two contents are determined to be related if the calculated index is greater than or equal to a predetermined threshold and unrelated if it is below the threshold.
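
The threshold rule above can be sketched in a few lines. The index values, the `toy_score` function, and the threshold of 0.5 are illustrative assumptions only; the source does not give concrete values, and in the embodiment the index would come from the trained model.

```python
from itertools import combinations


def related_pairs(contents, score, threshold):
    """Return every pair whose relevance index is at or above the threshold."""
    pairs = []
    for a, b in combinations(contents, 2):
        if score(a, b) >= threshold:  # index >= threshold: related
            pairs.append((a, b))
        # index < threshold: not related, so the pair is skipped
    return pairs


# Hypothetical index values, for illustration only.
toy_scores = {
    ("figure", "caption"): 0.92,
    ("figure", "body"): 0.35,
    ("caption", "body"): 0.50,
}


def toy_score(a, b):
    return toy_scores[(a, b)]


result = related_pairs(["figure", "caption", "body"], toy_score, threshold=0.5)
```

Note that a pair whose index exactly equals the threshold is treated as related, matching the "greater than or equal to" wording.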

The extraction unit associates the sets of contents identified as related by the identification unit and extracts them as cluster content. Cluster content gathers related contents together as one content, so that the related contents can be handled as a single content.
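
One way to gather related pairs into cluster content is a union-find (disjoint-set) structure, sketched below. This is a swapped-in technique chosen for illustration; the source does not say how the pairs are merged. The function name `extract_clusters` is hypothetical, and the sketch assumes that relatedness chains together (if A relates to B and B to C, all three share a cluster).

```python
def extract_clusters(contents, related_pairs):
    """Group contents into clusters by merging every related pair (union-find)."""
    parent = {c: c for c in contents}

    def find(c):
        # Follow parent links to the representative, shortening the path as we go.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for a, b in related_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    clusters = {}
    for c in contents:
        clusters.setdefault(find(c), []).append(c)
    # Assumption: only groups with two or more members count as cluster content.
    return [members for members in clusters.values() if len(members) > 1]


clusters = extract_clusters(
    ["title", "figure", "caption", "body"],
    [("figure", "caption"), ("caption", "body")],
)
```

Here `title` has no related partner, so it stays an ordinary content and is left out of the result.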

In the information processing device of the present embodiment, the input set of contents is a set of image data and document data.

This allows an image and a document to be extracted as cluster content.

The information processing device of the present embodiment includes a grouping processing unit that groups the cluster content as a single content.

When performing a predetermined process on the cluster content extracted by the extraction unit, the grouping processing unit can perform that same process on each of the contents associated by the cluster content. For example, when the predetermined process is a search for content in a document, the search can be carried out per cluster content, which improves the operability of the contents in the document.

In addition, when performing a predetermined process on one content of the cluster content, the grouping processing unit can perform the same process on the other contents of that cluster content. For example, when the predetermined process is an editing operation on content in a document (for example, copying, moving, scaling, or deleting), performing the editing operation on one content allows the same editing operation to be performed on the other contents related to it, which improves the operability of the contents in the document.
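
The propagation of one editing operation to every member of a cluster can be sketched as below. The `Item` and `ClusterGroup` classes and their methods are hypothetical names introduced for illustration, and only the move operation is shown.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical content item with a position on the page."""
    name: str
    x: float
    y: float


class ClusterGroup:
    """Treats the members of one cluster content as a single unit."""

    def __init__(self, members):
        self.members = list(members)

    def apply(self, operation):
        # Run the same operation on every member of the cluster.
        for item in self.members:
            operation(item)

    def move(self, dx, dy):
        # A move is expressed as a per-item shift applied to all members.
        def shift(item):
            item.x += dx
            item.y += dy

        self.apply(shift)


figure = Item("figure", 10, 20)
caption = Item("caption", 10, 80)
group = ClusterGroup([figure, caption])
group.move(5, -5)
```

Routing every edit through a single `apply` method keeps the members in step, so other operations such as scaling or deleting could be added in the same way.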

The information processing device of the present embodiment includes a display unit that displays the cluster content extracted by the extraction unit on a display screen, and a reception unit that accepts an operation of selecting the cluster content displayed on the display screen. When the reception unit accepts the operation, the display unit displays each of the contents associated by the cluster content in a selected display mode.

The display unit displays the cluster content extracted by the extraction unit on the display screen. When the reception unit accepts an operation of selecting the cluster content displayed on the display screen, the display unit displays each of the contents associated by the cluster content in a selected display mode. For example, when the user selects one content in the displayed cluster content, or the area around that content, and then moves (drags) it on the display screen, all the contents in the cluster content are displayed in the selected display mode and the entire cluster content can be moved (dragged). As a result, the same operation does not have to be repeated for each related content, and the operability of the contents in the document improves.
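
The selection behavior described above can be sketched as a hit test followed by a cluster-wide selection. Everything here is an assumption made for illustration: the `Box` class, the `cluster_id` field, the `select_at` function, and the 5-unit margin used to model "the area around the content" do not come from the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Box:
    """Hypothetical on-screen content with an axis-aligned bounding box."""
    name: str
    x: float
    y: float
    width: float
    height: float
    cluster_id: Optional[int] = None  # None means the content is in no cluster
    selected: bool = False

    def contains(self, px, py, margin=0.0):
        return (self.x - margin <= px <= self.x + self.width + margin
                and self.y - margin <= py <= self.y + self.height + margin)


def select_at(boxes, px, py, margin=5.0):
    """Select the content hit at (px, py) and every content in the same cluster."""
    hit = next((b for b in boxes if b.contains(px, py, margin)), None)
    for b in boxes:
        b.selected = False  # clear any previous selection
    if hit is None:
        return []
    for b in boxes:
        same_cluster = hit.cluster_id is not None and b.cluster_id == hit.cluster_id
        b.selected = b is hit or same_cluster
    return [b.name for b in boxes if b.selected]


boxes = [
    Box("figure", 0, 0, 100, 100, cluster_id=1),
    Box("caption", 0, 110, 100, 20, cluster_id=1),
    Box("body", 200, 0, 100, 300),
]
```

Calling `select_at(boxes, 50, 50)` hits the figure and also marks its caption as selected, so a subsequent drag could move both together.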

1 Communication network
10 Server
20 Scanner
50 Information processing device
51 Control unit
52 Communication unit
53 Storage unit
54 Layout data estimation unit
55 Identification unit
551, 552, 553 Neural network
56 Extraction unit
57 Display panel
58 Display unit
59 Operation unit

Claims

1. An information processing device comprising:
an identification unit that uses a trained model, trained with sets of related contents as teacher data, to identify whether an input set of contents is related; and
an extraction unit that associates the sets of contents identified as related by the identification unit and extracts them as cluster content.

2. The information processing device according to claim 1, wherein the input set of contents is a set of image data and document data.

3. The information processing device according to claim 1 or 2, further comprising a grouping processing unit that groups the cluster content as a single content.

4. The information processing device according to any one of claims 1 to 3, further comprising:
a display unit that displays the cluster content extracted by the extraction unit on a display screen; and
a reception unit that accepts an operation of selecting the cluster content displayed on the display screen,
wherein, when the reception unit accepts the operation, the display unit displays each of the contents associated by the cluster content in a selected display mode.

5. A computer program that causes a computer to execute:
a process of identifying, using a trained model trained with sets of related contents as teacher data, whether an input set of contents is related; and
a process of associating the sets of contents identified as related and extracting them as cluster content.

6. An information processing method comprising:
identifying, using a trained model trained with sets of related contents as teacher data, whether an input set of contents is related; and
associating the sets of contents identified as related and extracting them as cluster content.