JP6770709B2

JP6770709B2 - Model generator and program for machine learning.

Info

Publication number: JP6770709B2
Application number: JP2016175389A
Authority: JP
Inventors: 谷口 元樹; 大熊 智子; 鈴木 星児
Original assignee: Fuji Xerox Co Ltd; Fujifilm Business Innovation Corp
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2016-09-08
Filing date: 2016-09-08
Publication date: 2020-10-21
Anticipated expiration: 2036-09-08
Also published as: JP2018041300A

Description

The present invention relates to a machine learning model generation device and a program.

Named entity recognition, which is also used as an elemental technology in applications based on natural language processing, is a technique for extracting named entities such as proper nouns from text. Systems are known that use machine learning methods for identification (classification) problems, such as Support Vector Machines (SVM) and Conditional Random Fields (CRFs).

Patent Document 1 below discloses a model parameter estimation method capable of estimating model parameters adapted to additional data while keeping the influence on the parameters of the existing model small.

Non-Patent Document 1 below discloses the Hidden-Unit CRF (HUCRF), a variant of CRFs in which a hidden unit layer is added.

Non-Patent Document 2 below discloses a pre-training method in which an HUCRF is first trained on unlabeled data that has been given provisional labels by clustering based on canonical correlation analysis, and the resulting parameters are then used as initial values for additional training on data that is actually labeled.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2015-38709

Non-Patent Document 1: L. Maaten, M. Welling, L. K. Saul, "Hidden-Unit Conditional Random Fields", Int. Conf. on Artificial Intelligence & Statistics, pp. 479-488, 2011
Non-Patent Document 2: Y. B. Kim, K. Stratos, R. Sarikaya, "Pre-training of Hidden-Unit CRFs", Association for Computational Linguistics, pp. 192-198, 2015

In identification problems, supervised learning is generally performed: the learner is trained on learning data (known data) to which labels (also called "tags") have already been assigned, and the model parameters obtained as a result are used to identify unknown data (that is, to classify it by assigning one of the labels). Achieving high accuracy with such machine learning methods generally requires a large amount of learning data (corpora and dictionaries).

For example, many in-house corporate documents contain highly confidential information or cannot be understood without knowledge and rules specific to the company. Performing named entity recognition on such documents therefore requires preparing a dedicated learning corpus. Because attaching a large number of annotations (related annotation information) to in-house documents is extremely costly, it is not realistic to prepare enough learning data to achieve high accuracy when performing named entity recognition on in-house documents.

Even for a task for which a large amount of learning data cannot be prepared, learning accuracy (for an identification problem, the accuracy of identifying unknown data with the obtained model) can sometimes be improved by taking part of the model parameters (also called the "model") pre-trained on another task with abundant learning data as the initial condition and then performing additional learning on the target task. However, the pre-training conditions that lead to high accuracy are not known in advance, so searching for them takes time.

An object of the present invention is to provide a machine learning model generation device and a program capable of generating a model that yields higher learning accuracy than when classes are not integrated during pre-training.

[Machine learning model generation device]
The present invention according to claim 1 is a machine learning model generation device comprising:
first similarity calculation means for calculating a similarity between classes based on the mutual relationships of classes whose relationships are defined in advance;
second similarity calculation means for calculating a similarity between the classes based on feature quantities of the classes;
determination means for determining whether the classes are similar based on the calculation results of the first calculation means and the second calculation means;
class integration means for integrating classes determined to be similar by the determination means into one class; and
machine learning means for performing supervised machine learning processing.

The present invention according to claim 2 is the machine learning model generation device according to claim 1, wherein the relationship is a hierarchical structure or a graph structure.

The present invention according to claim 3 is the model generation device according to claim 1 or 2, wherein the machine learning means comprises model parameter estimation means for estimating model parameters, and means for estimating, from the learned model parameters, the posterior probability of data for each class.

[Program]
The present invention according to claim 4 is a program for causing a computer to execute:
a step of calculating, as a first similarity, a similarity between classes based on the mutual relationships of classes whose relationships are defined in advance;
a step of calculating, as a second similarity, a similarity between the classes based on feature quantities of the classes;
a step of determining whether the classes are similar based on the first similarity and the second similarity;
a step of integrating classes determined to be similar into one class; and
a step of performing supervised machine learning processing.

According to the present invention according to claim 1, it is possible to generate a model that yields higher learning accuracy than when classes are not integrated during pre-training.

According to the present invention according to claim 2, in addition to the effect of claim 1, the invention can be applied when the class relationships have a hierarchical structure or a graph structure.

According to the present invention according to claim 3, in addition to the effect of claim 1 or 2, a machine learning model generation device is obtained in which whether classes are similar is determined based on a second similarity calculated from posterior probabilities.

According to the present invention according to claim 4, it is possible to generate a model that yields higher learning accuracy than when classes are not integrated during pre-training.

FIG. 1 is a block diagram showing the functional configuration of the model generation device 10 according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the hardware configuration of the model generation device 10 according to the embodiment of the present invention.
FIG. 3 is a flowchart for explaining the operation of the model generation device 10 according to the embodiment of the present invention.
FIG. 4 shows an example of the hierarchical structure of classes according to Example 1.
FIG. 5 shows an example of the estimated probabilities of each instance for the classes according to Example 1.
FIG. 6 is a matrix showing the correlation coefficients between learning classes according to Example 1.
FIG. 7 shows the relationship between the hierarchical-structure similarity between learning classes according to Example 1 and the hierarchical structure of the classes.
FIG. 8 shows the hierarchical structure after the class integration processing in Example 1.
FIG. 9 shows an example of the graph structure of classes according to Example 2.
FIG. 10 shows an example of the estimated probabilities of each instance for the classes according to Example 2.
FIG. 11 is a matrix showing the correlation coefficients between learning classes according to Example 2.
FIG. 12 shows the relationship between the graph-structure similarity between learning classes according to Example 2 and the graph structure of the classes.
FIG. 13 shows the graph structure after the class integration processing in Example 2.

FIG. 1 is a block diagram showing the functional configuration of the model generation device 10 according to an embodiment of the present invention. The model generation device 10 of the present embodiment includes a machine learning unit 11, a first similarity calculation unit 12, a second similarity calculation unit 13, a determination unit 14, a class integration unit 15, and an output unit 16.

The machine learning unit 11 performs machine learning processing on the learning data 17, in which labeled learning data is recorded. It estimates the model parameters and also estimates the posterior probabilities for class classification, that is, which class each instance (an individual item of learning data) belongs to.

The first similarity calculation unit 12 takes, as the feature quantity of each class, a vector generated from the posterior probabilities of the instances estimated by the machine learning unit 11 (hereinafter also called "estimated probabilities"), and calculates the similarity between classes ("feature similarity") based on these feature quantities.

The second similarity calculation unit 13 calculates the similarity between classes ("relationship similarity") based on the class relationship information 18 (for example, information on the class hierarchy). The initial value of the class relationship information is constructed manually based on the labels of the learning data 17.

The determination unit 14 determines a class pair to be "similar" when both its feature similarity and its relationship similarity exceed their respective predetermined thresholds.

The class integration unit 15 integrates each class pair determined to be "similar" by the determination unit 14 into one class, and updates the labels of the learning data 17 and the class relationship information 18.

The output unit 16 outputs the model parameters estimated by the machine learning unit 11 when no class pair has been determined to be "similar" by the determination unit 14.

FIG. 2 is a diagram showing the hardware configuration of the model generation device 10. The model generation device 10 includes a CPU 21, a memory 22, a storage device 23 such as a hard disk drive (HDD), and a display device 24, and these components are connected to one another via a control bus 25.

The CPU 21 executes predetermined processing based on a control program stored in the memory 22 or the storage device 23 (or provided from a storage medium (not shown) such as a CD-ROM), and thereby controls the operation of the model generation device 10.

In carrying out the present invention, the model generation device 10 may further include various input interface devices such as a keyboard and a touch panel.

Next, the operation of the model generation device according to the present embodiment will be described with reference to the flowchart of FIG. 3, taking as an example the case where the feature quantity of each class is calculated from the estimated probabilities of the instances. The examples below cover the case where the predefined class relationships have the hierarchical structure shown in FIG. 4 (Example 1) and the case where they have the graph structure shown in FIG. 9 (Example 2).

In step S1, machine learning processing is executed on the lowest-level classes of the hierarchical structure ("corporation name", "political organization name", "prefecture name", "municipality name", and "person name"), the estimated probabilities are calculated for each instance, and the process proceeds to step S2. The following description takes as an example the case where the estimated probabilities of the instances are the values shown in FIG. 5. (For instance 1, the estimated probabilities are corporation name: 0.1, political organization name: 0.1, prefecture name: 0.4, municipality name: 0.3, and person name: 0.1.)

The feature quantity of each class can be calculated, for example, as a vector whose value in each dimension is the estimated probability of the corresponding instance.

In step S2, the feature quantity of each learning class is generated from the estimated probabilities of the instances, and the correlation coefficients between learning classes are then calculated from these feature quantities. FIG. 6 shows the result as a correlation matrix of the feature quantities.
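Steps S1 and S2 can be sketched as follows. This is a minimal illustration, not the patented implementation: the class identifiers and the helper names `class_features`, `pearson`, and `correlation_matrix` are hypothetical, and only the first row of `posteriors` (instance 1) comes from the text. The remaining rows are invented because the values of FIG. 5 are not reproduced here. The Pearson coefficient is assumed as the correlation measure.

```python
from math import sqrt

# Column order of the posterior table (hypothetical identifiers).
CLASSES = ["corporation", "political_org", "prefecture", "municipality", "person"]

# Rows are instances, columns follow CLASSES.
# Row 0 is instance 1 from the text; the other rows are made-up values.
posteriors = [
    [0.10, 0.10, 0.40, 0.30, 0.10],
    [0.50, 0.30, 0.10, 0.05, 0.05],
    [0.05, 0.05, 0.45, 0.40, 0.05],
    [0.10, 0.10, 0.10, 0.10, 0.60],
]


def class_features(rows):
    """Transpose the table: one vector per class, one dimension per instance."""
    return [list(column) for column in zip(*rows)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector carries no correlation information
    return cov / sqrt(var_x * var_y)


def correlation_matrix(rows):
    """Correlation coefficient between every pair of class feature vectors."""
    features = class_features(rows)
    return [[pearson(a, b) for b in features] for a in features]


corr = correlation_matrix(posteriors)
```

With these invented values, the "prefecture" and "municipality" columns rise and fall together across instances, so their coefficient is high, which is the situation the example in the text relies on.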

In step S3, which can be performed in parallel with steps S1 and S2, the hierarchical-structure similarity between learning classes is calculated based on the hierarchical structure of the classes. For example, it can be calculated by assigning a similarity of "1" to classes that lie under the same upper-level class (parent), that is, sibling classes, and a similarity of "0" to all other class pairs.

Taking the hierarchical structure of FIG. 4 as an example, the pair "corporation name" and "political organization name" and the pair "prefecture name" and "municipality name" each share the same parent ("organization name" and "place name", respectively), so their hierarchical-structure similarity is "1". For all other combinations the hierarchical-structure similarity is "0". FIG. 7 shows the similarity between the learning classes as a matrix.
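The sibling rule of step S3 can be sketched as below. The parent map is an assumption reconstructed from the text: the parents of the four organization and place classes are stated, while the parent of "person" is not, so it is placed under a hypothetical "root". The function name `hierarchy_similarity` is also hypothetical.

```python
from itertools import combinations

# Parent of each lowest-level class. "root" for "person" is an assumption;
# the text only states the parents of the other four classes.
PARENT = {
    "corporation": "organization",
    "political_org": "organization",
    "prefecture": "place",
    "municipality": "place",
    "person": "root",
}


def hierarchy_similarity(parent):
    """Return 1 for sibling classes (same parent) and 0 for every other pair."""
    return {
        (a, b): 1 if parent[a] == parent[b] else 0
        for a, b in combinations(parent, 2)
    }


rel = hierarchy_similarity(PARENT)
```

Each unordered pair appears once, keyed in the insertion order of `PARENT`, which corresponds to the upper triangle of the similarity matrix.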

After steps S2 and S3 are complete, the process proceeds to step S4. In step S4, learning class pairs for which both the correlation coefficient of the feature quantities calculated in step S2 and the hierarchical-structure similarity calculated in step S3 are at or above their respective predetermined thresholds are extracted as similar class pairs. For example, when the threshold for the correlation coefficient of the feature quantities is 0.5 and the threshold for the hierarchical-structure similarity is 0.5, the pair "municipality name" and "prefecture name" is extracted as a similar class pair.
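The extraction in step S4 amounts to a conjunction of two threshold tests. The sketch below assumes both similarities are stored in dictionaries keyed by class pair; the numeric correlation values are invented for illustration (FIG. 6 is not reproduced here), and `similar_pairs` is a hypothetical name. The comparison uses "at or above", following the wording of step S4.

```python
def similar_pairs(corr, rel, corr_threshold=0.5, rel_threshold=0.5):
    """Keep pairs whose feature correlation AND relationship similarity
    are both at or above their thresholds."""
    return [
        pair
        for pair in corr
        if corr[pair] >= corr_threshold and rel.get(pair, 0) >= rel_threshold
    ]


# Invented correlation values for three class pairs.
corr = {
    ("corporation", "political_org"): 0.3,  # siblings, but weakly correlated
    ("prefecture", "municipality"): 0.9,    # siblings and strongly correlated
    ("prefecture", "person"): 0.6,          # correlated, but not siblings
}
# Hierarchical-structure similarity (1 for siblings, 0 otherwise).
rel = {
    ("corporation", "political_org"): 1,
    ("prefecture", "municipality"): 1,
    ("prefecture", "person"): 0,
}

pairs = similar_pairs(corr, rel)
```

Only the prefecture and municipality pair passes both tests, so a pair that is merely confusable for the classifier, or merely adjacent in the hierarchy, is not merged.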

In step S5, which follows step S4, it is determined whether any similar class pair exists. If one exists, the process proceeds to step S6; if none exists, the process proceeds to step S7.

In step S6, the similar class pairs extracted in step S4 are each integrated into one class. Specifically, the class relationship information recorded as a hierarchical structure (the hierarchical structure information) and the labels of the learning data are updated. If the pair "municipality name" and "prefecture name" is a similar class pair, the two are integrated, and the name of the upper level in the hierarchy (for example, "place name") can be used as the class name (label) after integration (FIG. 8). After the class integration processing, the process returns to steps S1 and S3.
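One possible way to realize the update in step S6 for a hierarchy is sketched below, under the assumption that the hierarchy is stored as a child-to-parent map and that the merged class takes its parent's name, as in the "place name" example. The function names and the "root" node are hypothetical.

```python
def merge_into_parent(pair, parent, labels):
    """Integrate two sibling classes into their common parent class.

    Returns the updated child-to-parent map and the relabeled learning data.
    """
    a, b = pair
    if parent[a] != parent[b]:
        raise ValueError("only sibling classes can be merged into a parent")
    merged = parent[a]
    # Remove the two children; their parent becomes a lowest-level class.
    new_parent = {c: p for c, p in parent.items() if c not in pair}
    new_labels = [merged if label in pair else label for label in labels]
    return new_parent, new_labels


def lowest_classes(parent):
    """Classes that are not the parent of any other class."""
    parents = set(parent.values())
    return [c for c in parent if c not in parents]


# Child-to-parent map; the "root" node is an assumption for illustration.
parent = {
    "organization": "root", "place": "root", "person": "root",
    "corporation": "organization", "political_org": "organization",
    "prefecture": "place", "municipality": "place",
}
labels = ["prefecture", "person", "municipality", "corporation"]

parent, labels = merge_into_parent(("prefecture", "municipality"), parent, labels)
```

After the call, `lowest_classes(parent)` contains "place" instead of the two merged classes, and every instance formerly labeled with either of them carries the label "place".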

When steps S1 and S3 are entered again, the machine learning processing and the calculation of the hierarchical-structure similarity between learning classes are each performed again, based on the updated hierarchical structure information and the updated labels of the learning data. Because the update adds "place name" and deletes "municipality name" and "prefecture name", the lowest-level classes of the hierarchical structure become "corporation name", "political organization name", "place name", and "person name".

The loop is repeated in this way until no similar pair remains, and finally, in step S7, the model is output and the processing ends.
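The overall flow of FIG. 3 (steps S1 to S7) can be summarized as the loop below. This is a structural sketch only: the learner, the similarity functions, and the merge function are passed in as callables, and the `toy_*` stand-ins and `TOY_CORR` values are invented so that the loop can run without a real classifier. As a simplification, one pair is merged per round, whereas the text integrates the extracted pairs in step S6 without specifying an order.

```python
def generate_model(labels, structure, train, feature_corr, relation_sim, merge,
                   corr_threshold=0.5, rel_threshold=0.5, max_rounds=100):
    """Repeat train -> compare -> merge until no similar class pair remains."""
    for _ in range(max_rounds):
        model, posteriors = train(labels)                 # S1
        corr = feature_corr(posteriors)                   # S2
        rel = relation_sim(structure)                     # S3
        similar = [p for p in corr                        # S4
                   if corr[p] >= corr_threshold and rel.get(p, 0) >= rel_threshold]
        if not similar:                                   # S5
            return model, labels, structure               # S7
        labels, structure = merge(similar[0], labels, structure)  # S6
    raise RuntimeError("class integration did not settle within max_rounds")


# --- Toy stand-ins (hypothetical), only to make the loop executable ---
TOY_CORR = {("prefecture", "municipality"): 0.9,
            ("corporation", "political_org"): 0.3}

def toy_train(labels):
    classes = sorted(set(labels))
    return {"classes": classes}, classes  # "posteriors" is just the class list

def toy_corr(classes):
    return {p: v for p, v in TOY_CORR.items() if p[0] in classes and p[1] in classes}

def toy_rel(parent):
    leaves = [c for c in parent if c not in parent.values()]
    return {(a, b): int(parent[a] == parent[b])
            for a in leaves for b in leaves if a != b}

def toy_merge(pair, labels, parent):
    merged = parent[pair[0]]
    new_parent = {c: p for c, p in parent.items() if c not in pair}
    return [merged if l in pair else l for l in labels], new_parent

parent = {"organization": "root", "place": "root", "person": "root",
          "corporation": "organization", "political_org": "organization",
          "prefecture": "place", "municipality": "place"}
labels = ["prefecture", "municipality", "person", "corporation", "political_org"]

model, final_labels, final_parent = generate_model(
    labels, parent, toy_train, toy_corr, toy_rel, toy_merge)
```

In this toy run the first round merges the prefecture and municipality classes into "place", and the second round finds no pair that passes both thresholds, so the model trained on the four remaining classes is returned.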

Example 2 assumes the task of estimating the author from the text of a novel. The relationships between authors, defined from teacher-student relationships, friendships, and the like, take a graph structure as shown in FIG. 9 (a line connecting two classes (novelist names) indicates that a teacher-student relationship or a friendship exists between them).

In step S1, machine learning processing is executed on all the classes that make up the graph structure, the estimated probabilities are calculated for each instance, and the process proceeds to step S2. The following description takes as an example the case where the estimated probabilities of the instances are the values shown in FIG. 10. (For instance 1, the estimated probabilities are novelist A: 0.1, novelist B: 0.1, novelist C: 0.4, novelist D: 0.3, and novelist E: 0.1.)

In step S2, the feature quantity of each learning class is generated from the estimated probabilities of the instances, and the correlation coefficients between learning classes are then calculated from these feature quantities. FIG. 11 shows the result as a correlation matrix of the feature quantities.

In step S3, which can be performed in parallel with steps S1 and S2, the graph-structure similarity between learning classes is calculated based on the graph structure of the classes. For example, it can be calculated by assigning a similarity of "1" to classes that are directly connected by a line and a similarity of "0" to all other class pairs.

Taking the graph structure of FIG. 12 as an example, "novelist A" has a graph-structure similarity of "1" with each of "novelist B", "novelist C", and "novelist D", and a similarity of "0" with "novelist E". FIG. 12 shows the similarity between the learning classes as a matrix.
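For a graph, the rule of step S3 reduces to an adjacency test. The sketch below is illustrative: the edges A-B, A-C, and A-D follow from the text, and C-D is implied by the later merge of C and D (their graph-structure similarity must reach the threshold), but the edge D-E is purely an assumption because the figure is not reproduced here. The name `graph_similarity` is hypothetical.

```python
from itertools import combinations

NODES = ["A", "B", "C", "D", "E"]

# Undirected edges stored as frozensets so that (u, v) and (v, u) are equal.
# A-B, A-C, A-D come from the text; C-D is implied by the merge example;
# D-E is an invented edge for illustration.
EDGES = {frozenset(e) for e in [("A", "B"), ("A", "C"), ("A", "D"),
                                ("C", "D"), ("D", "E")]}


def graph_similarity(nodes, edges):
    """Return 1 for directly connected classes and 0 for every other pair."""
    return {
        (a, b): 1 if frozenset((a, b)) in edges else 0
        for a, b in combinations(nodes, 2)
    }


rel = graph_similarity(NODES, EDGES)
```

Other definitions, such as a similarity that decays with path length, would fit the same interface; the text only gives the direct-connection rule as an example.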

After steps S2 and S3 are complete, the process proceeds to step S4. In step S4, learning class pairs for which both the correlation coefficient of the feature quantities calculated in step S2 and the graph-structure similarity calculated in step S3 are at or above their respective predetermined thresholds are extracted as similar class pairs. For example, when the threshold for the correlation coefficient of the feature quantities is 0.5 and the threshold for the graph-structure similarity is 0.5, the pair "novelist C" and "novelist D" is extracted as a similar class pair.

In step S6, the similar class pairs extracted in step S4 are each integrated into one class. Specifically, the class relationship information recorded as a graph structure (the graph structure information) and the labels of the learning data are updated. As an example, FIG. 13 shows the graph structure after the integration processing when the pair "novelist C" and "novelist D" is a similar class pair. As the class name (label) after integration, a name formed by combining the two names before integration can be used, for example. After the class integration processing, the process returns to steps S1 and S3.
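Merging two nodes of the graph can be sketched as a node contraction. The text states that the merged class may be named by combining the two original names; how the edges are rewired is not spelled out, so the sketch assumes that the merged node inherits the neighbors of both original nodes and that the edge between the two merged nodes disappears. The edge D-E and the `+` separator are invented for illustration.

```python
def merge_nodes(pair, edges, labels, sep="+"):
    """Contract two nodes into one named by joining their names.

    Assumption: the merged node inherits the edges of both nodes, and the
    edge that connected the two merged nodes is dropped.
    """
    merged = sep.join(pair)

    def rename(node):
        return merged if node in pair else node

    new_edges = set()
    for edge in edges:
        u, v = (rename(node) for node in edge)
        if u != v:  # skip the self-loop left by the contracted edge
            new_edges.add(frozenset((u, v)))
    new_labels = [rename(label) for label in labels]
    return new_edges, new_labels


# Edges A-B, A-C, A-D follow the text; C-D is implied; D-E is invented.
edges = {frozenset(e) for e in [("A", "B"), ("A", "C"), ("A", "D"),
                                ("C", "D"), ("D", "E")]}
labels = ["C", "D", "A", "E"]

edges, labels = merge_nodes(("C", "D"), edges, labels)
```

After the call, the two edges A-C and A-D collapse into a single edge between "A" and "C+D", and the learning data formerly labeled "C" or "D" carries the label "C+D".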

When steps S1 and S3 are entered again, the machine learning processing and the calculation of the graph-structure similarity between learning classes are each performed again, based on the updated graph structure information and the updated labels of the learning data.

As described above, the present invention can be applied to a machine learning model generation device and a program.

10 Model generation device
11 Machine learning unit
12 First similarity calculation unit
13 Second similarity calculation unit
14 Determination unit
15 Class integration unit
16 Output unit
17 Learning data
18 Class relationship information
21 CPU
22 Memory
23 Storage device
24 Display device
25 Control bus

Claims

1. A machine learning model generation device comprising:
first similarity calculation means for calculating a similarity between classes based on the mutual relationships of classes whose relationships are defined in advance;
second similarity calculation means for calculating a similarity between the classes based on feature quantities of the classes;
determination means for determining whether the classes are similar based on the calculation results of the first calculation means and the second calculation means;
class integration means for integrating classes determined to be similar by the determination means into one class; and
machine learning means for performing supervised machine learning processing on the classes integrated by the class integration means.

2. The machine learning model generation device according to claim 1, wherein the relationship is a hierarchical structure or a graph structure.

3. The model generation device according to claim 1 or 2, wherein the machine learning means comprises model parameter estimation means for estimating model parameters, and means for estimating, from the learned model parameters, the posterior probability of data for each class.

4. A program for causing a computer to execute:
a step of calculating, as a first similarity, a similarity between classes based on the mutual relationships of classes whose relationships are defined in advance;
a step of calculating, as a second similarity, a similarity between the classes based on feature quantities of the classes;
a step of determining whether the classes are similar based on the first similarity and the second similarity;
a step of integrating classes determined to be similar into one class; and
a step of performing supervised machine learning processing on the integrated classes.