JP6453942B2

JP6453942B2 - Text classification device, text classification method and program

Info

Publication number: JP6453942B2
Application number: JP2017113368A
Authority: JP
Inventors: 山田　聡; 松田　慎一
Original assignee: SoftBank Corp
Current assignee: SoftBank Corp
Priority date: 2017-06-08
Filing date: 2017-06-08
Publication date: 2019-01-16
Anticipated expiration: 2037-06-08
Also published as: JP2018206238A

Description

The present invention relates to a text classification device, a text classification method, and a program.

In businesses that provide customer support by telephone, such as call centers, operators must answer customer questions while consulting various sources of information and drawing on know-how gained through experience.

To prepare for similar questions later, the contents of past customer-support questions and answers are compiled into manuals, for example printed on paper, and distributed to each operator to make customer-support work more efficient (see, for example, Patent Document 1).

Patent Document 1: JP 2002-157445 A

However, when an inexperienced operator takes a call, or when a call must be handled in an unfamiliar field, the operator tends to consult the manual more often, so calls are put on hold more often and handling time grows longer. It has been pointed out that this causes customer stress and raises call-center costs.

The present invention has been made in view of the circumstances described above, and one of its objects is to provide a text classification technique suitable for responding promptly and appropriately to customer questions and the like in telephone customer support and similar work.

A text classification device according to one aspect of the present invention performs class classification of natural language using classifiers, and comprises: a plurality of classifiers; an input unit that inputs a plurality of pieces of learning data, each including learning text data and a learning category; an assignment unit that, for each classifier, assigns part of the input learning data as learning data for accuracy checking and assigns the remaining learning data as learning data for training; and a learning unit that trains each classifier using the learning data for training assigned to it and checks the accuracy of that classifier's learning using the learning data for accuracy checking assigned to it. The assignment unit assigns part of the learning data that is assigned as learning data for accuracy checking in any one classifier as learning data for training in any one of the remaining classifiers.

In the above configuration, the device may further include a classification unit that, after training of each classifier has finished, classifies input text data by using the classifiers in combination.

In the above configuration, the plurality of classifiers may be text classifiers each using a predetermined model, and at least one of them may be a text classifier using a model different from that of the other classifiers.

In the above configuration, at least one classifier may be a text classifier using an LRC model, and the other classifiers may be text classifiers using a DNN model.

A text classification device according to another aspect of the present invention performs class classification of natural language using classifiers, and comprises: a plurality of classifiers; a master word corpus that records master word information shared by the classifiers; a plurality of domain word corpora that record domain word information generated for each classifier; and a classification unit that classifies input text data using each classifier. The classification unit refers preferentially to the domain word corpus corresponding to a classifier; when that domain word corpus contains a word of the text data, it reads the word from the domain word corpus and performs text classification with the corresponding classifier using the word it has read.

In the above configuration, when the domain word corpus does not contain a word of the text data, the classification unit may refer to the master word corpus, perform expression complementation on the word recorded in the master word corpus, and perform text classification with the corresponding classifier using the complemented word.

According to the present invention, it is possible to provide a text classification technique suitable for responding promptly and appropriately to customer questions and the like in telephone customer support and similar work.

FIG. 1 is an explanatory diagram showing the basic principle of the text classification system according to the present embodiment.
FIG. 2 is a diagram illustrating operation in a call center.
FIG. 3 is a diagram illustrating learning data.
FIG. 4 is a diagram illustrating operation in a call center.
FIG. 5 is a block diagram showing the functional configuration of the text classification device.
FIG. 6 is a diagram illustrating multiple types of classifiers.
FIG. 7 is a diagram for explaining how each corpus is used.
FIG. 8 is a diagram for explaining how the learning data is used.
FIG. 9 is a diagram for explaining how the learning data is used.
FIG. 10 is a two-dimensional plot of distributed representations of words.
FIG. 11 is a diagram for explaining the learning of distributed word representations.
FIG. 12 is a diagram showing the results when distributed word representations are learned multiple times.
FIG. 13 is a diagram showing words applied across different corpora.
FIG. 14 is a schematic diagram for explaining expression complementation.
FIG. 15 is a flowchart showing the processing flow of expression complementation.

Embodiments of the present invention are described below. In the following description of the drawings, identical or similar parts are denoted by identical or similar reference numerals. The drawings are schematic, however, so specific dimensions and the like should be judged in light of the following description. The drawings naturally also include parts whose dimensional relationships and ratios differ from one drawing to another. Furthermore, the technical scope of the present invention should not be construed as limited to these embodiments.

A. Embodiment
A-1. Basic Principle
FIG. 1 is an explanatory diagram showing the basic principle of the text classification system 100 according to this embodiment. The text classification system 100 is a system for making "inquiry handling" more efficient, suitable for use in call centers and the like, and performs natural language processing (class classification) using a natural sentence classifier. As shown in FIG. 1, the text classification system 100 has a learning processing function F1 and a classification processing function F2.

<Learning processing function F1>
The natural sentence classification processing unit 120, which performs class classification of natural language using deep learning, receives from the user server 110, as learning data Dl, a large number of sets each consisting of a natural sentence (for example, an inquiry) and the classification corresponding to that sentence (for example, the classification of the answer to the inquiry; hereinafter abbreviated as "category"). Based on the received learning data, the natural sentence classification processing unit 120 creates, through machine learning, a text classifier 130 that performs class classification.

Taking the call-center operation shown in FIG. 2 as an example, an administrator first operates the user server 110 as appropriate, referring to the inquiries and logs from day-to-day work, and creates learning data Dl, each piece being a set of text data representing an inquiry and a category. FIG. 3 illustrates the learning data Dl. As shown in FIG. 3, each piece of learning data Dl consists of text data that an operator actually transcribed from an inquiry (hereinafter "learning text data") Dlt and a category that classifies the answer actually given to that inquiry (hereinafter "learning category") Cly. For example, the inquiry "... I want to know about the slowdown." (the learning text data Dlt) is associated with the category "Speedlimit" (the learning category Cly) (see FIG. 3). How many sets of learning data Dl to prepare, how many categories to register, and so on can be set and changed as appropriate for the system design. After creating multiple sets of learning data Dl as shown in FIG. 3, the administrator outputs the learning data Dl to the natural sentence classification processing unit 120 together with a learning request Rl (see FIG. 2).
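As a minimal sketch of the structure just described, one piece of learning data Dl can be modeled as a (learning text data Dlt, learning category Cly) pair. The class name, field names, and the second record are hypothetical and are not taken from the patent; only the "Speedlimit" example comes from the description of FIG. 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LearningData:
    """One set of learning data Dl (hypothetical representation)."""
    text: str      # learning text data Dlt: the transcribed inquiry
    category: str  # learning category Cly: class of the answer actually given


# The first record mirrors the FIG. 3 example; the second is invented for illustration.
learning_data = [
    LearningData("... I want to know about the slowdown.", "Speedlimit"),
    LearningData("I want to change my rate plan.", "PlanChange"),
]

# The set of registered categories follows from the learning data itself.
categories = sorted({d.category for d in learning_data})
```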

<Classification processing function F2>
When performing classification, on the other hand, the natural sentence classification processing unit 120 receives a natural sentence Sn from outside (for example, from the user server 110), analyzes it using the classifier 130 to determine the appropriate category Cy, and outputs the determined category Cy.

Taking the call-center operation shown in FIG. 4 as an example, an operator first operates the assigned PC terminal or similar device and types in the content of the customer's inquiry (for example, "I want to change to the ○× unlimited plan"). On receiving the text data Dtw representing the inquiry, the user server 110 outputs it to the natural sentence classification processing unit 120 together with a classification request Rc. The natural sentence classification processing unit 120 analyzes the received text data Dtw using the text classifier 130 and determines the category Cy. By referring to the learning data of the text classifier 130, the natural sentence classification processing unit 120 judges, for example, that the category Cy "rate plan change" is the best match for the text data Dtw "I want to change to the ○× unlimited plan", and returns that category to the user server 110 as the result. Based on the response from the natural sentence classification processing unit 120, the user server 110 displays the work screen best suited to changing the rate plan on the operator's PC terminal or similar device. Referring to that work screen, the operator responds to the inquiry by asking the customer for the information needed to change the contract plan. The text classification device 200, which includes the natural sentence classification processing unit 120 according to this embodiment, is described below.

A-2. Configuration
FIG. 5 is a block diagram showing the functional configuration of the text classification device 200. The text classification device 200 is, for example, a personal computer, and includes a communication interface unit 140, an operation unit 150, and a storage unit 160 in addition to the natural sentence classification processing unit 120 and the classifier 130.

The natural sentence classification processing unit 120 consists of a processor such as a CPU, ASIC, or FPGA and memory such as ROM and RAM. The processor executes various control programs stored in the memory to provide the functions of the learning unit 121 and the classification unit 122. The functions of each unit of the natural sentence classification processing unit 120 are described in detail later.

Under the control of the natural sentence classification processing unit 120, the classifier 130 analyzes the input text data Dtw and determines the category Cy. As shown in FIG. 5, this embodiment uses two types of classifier 130, each using a DNN (Deep Neural Network) model. The number of classifiers is not limited to two, however, and may be three or more. In that case the classifiers are not limited to ones using DNN models; for example, in addition to two text classifiers using DNN models, one text classifier using an LRC (Logistic Regression Classifier) model may be used (see, for example, FIG. 6). Multiple classifiers are used in order to improve the performance (classification accuracy and so on) of the machine-learning classifiers. Other classifier types, such as an SVM (Support Vector Machine), may also be used.

The communication interface unit 140 consists of a device for connecting to an external network and the like, and exchanges various data with the user server 110 and other equipment over a communication network such as the Internet or an intranet.

The operation unit 150 includes, for example, input devices such as a keyboard and mouse and a display panel that shows various information.

The storage unit 160 consists of rewritable nonvolatile memory such as an HDD (Hard Disk Drive) or SSD (Solid State Drive). The storage unit 160 holds one master word corpus Cm and a plurality of domain word corpora Cd.

The master word corpus Cm is an accumulation, built in advance, of words and their semantic information (hereinafter collectively "master word information") based on a large amount of text obtained from external sources such as various sites on the Internet. It is shared by the plurality of classifiers 130. The master word corpus Cm may be created in advance by a separate device before the system goes into operation and then maintained and managed by the natural sentence classification processing unit 120. Alternatively, the natural sentence classification processing unit 120 may handle the creation, maintenance, and management of the master word corpus Cm as a whole.

A domain word corpus Cd is provided for each classifier 130; in this embodiment there are two. Each domain word corpus Cd is an accumulation of distinctive words and their semantic information that are rarely seen on Internet sites (for example, words used only within a particular industry or company, and their semantic information; hereinafter collectively "domain word information"). When generating each classifier 130, the natural sentence classification processing unit 120 generates the corresponding domain word corpus Cd and maintains and manages it for each classifier 130.

As is well known, when text is classified by machine learning, classification accuracy can be improved by learning each word and its semantic information in advance. In this embodiment, therefore, the master word corpus Cm and the domain word corpora Cd described above are created and text classification is performed using them, which improves classification accuracy. As described in detail later, the master word corpus Cm holds far more data than each domain word corpus Cd, and it is difficult to hold and manage multiple versions of it. In this embodiment, therefore, when analyzing the input text data Dtw and determining the category Cy, the natural sentence classification processing unit 120 refers preferentially to the domain word corpus Cd of the corresponding classifier 130 and checks whether that corpus contains the words of the text data Dtw (see FIG. 7). Only when it determines that the domain word corpus Cd does not contain a word of the text data Dtw does it refer to the master word corpus Cm and read the word from there. This allows each corpus to be used efficiently.
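The lookup order described above (domain word corpus Cd first, master word corpus Cm only as a fallback) can be sketched as follows. This is an illustration under assumptions, not the patent's implementation: each corpus is modeled as a plain dictionary from word to vector, and the function name, return convention, and sample entries are hypothetical.

```python
from typing import Dict, Optional, Tuple

Vector = Tuple[float, ...]
Corpus = Dict[str, Vector]


def lookup_word(word: str, domain_corpus: Corpus,
                master_corpus: Corpus) -> Tuple[Optional[Vector], str]:
    """Return (vector, source), consulting the domain corpus Cd before the master corpus Cm."""
    if word in domain_corpus:            # Cd has priority
        return domain_corpus[word], "domain"
    if word in master_corpus:            # Cm is consulted only when Cd misses
        return master_corpus[word], "master"
    return None, "unknown"               # word is in neither corpus


# Hypothetical corpora for one classifier.
cd = {"in-house-plan-name": (0.4, 0.9)}
cm = {"smartphone": (0.4, 0.9), "apple": (0.1, 0.2)}

hit_domain = lookup_word("in-house-plan-name", cd, cm)
hit_master = lookup_word("apple", cd, cm)
miss = lookup_word("zzz", cd, cm)
```

A vector that comes back with source "master" would still need the expression complementation described in section A-4 before it is used alongside domain word vectors.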

A-3. How the Learning Data Is Used
FIG. 8 is a diagram for explaining how the learning data is used.
As already described, on receiving from the user server 110 a plurality of pieces of learning data Dl, each a set of learning text data Dlt (inquiry content) and a learning category Cly (classification of the answer given), the natural sentence classification processing unit 120 creates and trains the classifier 130 based on the received learning data Dl. To monitor the progress of learning while the classifier 130 is trained, it is necessary to check the accuracy with which the classifier 130 actually classifies input data. One way to check accuracy, as shown in FIG. 6, is to assign part of the learning data Dl input to the natural sentence classification processing unit 120 as learning data Dlv for accuracy checking and to assign the remaining learning data Dl as learning data Dll for the actual training. However, simply assigning part of the input learning data Dl as learning data Dlv for accuracy checking means that this part of the learning data is never reflected in training.

Therefore, as shown in FIG. 9, the learning unit (assignment unit) 121 of the natural sentence classification processing unit 120 provides the learning data Dl, input from the user server 110 via the communication interface unit (input unit) 140, to the two classifiers 130 (classifier A and classifier B in FIG. 9), and controls the assignment of the learning data Dl so that part of the learning data Dl assigned as learning data Dlv for accuracy checking in one classifier 130 is assigned as learning data Dll for the actual training in the other classifier 130.

Taking the case of FIG. 9 in detail, the part of the learning data that is not used for training in classifier A (that is, the learning data Dlv for accuracy verification in classifier A) is assigned as learning data Dll for the actual training in classifier B. Conversely, the part of the learning data that is not used for training in classifier B (that is, the learning data Dlv for accuracy verification in classifier B) is assigned as learning data Dll for the actual training in classifier A. The learning unit 121 trains each classifier 130 using the learning data Dll assigned to it for training, and checks the accuracy of that classifier's learning using the learning data Dlv assigned to it for accuracy verification.
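The assignment rule in FIG. 9 can be sketched as below. The patent does not state how large the accuracy-verification portion is or how it is chosen, so the contiguous slices and the `holdout_fraction` parameter are assumptions made for illustration; the function and key names are likewise hypothetical.

```python
from typing import Dict, List, Sequence


def assign_learning_data(data: Sequence, n_classifiers: int = 2,
                         holdout_fraction: float = 0.2) -> List[Dict[str, list]]:
    """Give each classifier a disjoint hold-out slice (Dlv) and train it on the rest (Dll)."""
    items = list(data)
    holdout_size = max(1, int(len(items) * holdout_fraction))
    if holdout_size * n_classifiers > len(items):
        raise ValueError("not enough learning data for disjoint hold-out sets")
    assignments = []
    for i in range(n_classifiers):
        start, stop = i * holdout_size, (i + 1) * holdout_size
        assignments.append({
            "validate": items[start:stop],           # Dlv for classifier i
            "train": items[:start] + items[stop:],   # Dll for classifier i
        })
    return assignments


# Ten hypothetical learning-data records, identified by index.
a, b = assign_learning_data(range(10))
used_for_training = set(a["train"]) | set(b["train"])
```

Because the hold-out slices are disjoint, whatever classifier A holds out is trained on by classifier B and vice versa, so every record is used for training by at least one classifier.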

As a result, all of the input learning data can be used for training while accuracy is still checked (verified) properly. After training has finished, when text data input from outside is classified, the classification unit 122 of the natural sentence classification processing unit 120 uses the two classifiers 130 (classifier A and classifier B in FIG. 9) in combination to achieve highly accurate text classification.
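The patent does not say how the classifiers are combined at classification time. One common option, shown here purely as an assumption, is to sum (equivalently, average) the per-category scores of the classifiers and pick the category with the highest total. The stub classifiers and their scores are invented for illustration.

```python
from typing import Callable, Dict, Sequence

# A classifier is modeled as a function from text to per-category scores.
Classifier = Callable[[str], Dict[str, float]]


def classify_integrated(text: str, classifiers: Sequence[Classifier]) -> str:
    """Sum each classifier's category scores and return the top-scoring category Cy."""
    totals: Dict[str, float] = {}
    for classifier in classifiers:
        for category, score in classifier(text).items():
            totals[category] = totals.get(category, 0.0) + score
    if not totals:
        raise ValueError("no classifier produced a score")
    return max(totals, key=lambda category: totals[category])


# Hypothetical stand-ins for trained classifiers A and B.
def classifier_a(text: str) -> Dict[str, float]:
    return {"PlanChange": 0.6, "Speedlimit": 0.4}


def classifier_b(text: str) -> Dict[str, float]:
    return {"PlanChange": 0.3, "Speedlimit": 0.7}


category = classify_integrated("example inquiry", [classifier_a, classifier_b])
```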

A-4. Complementing the Word Corpora
As already described, this embodiment assumes that both the master word corpus Cm and the domain word corpora Cd are used, but uses the domain word corpus Cd preferentially: only when it is determined that the domain word corpus Cd does not contain a word of the text data Dtw is the master word corpus Cm consulted and the word read from it (see FIG. 7 above). Here, as described below, a word cannot simply be carried over between different corpora (that is, between the master word corpus Cm and a domain word corpus Cd). In this embodiment, therefore, "expression complementation" is performed between the master word information stored in the master word corpus Cm and the domain word information stored in the domain word corpus Cd. The details of expression complementation are described below.

As a premise for expression complementation, natural language processing using an NLC widely uses representations that express the meaning of a word as a low-dimensional real-valued vector (so-called distributed representations of words). FIG. 10 is a two-dimensional plot of distributed representations of words.

Each word can be represented by a real-valued vector of an arbitrary, predetermined dimension, and the closeness in meaning of words (that is, the similarity between words) can be obtained by computing the Euclidean distance between their vectors. FIG. 10 shows the case where the words "apple", "orange", "smartphone", and "high-function mobile terminal" are represented by the two-dimensional real-valued vectors (0.1, 0.2), (0.2, 0.1), (0.4, 0.9), and (0.5, 0.8), respectively.
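The distance computation can be tried directly on the four vectors quoted for FIG. 10. The dictionary and function names below are hypothetical; the vector values are the ones given in the text.

```python
import math

# Two-dimensional word vectors quoted in the description of FIG. 10.
vectors = {
    "apple": (0.1, 0.2),
    "orange": (0.2, 0.1),
    "smartphone": (0.4, 0.9),
    "high-function mobile terminal": (0.5, 0.8),
}


def word_distance(a: str, b: str) -> float:
    """Euclidean distance between two word vectors; smaller means closer in meaning."""
    return math.dist(vectors[a], vectors[b])


fruit = word_distance("apple", "orange")                             # about 0.141
phones = word_distance("smartphone", "high-function mobile terminal")  # about 0.141
mixed = word_distance("apple", "smartphone")                         # about 0.762
```

With these values the two related pairs are about 0.14 apart, while "apple" and "smartphone" are about 0.76 apart.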

FIG. 11 is a diagram for explaining the learning of distributed word representations.
In the example shown in FIG. 11, distributed word representations are learned so that the closer two words are in meaning, the shorter the distance between their word vectors. As a result, the words move from a state in which they are placed at random (see "before learning" in FIG. 11) to a state in which words closer in meaning have word vectors closer together (see "after learning" in FIG. 11). In FIG. 11, the distance between "apple" and "orange" and the distance between "smartphone" and "high-function mobile terminal" have both become shorter. Learning uses sentences (documents, summaries, and so on) that contain each word.

FIG. 12 is a diagram showing the results when distributed word representations are learned multiple times.
As is clear from comparing post-learning A and post-learning B in FIG. 12, in both cases words with similar meanings ("apple" and "orange"; "smartphone" and "high-function mobile terminal") end up relatively close together, but the vector values of the individual words change with each training run. Taking "apple" and "orange" in FIG. 12 as an example, the distance between the two words is relatively short in both post-learning A and post-learning B, yet the vector for "apple" is (0.1, 0.2) in post-learning A and (0.3, 0.8) in post-learning B, and the vector for "orange" is (0.2, 0.1) in post-learning A and (0.4, 0.9) in post-learning B.
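A short check with the FIG. 12 values makes the point concrete: the distance between "apple" and "orange" is the same within each training run, but a distance computed between a vector from run A and a vector from run B is not meaningful. The variable names are hypothetical; the vector values are those quoted above.

```python
import math

# Vectors after two separate training runs, as quoted for FIG. 12.
run_a = {"apple": (0.1, 0.2), "orange": (0.2, 0.1)}
run_b = {"apple": (0.3, 0.8), "orange": (0.4, 0.9)}

within_a = math.dist(run_a["apple"], run_a["orange"])   # same run: small
within_b = math.dist(run_b["apple"], run_b["orange"])   # same run: small
across = math.dist(run_a["apple"], run_b["orange"])     # mixed runs: large
```

Within each run the two words are about 0.14 apart, while the mixed-run distance is about 0.76, even though the words are just as similar as before.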

Because of this property of distributed word representations, simply placing a word (word vector) from one corpus into another corpus makes the relative distances between words meaningless (that is, the relationship that words closer in meaning have closer word vectors breaks down). FIG. 13 shows words applied across different corpora. Suppose, as shown in FIG. 13, that the word "multifunction mobile" registered in corpus A is copied to corpus B. The word vector of "multifunction mobile" is then (0.3, 0.9) in both corpus A and corpus B. However, the relative distances between the copied word and other words (such as "smartphone") differ greatly between the corpora. In corpus A, "multifunction mobile" lies near "smartphone" and "high-function mobile terminal", whereas in corpus B the copied word ends up closer to "orange" and "apple" than to "smartphone" or "high-function mobile terminal".

To deal with this problem, the present embodiment performs "complementation of expressions". FIG. 14 is a schematic diagram for explaining the complementation of expressions. The complementation of expressions can be implemented with an arbitrary conversion function. Specifically, as shown in FIG. 14, when an arbitrary word x of corpus A is copied to corpus B, a linear transformation Wx + b is learned so that the word is converted appropriately (W: matrix of the linear transformation; b: bias). One way to obtain an appropriate conversion is to estimate and learn in advance, by machine learning, the conversion that most accurately maps the group of words common to both corpora (corpus A and corpus B in FIG. 14). By using the optimum conversion function obtained in this way, the distances between words are kept substantially constant between the transfer-source corpus A and the transfer-destination corpus B (see FIG. 14). The concept of the complementation of expressions is described in detail in, for example, a prior paper (R. Kiros, Y. Zhu, R. Salakhutdinov, R. S. Zemel, A. Torralba, R. Urtasun, and S. Fidler. Skip-Thought Vectors. In NIPS, 2015).
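One possible way to learn the transformation Wx + b from the words shared by both corpora is an ordinary least-squares fit. The sketch below is not the method fixed by the embodiment (which only requires that the conversion be learned by machine learning); the function names `solve`, `fit_affine` and `apply_affine` are hypothetical, and the vectors for "smartphone" and "high-function mobile terminal" are made-up illustration values. Only the standard library is used.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("shared words do not determine the transform")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_affine(src, dst):
    """Least-squares fit of W and b so that W x + b approximates y."""
    xs = [list(x) + [1.0] for x in src]  # trailing 1 lets the bias be fitted too
    d_in, d_out = len(xs[0]), len(dst[0])
    gram = [[sum(x[i] * x[j] for x in xs) for j in range(d_in)]
            for i in range(d_in)]
    W, b = [], []
    for k in range(d_out):  # one normal-equation solve per output dimension
        rhs = [sum(x[i] * y[k] for x, y in zip(xs, dst)) for i in range(d_in)]
        theta = solve(gram, rhs)
        W.append(theta[:-1])
        b.append(theta[-1])
    return W, b


def apply_affine(W, b, x):
    """Compute W x + b for a single vector x."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


# Vectors of the words common to corpus A (src) and corpus B (dst), in the order
# apple, orange, smartphone, high-function mobile terminal.
# The last two rows are hypothetical.
src = [(0.1, 0.2), (0.2, 0.1), (0.4, 0.8), (0.2, 0.7)]
dst = [(0.3, 0.8), (0.4, 0.9), (0.6, 0.2), (0.4, 0.3)]

W, b = fit_affine(src, dst)
# Transfer "multifunctional mobile phone" (0.3, 0.9) from corpus A to corpus B.
mapped = apply_affine(W, b, (0.3, 0.9))
```

For this toy data the fit is exact, and the transferred vector comes out at approximately (0.5, 0.1), next to the hypothetical corpus-B vector of "smartphone" rather than next to the fruit words. A gradient-based learner or a non-linear conversion function would fit the description equally well.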

FIG. 15 is a flowchart showing the processing flow of the complementation of expressions.
When the classification unit 122 of the natural sentence classification processing unit 120 receives text data Dtw from the user server 110, it first refers preferentially to the domain word corpus Cd of a predetermined classifier 130 (such as classifier A) (step S1) and checks whether the words of the text data Dtw are included in that domain word corpus Cd (see FIG. 7; step S2). If the classification unit 122 determines that the words of the text data Dtw are included in the domain word corpus Cd (step S2; YES), it reads the corresponding words from the domain word corpus Cd (step S3), performs text classification using the corresponding classifier 130 (that is, determines and outputs an appropriate category Cy), returns the classification result to the user server 110 (step S4), and ends the processing.

On the other hand, if the classification unit 122 determines that the words of the text data Dtw are not included in the domain word corpus Cd (step S2; NO), it refers to the master word corpus Cm (step S5). When the classification unit 122 finds the corresponding word in the master word corpus Cm, it performs the complementation of expressions shown in FIG. 14 (step S6). By performing the complementation of expressions, the classification unit 122 can operate the classifier 130 appropriately even when master word information from the master word corpus Cm and domain word information from the domain word corpus Cd are mixed. The classification unit 122 performs text classification using the master word information after the complementation of expressions and the corresponding classifier 130, returns the classification result to the user server 110 (step S4), and ends the processing.
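The lookup order of steps S1 to S6 can be sketched as a small function. The name `lookup_word_vector`, the dictionary-based corpora, the `complement` callable (standing in for the learned conversion Wx + b) and the handling of a word found in neither corpus are all assumptions made for this illustration; the description does not specify them.

```python
def lookup_word_vector(word, domain_corpus, master_corpus, complement):
    """Return (vector, source) for `word`, preferring the domain corpus.

    `complement` is a hypothetical callable that converts a master-corpus
    vector into the vector space of the domain corpus.
    """
    if word in domain_corpus:                      # steps S1-S2: check Cd first
        return domain_corpus[word], "domain"       # step S3: read from Cd
    if word in master_corpus:                      # step S5: fall back to Cm
        return complement(master_corpus[word]), "master"  # step S6: complement
    return None, "unknown"                         # assumption: not covered by FIG. 15


# Toy corpora and a toy conversion (all values are illustrative).
domain_cd = {"apple": (0.3, 0.8)}
master_cm = {"multifunctional mobile phone": (0.3, 0.9)}


def to_domain_space(v):
    """Hypothetical stand-in for the learned W x + b."""
    return (v[0] + 0.2, 1.0 - v[1])


hit = lookup_word_vector("apple", domain_cd, master_cm, to_domain_space)
fallback = lookup_word_vector("multifunctional mobile phone",
                              domain_cd, master_cm, to_domain_space)
missing = lookup_word_vector("pear", domain_cd, master_cm, to_domain_space)
```

The vectors returned by this lookup would then be passed to the corresponding classifier 130 to obtain the category (step S4).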

As described above, according to the present embodiment, the input learning data is learned using classifier A and classifier B. The part of the learning data that is not used for learning in classifier A (that is, the learning data Dlv for accuracy verification in classifier A) is assigned to the learning data Dll for actual learning in classifier B. Conversely, the part of the learning data that is not used for learning in classifier B (that is, the learning data Dlv for accuracy verification in classifier B) is assigned to the learning data Dll for actual learning in classifier A. As a result, accurate accuracy verification can be achieved while the entire input learning data is used for learning. Raising the learning accuracy of the classifiers 130 enables quick and accurate text classification, which in turn makes it possible to respond quickly and appropriately to customer questions and the like in telephone customer support operations.
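The assignment of learning data summarized above can be sketched as follows. The function name `assign_learning_data`, the 20% hold-out fraction and the use of contiguous, non-overlapping slices are assumptions for illustration; the description only requires that data held out for accuracy verification in one classifier be used for learning in another.

```python
def assign_learning_data(samples, n_classifiers=2, holdout_fraction=0.2):
    """Give each classifier its own verification slice (Dlv) and learn on the rest (Dll).

    The slices are disjoint, so data held out by one classifier is always
    part of the learning data of every other classifier.
    """
    n_holdout = int(len(samples) * holdout_fraction)
    if n_holdout == 0 or n_holdout * n_classifiers > len(samples):
        raise ValueError("not enough samples for disjoint verification slices")
    assignments = []
    for k in range(n_classifiers):
        start = k * n_holdout
        verify_idx = set(range(start, start + n_holdout))
        assignments.append({
            "learn": [s for i, s in enumerate(samples) if i not in verify_idx],
            "verify": [samples[i] for i in sorted(verify_idx)],
        })
    return assignments


# Ten labelled samples, represented here only by their indices.
data = list(range(10))
classifier_a, classifier_b = assign_learning_data(data)
```

In this example classifier A verifies on samples 0 and 1 while classifier B learns from them, and classifier B verifies on samples 2 and 3 while classifier A learns from them, so every sample contributes to learning in at least one classifier.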

In addition, the present embodiment improves classification accuracy by using a master word corpus Cm common to all classifiers 130 together with a domain word corpus Cd provided for each classifier 130. The master word corpus Cm holds a far larger amount of data than each domain word corpus Cd, and holding and managing multiple versions of it is difficult. For this reason, the domain word corpus Cd of the corresponding classifier 130 is referred to preferentially, and the master word corpus Cm is referred to, and the words of the text data Dtw are read from it, only when those words are not included in the domain word corpus Cd. This allows each corpus to be used efficiently. Using each corpus efficiently enables quick and accurate text classification, which in turn makes it possible to respond quickly and appropriately to customer questions and the like in telephone customer support operations.

B. Modifications
The present invention is not limited to the above embodiment and can be implemented with various other changes within a scope that does not depart from the gist of the present invention. That is, the above embodiment is merely an example in every respect, is not to be interpreted restrictively, and various modifications can be adopted.

In the above embodiment, the domain word corpus Cd of a predetermined classifier 130 (such as classifier A) is referred to preferentially when the complementation of expressions is performed (see FIG. 15), but which classifier's domain word corpus Cd is referred to first can be set and changed arbitrarily. When there are a plurality of classifiers 130 (for example, classifier A and classifier B), the classification unit 122 first refers to, for example, the domain word corpus Cd of classifier A and checks whether the words of the text data Dtw are included in it. If the classification unit 122 determines that the words of the text data Dtw are not included in that domain word corpus Cd, it refers to the domain word corpus Cd of classifier B and checks whether the words are included there. The classification unit 122 may then refer to the master word corpus Cm only after determining that none of the domain word corpora Cd includes the words of the text data Dtw.
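A minimal sketch of this modification, assuming the domain word corpora are held as an ordered list of (name, dictionary) pairs whose order expresses the configurable priority; the function name `find_word` and the data layout are hypothetical.

```python
def find_word(word, domain_corpora, master_corpus):
    """Search the domain corpora in their configured order, then the master corpus.

    Returns (source_name, vector), or (None, None) if the word is unknown.
    """
    for name, corpus in domain_corpora:   # order = configured priority
        if word in corpus:
            return name, corpus[word]
    if word in master_corpus:             # consulted only after every Cd missed
        return "master", master_corpus[word]
    return None, None


# Illustrative corpora; the priority is changed by reordering the list.
domain_corpora = [
    ("classifier A", {"apple": (0.1, 0.2)}),
    ("classifier B", {"smartphone": (0.6, 0.2)}),
]
master_cm = {"multifunctional mobile phone": (0.3, 0.9)}

from_a = find_word("apple", domain_corpora, master_cm)
from_b = find_word("smartphone", domain_corpora, master_cm)
from_master = find_word("multifunctional mobile phone", domain_corpora, master_cm)
```

A vector taken from the master word corpus would still need the complementation of expressions before being passed to a classifier, as in the flow of FIG. 15.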

In this specification, a "unit" does not simply mean a physical configuration; it also covers the case where the function of the "unit" is realized by software. Furthermore, the function of one "unit" or device may be realized by two or more physical configurations or devices, and the functions of two or more "units" or devices may be realized by one physical means or device.

In addition, the steps of each process described above in this specification may be reordered arbitrarily or executed in parallel, as long as this causes no contradiction in the processing contents.

A program that carries out each process described in this specification may be stored in a recording medium. Using this recording medium, the program can be installed in the text classification device 200. The recording medium storing the program may be a non-transitory recording medium. The non-transitory recording medium is not particularly limited and may be, for example, a recording medium such as a CD-ROM.

Description of symbols: 100 ... text classification system, 110 ... user server, 120 ... natural sentence classification processing unit, 130 ... classifier, 140 ... communication interface unit, 150 ... operation unit, 160 ... storage unit, 200 ... text classification device, Cm ... master word corpus, Cd ... domain word corpus

Claims

A text classification device for classifying natural language using classifiers, comprising:
a plurality of classifiers;
an input unit for inputting a plurality of learning data, each including learning text data and a learning category;
an assigning unit that, for each classifier, assigns a part of the plurality of input learning data as learning data for accuracy check and assigns the remaining learning data as learning data for learning; and
a learning unit that trains the corresponding classifier using the assigned learning data for learning and checks the learning accuracy of the corresponding classifier using the assigned learning data for accuracy check,
wherein the assigning unit
assigns the part of the learning data assigned as learning data for accuracy check in any one of the classifiers as learning data for learning in any one of the remaining classifiers.

The text classification device according to claim 1, further comprising a classification unit that performs text classification by integrating the classifiers with respect to text data input after learning of the classifiers is completed.

The text classification device according to claim 2, wherein the plurality of classifiers are text classifiers each using a predetermined model, and at least one of the plurality of classifiers is a text classifier using a model different from that of the other classifiers.

The text classification device according to claim 3, wherein the at least one classifier is a text classifier using an LRC model, and the other classifier is a text classifier using a DNN model.

A text classification method in which a computer having a storage unit performs natural language classification using classifiers, wherein
the computer, by executing a program stored in the storage unit, performs:
an input step of inputting a plurality of learning data, each including learning text data and a learning category;
an assigning step of, for each of a plurality of classifiers, assigning a part of the plurality of input learning data as learning data for accuracy check and assigning the remaining learning data as learning data for learning; and
a learning step of training the corresponding classifier using the assigned learning data for learning and checking the learning accuracy of the corresponding classifier using the assigned learning data for accuracy check,
wherein the assigning step includes
assigning the part of the learning data assigned as learning data for accuracy check in any one of the classifiers as learning data for learning in any one of the remaining classifiers.

A text classification program for causing a computer that has a storage unit and performs natural language classification using classifiers to function as:
a plurality of classifiers;
an input unit for inputting a plurality of learning data, each including learning text data and a learning category;
an assigning unit that, for each classifier, assigns a part of the plurality of input learning data as learning data for accuracy check and assigns the remaining learning data as learning data for learning; and
a learning unit that trains the corresponding classifier using the assigned learning data for learning and checks the learning accuracy of the corresponding classifier using the assigned learning data for accuracy check,
wherein the assigning unit
assigns the part of the learning data assigned as learning data for accuracy check in any one of the classifiers as learning data for learning in any one of the remaining classifiers.