JP2022047835A

JP2022047835A - Determination program, determination method and information processing device

Info

Publication number: JP2022047835A
Application number: JP2020153836A
Authority: JP
Inventors: 直司松尾; Naoji Matsuo
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2020-09-14
Filing date: 2020-09-14
Publication date: 2022-03-25

Abstract

To improve analysis accuracy. SOLUTION: For each of a plurality of datasets, each including a plurality of sample data, an information processing device calculates statistics of feature quantities of the plurality of sample data. Based on the calculated statistics, the information processing device selects, or determines the priorities of, datasets among the plurality of datasets to be used as training data for training a determination model. SELECTED DRAWING: Figure 5

Description

The present invention relates to a determination program, a determination method, and an information processing apparatus.

In recent years, call centers have recorded conversations between operators and customers and accumulated information on the recorded conversations. Call centers need to use this accumulated conversation information to improve their services.

For example, the following technologies make use of accumulated conversation information. One technology counts the predetermined keywords included in a conversation between a customer and an operator and, depending on the count, displays FAQ (Frequently Asked Questions) entries or notifies a supervisor. Another technology converts the operator's speech into a character string and checks whether the string contains the keywords to be conveyed, thereby determining whether the operator has properly conveyed the required content to the customer.

Japanese Unexamined Patent Publication No. 2015-99304

Call centers need to detect whether a specific, pre-designated conversation situation occurs in a conversation between an operator and a customer. When the above technologies are used for this purpose, one conceivable approach is to set keywords exhaustively for each conversation situation to be detected and then determine whether the set keywords appear in the conversation information.

However, it is difficult to know in advance how many keywords must be covered in order to detect a specific conversation situation. In addition, because conversations with the same meaning can be phrased in many ways, it is difficult to set keywords exhaustively by hand.

For these reasons, it is also conceivable to analyze the conversation situation with a model trained by machine learning. The time required to generate a trained model and the performance of the trained model depend on the nature of the training data. However, the nature of the training data is difficult to identify in advance, and identifying it takes a long time even when it is possible.

In one aspect, an object of the present invention is to provide a determination program, a determination method, and an information processing apparatus capable of improving analysis accuracy.

In a first proposal, the determination program causes a computer to execute a process of calculating, for a plurality of data sets each including a plurality of sample data, statistics of the feature quantities of the plurality of sample data. The determination program further causes the computer to execute a process of selecting data sets, or determining the priorities of data sets, among the plurality of data sets to be used as training data for training a determination model, based on the calculated statistics.

According to one embodiment, analysis accuracy can be improved.

FIG. 1 is a diagram illustrating the overall flow of quantization.
FIG. 2 is a functional block diagram showing the functional configuration of the information processing apparatus according to the first embodiment.
FIG. 3 is a diagram illustrating an example of the update conversation data set.
FIG. 4 is a diagram illustrating an example of the certainty DB.
FIG. 5 is a diagram illustrating the flow of calculating the certainty of an update conversation data set.
FIG. 6 is a diagram illustrating the setting of the coefficient that determines the update speed.
FIG. 7 is a flowchart showing the flow of processing.
FIG. 8 is a diagram showing an example of the overall configuration of the application system.
FIG. 9 is a diagram for explaining an example of the processing of the learning device and the determination device of the application system.
FIG. 10 is a diagram for explaining the generation unit of the learning device.
FIG. 11 is a diagram illustrating the update of the quantization table.
FIG. 12 is a diagram illustrating the overall flow of adaptive control of the quantization table.
FIG. 13 is a diagram illustrating an example of the variance of conversation data sets.
FIG. 14 is a diagram illustrating an example in which the quantization table is updated with the conversation data set of line B in FIG. 13.
FIG. 15 is a diagram illustrating an example in which the quantization table is updated with the conversation data set of line R in FIG. 13.
FIG. 16 is a flowchart showing the flow of processing according to the third embodiment.
FIG. 17 is a diagram illustrating the update process according to the third embodiment.
FIG. 18 is a diagram illustrating an application example.
FIG. 19 is a diagram illustrating an example of a hardware configuration.

Hereinafter, embodiments of the determination program, determination method, and information processing apparatus disclosed in the present application are described in detail with reference to the drawings. The present invention is not limited to these embodiments. The embodiments may also be combined as appropriate to the extent that no contradiction arises.

[Overall explanation]
In recent years, conversation situations have been analyzed to support contact center operators, including detection of the occurrence of response troubles. In contact centers, which are becoming widespread as a customer contact point, waiting time leads directly to customer dissatisfaction, so raising the operating rate of operators is important. However, when trouble with a customer arises, such as a complaint, the response takes longer than usual, and other customers consequently wait longer to be connected to an operator. To mitigate this, the contact center manager needs to be aware of such calls and provide support, for example by taking over the call at an early stage.

For the manager to provide such support, response troubles must be detected early. For this purpose, a technique is used that detects, by speech recognition, keywords (for example, "yes") and biases in phoneme sequences, which are linguistic features of customer-operator conversations characteristic of response troubles, and analyzes the conversation situation based on the detection results.

However, detecting linguistic features requires that models of language-dependent information, such as acoustic models and language models for speech recognition, be created in advance. These models must be created for each language, and acoustic models in particular are expensive to create, so conversation situation analysis that does not rely on language-dependent information is strongly desired.

More recently, a technique has come into use that quantizes physical feature quantities of speech, such as the power spectrum, with reference to human auditory characteristics and, based on the result, detects the biases in utterances that are used for analyzing conversation situations.

FIG. 1 is a diagram illustrating the overall flow of quantization. As shown in FIG. 1, the quantization table is updated using an update conversation data set, which contains a plurality of speech data for updating, so that the table adapts to the distribution of the physical feature quantities of speech. The optimized quantization table is then used to quantize a training data set containing a plurality of speech data for training, and the determination model is trained (machine learning) using the quantization result of each speech data. After that, conversation data collected in real time (evaluation conversation data) is input to the trained determination model to obtain analysis results, which are used, for example, to detect the occurrence of response troubles (abnormal calls).

In general, the larger the difference between the distributions (mean, variance) of the quantization results for normal and abnormal conversations, the higher the analysis accuracy for conversation data. For example, when the quantization table is updated using normal conversation data, which is relatively easy to collect, the variance of the quantization results at evaluation time is small for normal conversation data (normal conversations) and large for abnormal conversation data (abnormal conversations).

In the process of updating the quantization table used for quantization, the convergence speed of the update varies with the characteristics of the feature quantities of the update conversation data set, and the distribution of the quantization results of the feature quantities changes accordingly. Specifically, when the update of the quantization table converges slowly, the variance of the quantization results of the feature quantities of the evaluation conversation data becomes relatively large. For example, when an update using a data set of normal conversations converges slowly, the variance of the quantization results for the normal conversations under evaluation is larger than when the update converges quickly, and the difference from the quantization results of the feature quantities of the abnormal conversations under evaluation becomes relatively small. As a result, the accuracy of distinguishing normal conversations from abnormal conversations decreases.

The disclosed technique therefore examines the characteristics of the feature quantities of the conversation data sets used for updating the quantization table, and thereby selects conversation data sets suitable for updating the quantization table.

[Functional configuration]
Next, the functional configuration of the information processing apparatus 50, which selects conversation data sets suitable for updating the quantization table, is described. FIG. 2 is a functional block diagram showing the functional configuration of the information processing apparatus 50 according to the first embodiment.

As shown in FIG. 2, the information processing apparatus 50 includes a communication unit 51, a storage unit 52, and a control unit 60. The communication unit 51 is a processing unit that controls communication with other devices and is implemented by, for example, a communication interface. For example, the communication unit 51 receives various instructions and data from an administrator terminal or the like and transmits various results to the administrator terminal or the like.

The storage unit 52 stores various data, programs executed by the control unit 60, and the like, and is implemented by, for example, a memory or a hard disk. The storage unit 52 stores the update conversation data set 53 and the certainty DB 54.

The update conversation data set 53 contains a plurality of data sets, each having a plurality of conversation data, and is used for updating the quantization table described above. In other words, the update conversation data set 53 is the target whose feature characteristics are examined. For example, the update conversation data set 53 consists of data sets of normal conversation data collected at different contact centers.

FIG. 3 is a diagram illustrating an example of the update conversation data set 53. As shown in FIG. 3, the update conversation data set 53 stores "data set name" and "data" in association with each other. The "data set name" is information that identifies the data set. The "data" is the conversation data used for updating the quantization table and is time-series data. In the example of FIG. 3, conversation data set A consists of a plurality of conversation data such as conversation data A1 and conversation data A2.

The certainty DB 54 stores the examination result for each conversation data set. That is, for each conversation data set stored in the update conversation data set 53, the certainty DB 54 stores a certainty that is used, for example, to determine the priority of that data set when the quantization table is updated.

FIG. 4 is a diagram illustrating an example of the certainty DB 54. As shown in FIG. 4, the certainty DB 54 stores "data set name", "data", and "certainty" in association with one another. The "data set name" is information that identifies the data set, and the "data" is the conversation data used for updating the quantization table. The "certainty" is a score or the like calculated by the control unit 60, described later, and is used, for example, to determine priority when the quantization table is updated. In the example of FIG. 4, conversation data set A consists of a plurality of conversation data such as conversation data A1 and conversation data A2, and "certainty 1" has been calculated as its "certainty".

The control unit 60 is a processing unit that controls the entire information processing apparatus 50 and is implemented by, for example, a processor. The control unit 60 includes an acoustic analysis unit 61, a conversation statistics unit 62, an overall statistics unit 63, and a calculation unit 64. The acoustic analysis unit 61, the conversation statistics unit 62, the overall statistics unit 63, and the calculation unit 64 are implemented by electronic circuits of the processor, processes executed by the processor, or the like.

The control unit 60 calculates a certainty for each conversation data set stored in the update conversation data set 53. FIG. 5 is a diagram illustrating the flow of calculating the certainty of an update conversation data set. As shown in FIG. 5, the control unit 60 executes processing per conversation data and processing per conversation data set.

First, the control unit 60 reads one conversation data set and, for each conversation data included in it, performs acoustic analysis in predetermined units (predetermined frame units) to calculate a feature quantity for each unit. The control unit 60 then uses these per-unit feature quantities to calculate the statistics (mean, variance) of the feature quantities for each conversation data. In this way, the control unit 60 calculates and accumulates feature quantities for each conversation data of each conversation data set. Next, for each conversation data set, the control unit 60 calculates variance values for the data set using the feature quantities of the conversation data it contains. Finally, the control unit 60 uses the variance values of each conversation data set to calculate the certainty that the data set is suitable for updating.

The acoustic analysis unit 61 is a processing unit that calculates the feature quantities of conversation data. Specifically, for each conversation data stored in the update conversation data set 53, the acoustic analysis unit 61 calculates MFCCs (mel-frequency cepstral coefficients), which are one example of a feature quantity, for each analysis frame.

For example, the acoustic analysis unit 61 reads conversation data A1 and extracts the data of each analysis frame from conversation data A1 by windowing. The acoustic analysis unit 61 then calculates the feature quantity MFCC(i, j, t) for each analysis frame and outputs it to the conversation statistics unit 62. Here, i is the data number, j is the MFCC dimension, and t is the analysis frame number.
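The windowing step described above can be sketched as follows. This is only an illustration: the text does not state the frame length, the hop size, or the window shape, so the Hamming window and the parameters `frame_len` and `hop` are assumptions, and the MFCC computation that would follow for each frame is omitted.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split a sample sequence into overlapping analysis frames.

    Each frame is multiplied by a Hamming window (an assumed choice;
    the source only says that frames are extracted by windowing).
    """
    if frame_len < 2 or hop < 1:
        raise ValueError("frame_len must be >= 2 and hop must be >= 1")
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    # Slide over the signal; an incomplete trailing frame is dropped.
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames
```

Each returned frame would then be passed to an MFCC routine to obtain the values MFCC(i, j, t) for frame t.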

The conversation statistics unit 62 is a processing unit that calculates the statistics (mean, variance) of each conversation data. Specifically, for each conversation data of each conversation data set stored in the update conversation data set 53, the conversation statistics unit 62 calculates the statistics of that conversation data using the per-frame feature quantities MFCC(i, j, t) calculated by the acoustic analysis unit 61. For example, when 30 MFCC feature quantities have been calculated for conversation data A1, one per analysis frame, the conversation statistics unit 62 calculates the statistics of conversation data A1 from those 30 MFCC feature quantities.

More specifically, for each conversation data, the conversation statistics unit 62 uses the per-frame feature quantities MFCC(i, j, t) of that conversation data to calculate the mean m(i, j) of MFCC(i, j, t) by equation (1) and the variance v(i, j) of MFCC(i, j, t) by equation (2). The conversation statistics unit 62 then outputs the calculated mean and variance to the overall statistics unit 63. Nt in equations (1) and (2) is the number of frames in the conversation data.
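Equations (1) and (2) are not reproduced in this text. Under the definitions given above (a mean and a variance taken over the Nt frames of conversation data i, for each MFCC dimension j), they would plausibly take the standard form below. The 1/Nt normalization of the variance is an assumption; the original may normalize differently.

```latex
m(i,j) = \frac{1}{N_t} \sum_{t=1}^{N_t} \mathrm{MFCC}(i,j,t) \tag{1}

v(i,j) = \frac{1}{N_t} \sum_{t=1}^{N_t} \bigl( \mathrm{MFCC}(i,j,t) - m(i,j) \bigr)^2 \tag{2}
```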

The overall statistics unit 63 is a processing unit that calculates the statistics (mean, variance) of each conversation data set. Specifically, for each conversation data set stored in the update conversation data set 53, the overall statistics unit 63 calculates the statistics of the data set using the per-conversation statistics calculated by the conversation statistics unit 62. For example, for conversation data set A, the overall statistics unit 63 calculates the statistics (mean, variance) of conversation data set A using the mean and variance of conversation data A1, the mean and variance of conversation data A2, and so on.

More specifically, using the per-conversation mean m(i, j) and variance v(i, j), the overall statistics unit 63 calculates the mean mm(j) of the means m(i, j) over the conversation data in the data set by equation (3), and the mean vm(j) of the variances v(i, j) by equation (4). Likewise, the overall statistics unit 63 calculates the variance mv(j) of the means m(i, j) by equation (5) and the variance vv(j) of the variances v(i, j) by equation (6). The overall statistics unit 63 then outputs the calculated values to the calculation unit 64. Ni in equations (3) to (6) is the number of conversation data.
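The data-set-level statistics can be sketched in plain Python as below. Equations (3) to (6) are not reproduced in this text, so the 1/Ni normalization of the variances is an assumption, and the function and variable names are illustrative only.

```python
def _mean(xs):
    return sum(xs) / len(xs)


def _pvariance(xs):
    # Population variance (divides by the number of items; assumed).
    mu = _mean(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)


def dataset_stats(m, v):
    """Aggregate per-conversation statistics over one data set.

    m[i][j] and v[i][j] are the mean and variance of MFCC dimension j
    for conversation data i. Returns the four per-dimension lists
    described in the text: mm (mean of means), vm (mean of variances),
    mv (variance of means) and vv (variance of variances).
    """
    nj = len(m[0])

    def column(table, j):
        return [row[j] for row in table]

    return {
        "mm": [_mean(column(m, j)) for j in range(nj)],
        "vm": [_mean(column(v, j)) for j in range(nj)],
        "mv": [_pvariance(column(m, j)) for j in range(nj)],
        "vv": [_pvariance(column(v, j)) for j in range(nj)],
    }
```

A data set whose conversations all have similar per-conversation means and variances yields small `mv` and `vv` values.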

The calculation unit 64 is a processing unit that calculates, for each conversation data set, the certainty that the data set is suitable for updating. Specifically, for each conversation data set stored in the update conversation data set 53, the calculation unit 64 calculates the certainty of the data set using the statistics calculated by the overall statistics unit 63. For example, for conversation data set A, the calculation unit 64 calculates the certainty (score) using the statistics of conversation data set A (the mean of the means and the variance of the means, and the mean of the variances and the variance of the variances). In this way, the calculation unit 64 calculates the certainty of each conversation data set and stores it in the certainty DB 54.

More specifically, the calculation unit 64 calculates the certainty (score) by equation (7), which takes a weighted sum of the variances mv(j) and vv(j) of each data set and then its reciprocal. Nj in equation (7) is the order (number of dimensions) of the MFCC. That is, the calculation unit 64 computes the certainty so that the smaller the variance values of a conversation data set, the higher the certainty that the data set is suitable for updating the quantization table.
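Equation (7) itself is not reproduced in this text. The sketch below shows one form that matches the description (a weighted sum of mv(j) and vv(j), averaged over the Nj dimensions, followed by a reciprocal). The weights `w_mean` and `w_var`, the averaging over dimensions, and the small `eps` guard against division by zero are all assumptions, not values taken from the source.

```python
def suitability_score(mv, vv, w_mean=0.5, w_var=0.5, eps=1e-12):
    """Hypothetical certainty score for one conversation data set.

    mv[j]: variance of the per-conversation means for dimension j.
    vv[j]: variance of the per-conversation variances for dimension j.
    Smaller variances give a larger score.
    """
    nj = len(mv)
    weighted = sum(w_mean * a + w_var * b for a, b in zip(mv, vv)) / nj
    return 1.0 / (weighted + eps)
```

With this form, a data set whose conversations are statistically homogeneous receives a higher score than one whose conversations differ widely, which is the ordering the text requires.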

Here, an initial quantization table is generated from white noise or the like, and the table is then updated so as to minimize the quantization error on the update conversation data set. In doing so, the calculation unit 64 can control the update speed so that the update proceeds faster when the certainty of suitability for updating is high.

FIG. 6 is a diagram illustrating the setting of the coefficient that determines the update speed. As shown in FIG. 6, the coefficient α has a minimum value of 0 and a maximum value of 1, and is set to a larger value as the certainty increases. The update formula for the quantization table using this coefficient α can be written as "quantization point n after update = ((1 - α) × quantization point n before update) + (α × mean of the feature quantities quantized to quantization point n before update)". As shown in FIG. 6, the coefficient α lies in the range 0 ≤ α ≤ 1; the update is fastest when α = 1, and no update is performed when α = 0.
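The update formula above can be illustrated with a single update pass. For brevity this sketch uses scalar feature values and nearest-point assignment by absolute distance; the actual feature quantities are multi-dimensional MFCC vectors, and the distance measure is not stated in this text, so both simplifications are assumptions.

```python
def update_quantization_table(table, features, alpha):
    """Apply one update pass to a scalar quantization table.

    Each feature is quantized to its nearest quantization point, then
    every point n is moved toward the mean of the features assigned to
    it: new_n = (1 - alpha) * old_n + alpha * mean(assigned features).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    buckets = [[] for _ in table]
    for x in features:
        nearest = min(range(len(table)), key=lambda k: abs(x - table[k]))
        buckets[nearest].append(x)
    updated = []
    for point, assigned in zip(table, buckets):
        if assigned:
            centroid = sum(assigned) / len(assigned)
            updated.append((1 - alpha) * point + alpha * centroid)
        else:
            # A point with no assigned features is left unchanged.
            updated.append(point)
    return updated
```

With `alpha=0` the table is returned unchanged, and with `alpha=1` each point jumps directly to the mean of its assigned features, matching the two limiting cases described for FIG. 6.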

[Processing flow]
FIG. 7 is a flowchart showing the flow of processing. As shown in FIG. 7, when instructed to start processing (S101: Yes), the acoustic analysis unit 61 performs acoustic analysis to calculate the feature quantities of the conversation data for each analysis frame (S102).

Next, the conversation statistics unit 62 calculates the mean and variance of each conversation data using the feature quantities of the analysis frames (S103). The overall statistics unit 63 then calculates the mean and variance of each update conversation data set using the per-conversation means and variances (S104).

After that, the calculation unit 64 calculates the certainty of each conversation data set using the mean and variance of each update conversation data set (S105), and uses the certainty of each conversation data set to set the coefficient that determines the update speed (S106).
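Steps S105 and S106 end with a certainty per data set and a coefficient α derived from it. The sketch below shows one way these could be used. The text only states that α grows with the certainty and stays within 0 to 1, so the linear, clamped mapping and the thresholds `score_lo` and `score_hi` are hypothetical, as is ranking the data sets by score to decide priority.

```python
def alpha_from_score(score, score_lo, score_hi):
    """Map a certainty score to an update coefficient in [0, 1].

    Linear between the hypothetical thresholds score_lo and score_hi,
    clamped to 0 below score_lo and to 1 above score_hi.
    """
    if score_hi <= score_lo:
        raise ValueError("score_hi must be greater than score_lo")
    ratio = (score - score_lo) / (score_hi - score_lo)
    return max(0.0, min(1.0, ratio))


def rank_datasets(scores):
    """Return data set names ordered from highest to lowest certainty."""
    return sorted(scores, key=scores.get, reverse=True)
```

For example, `rank_datasets({"A": 1.0, "B": 3.0, "C": 2.0})` orders the data sets as B, C, A, so data set B would be preferred for updating the quantization table.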

[Effect]
As described above, before the quantization table is updated, the information processing apparatus 50 can calculate, for each update conversation data set, a certainty indicating how suitable the data set is for updating. By examining the characteristics of the feature quantities of the update conversation data sets in this way, conversation data sets suitable for updating the quantization table can be selected. As a result, the accuracy of analyzing whether a specific conversation situation is contained in speech data can be improved.

Next, an application system is described that updates the quantization table using the update conversation data sets for which the "certainty" described in the first embodiment has been calculated, and performs conversation analysis using the updated quantization table.

[Application system]
FIG. 8 is a diagram showing an example of the overall configuration of the application system. As shown in FIG. 8, the system includes a customer terminal 10, an operator terminal 15, a call recording device 30, an administrator terminal 40, a learning device 100, and a determination device 200. The information processing apparatus 50 described in the first embodiment may be incorporated in the learning device 100, or the two may be implemented as separate devices.

The customer terminal 10 and the operator terminal 15 are connected to each other via a network 1 such as an IP (Internet Protocol) network. The operator terminal 15, the call recording device 30, the administrator terminal 40, the learning device 100, and the determination device 200 are also connected to one another by a predetermined network. Each network may be any of various communication networks, wired or wireless, such as the Internet or a dedicated line.

The customer terminal 10 is a terminal device that a customer uses to converse (talk on a call) with an operator. The operator terminal 15 is a terminal device that an operator uses to converse with a customer.

The call recording device 30 records the voice of conversations exchanged between the customer terminal 10 and the operator terminal 15. During learning, the voice data recorded by the call recording device 30 is sent to the learning device 100 and used as voice data for learning. During abnormal-conversation detection, the recorded voice data is sent to the determination device 200, which determines whether the voice data includes an abnormal conversation situation.

The administrator terminal 40 is a terminal device used by an administrator who manages the operators conversing with customers via the operator terminal 15. For example, when the determination device 200 determines that a conversation between a customer and an operator includes an abnormal conversation situation, the determination device 200 notifies the administrator terminal 40 that an abnormal conversation situation has been detected.

The learning device 100 learns (trains) the LSTM (Long Short Term Memory) model 110c and the DNN (Deep Neural Network) model 110d of FIG. 9 using voice data for learning (training) and correct answer information. The learning device 100 notifies the determination device 200 of the information of the learned LSTM model 110c and DNN model 110d.

The determination device 200 uses the LSTM model 110c and the DNN model 110d notified from the learning device 100 to determine whether a conversation between the customer terminal 10 and the operator terminal 15 includes an abnormal conversation situation. When it determines that the conversation between the customer and the operator includes an abnormal conversation situation, the determination device 200 notifies the administrator terminal 40 that an abnormal conversation situation has been detected.

[Description of processing]
FIG. 9 is a diagram for explaining an example of the processing of the learning device and the determination device of the application system. As shown in FIG. 9, the learning device 100 includes a learning voice database 110a, a generation unit 120, a first calculation unit 130, a third calculation unit 140, a second calculation unit 150, and a learning unit 160.

The learning voice database 110a stores a plurality of pieces of voice data for learning, generated by, for example, dividing each conversation data, and each piece of voice data for learning is associated with correct answer information 110b. The correct answer information 110b is assigned to each piece of voice data and indicates whether a specific conversation situation is included. In the first embodiment, as an example, the specific conversation situation is an "abnormal conversation situation". An abnormal conversation situation includes "unusual situations" in which the customer feels dissatisfied, becomes angry, makes threats, and so on.

The generation unit 120 acquires voice data for learning from the learning voice database 110a. In the following description of the learning device 100, the voice data for learning acquired from the learning voice database 110a may be referred to simply as "voice data". The generation unit 120 performs, for example, vector quantization on the voice data and generates information on the quantization results (a quantization sequence). The quantization sequence is one example of an index indicating the bias of vocalization. The generation unit 120 converts each quantization result into a One Hot vector and outputs the One Hot vector of each quantization result to the first calculation unit 130.

The first calculation unit 130 is a processing unit that calculates internal vectors by sequentially inputting the One Hot vectors of the quantization results into a first network having a recurrent path and performing calculations based on the parameters of the first network. For example, the first network corresponds to an LSTM. The first calculation unit 130 inputs the One Hot vectors of the quantization results generated from the voice data into the first network and outputs each resulting internal vector to the third calculation unit 140.

The third calculation unit 140 is a processing unit that averages the plurality of internal vectors output from the first calculation unit 130. The third calculation unit 140 outputs the averaged internal vector to the second calculation unit 150. In the following description, the averaged internal vector is referred to as the "average vector".
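The averaging performed by the third calculation unit can be illustrated with a minimal sketch. The function name `average_vector` and the list-of-lists representation of the internal vectors are assumptions made for illustration; the source does not specify a data layout.

```python
from typing import List, Sequence


def average_vector(internal_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Return the element-wise mean of equally sized internal vectors.

    Hypothetical helper: each inner sequence stands for one internal vector
    produced by the recurrent (first) network for one input step.
    """
    if not internal_vectors:
        raise ValueError("at least one internal vector is required")
    dim = len(internal_vectors[0])
    if any(len(v) != dim for v in internal_vectors):
        raise ValueError("all internal vectors must have the same dimension")
    count = len(internal_vectors)
    # Average each dimension over all time steps.
    return [sum(v[i] for v in internal_vectors) / count for i in range(dim)]
```

Averaging over time collapses a variable-length sequence into one fixed-size vector, which is what allows a network without a recurrent path to consume it.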

The second calculation unit 150 is a processing unit that calculates an output value (neuron value) by inputting the average vector into a second network having no recurrent path and performing a calculation based on the parameters of the second network. The second calculation unit 150 outputs the output value to the learning unit 160.

The learning unit 160 learns the parameters of the first calculation unit 130 and the parameters of the second calculation unit 150 (learning by the error backpropagation method) so that, when voice data is input to the first calculation unit 130, the output value from the second calculation unit 150 approaches the correct answer information 110b corresponding to that voice data.

The learning unit 160 repeats error backpropagation learning until a learning stop condition is satisfied, and generates the LSTM model 110c and the DNN model 110d. The LSTM model 110c is information corresponding to the learned parameters of the first network. The DNN model 110d is information corresponding to the learned parameters of the second network. The learning device 100 notifies the determination device 200 of the information of the LSTM model 110c and the information of the DNN model 110d. The learning unit 160 may send this information to the determination device 200 via a network, or the learning device 100 and the determination device 200 may be directly connected and the information sent over that connection.

The determination device 200 includes a generation unit 220, a first calculation unit 230, a third calculation unit 240, a second calculation unit 250, and a determination unit 260.

The generation unit 220 accepts input of the voice data that is to be checked for an abnormal conversation situation. In the following description of the determination device 200, this voice data may be referred to simply as "voice data". The generation unit 220 performs, for example, vector quantization on the voice data and generates information on the quantization results. The generation unit 220 converts each quantization result into a One Hot vector and outputs the One Hot vector of each quantization result to the first calculation unit 230.

The first calculation unit 230 is a processing unit that calculates internal vectors by sequentially inputting the One Hot vector of each quantization result into a first network having a recurrent path and performing calculations based on the parameters of the first network. The first calculation unit 230 uses the parameters of the LSTM model 110c as the parameters set in the first network. The first calculation unit 230 inputs the One Hot vectors of the quantization results generated from the voice data into the first network and outputs each resulting internal vector to the third calculation unit 240.

The third calculation unit 240 is a processing unit that averages the plurality of internal vectors output from the first calculation unit 230. The third calculation unit 240 outputs the averaged internal vector (average vector) to the second calculation unit 250.

The second calculation unit 250 is a processing unit that calculates an output value (neuron value) by inputting the average vector into a second network having no recurrent path and performing a calculation based on the parameters of the second network. The second calculation unit 250 uses the parameters of the DNN model 110d as the parameters set in the second network. The second calculation unit 250 outputs the output value to the determination unit 260.

The determination unit 260 is a processing unit that compares the output value from the second calculation unit 250 with a threshold value to determine whether the voice data includes an abnormal conversation situation. For example, the determination unit 260 determines that the voice data includes an abnormal conversation situation when the output value is equal to or greater than the threshold value.
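The last two stages of the determination device (a non-recurrent network applied to the average vector, followed by a threshold comparison) can be sketched as follows. This is only an illustration under stated assumptions: the second network is reduced to a single logistic unit, and the names `second_network_output`, `includes_abnormal_situation`, and the default threshold of 0.5 are hypothetical and do not come from the source.

```python
import math
from typing import Sequence


def second_network_output(average_vector: Sequence[float],
                          weights: Sequence[float],
                          bias: float) -> float:
    """Assumed stand-in for the second network: one logistic output unit."""
    if len(average_vector) != len(weights):
        raise ValueError("average vector and weights must have the same length")
    activation = sum(x * w for x, w in zip(average_vector, weights)) + bias
    return 1.0 / (1.0 + math.exp(-activation))


def includes_abnormal_situation(output_value: float,
                                threshold: float = 0.5) -> bool:
    """Judge the voice data abnormal when the output value is >= threshold."""
    return output_value >= threshold
```

A real second network would have several layers with learned parameters; only the comparison "output value equal to or greater than the threshold" is taken directly from the description.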

As described above, the learning device 100 performs machine learning of the LSTM model 110c and the DNN model 110d using pairs of quantization results extracted from voice data for learning and correct answer information. Machine learning of the LSTM model 110c and the DNN model 110d can therefore be performed without the trial and error, expert knowledge, or know-how otherwise needed to set keywords for detecting a specific conversation situation. In addition, by processing voice data using the learned LSTM model 110c and DNN model 110d, the determination device 200 can appropriately determine whether a specific conversation situation is included in the voice data.

[Description of the generation unit]
FIG. 10 is a diagram for explaining the generation unit of the learning device. As shown in FIG. 10, the generation unit 120 includes an acoustic processing unit 121, a quantization table 122, a vector quantization unit 123, and a vectorization unit 124.

The acoustic processing unit 121 is a processing unit that extracts, from voice data, information used for voice recognition. The information extracted from the voice data is called a feature quantity. The acoustic processing unit 121 sets short sections of about 32 ms, called frames, in the voice data and extracts feature quantities while shifting the frame by about 10 ms at a time. For example, the acoustic processing unit 121 extracts feature quantities from the voice data based on MFCC. The acoustic processing unit 121 outputs the feature quantities to the vector quantization unit 123.
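The framing described above (frames of about 32 ms advanced by about 10 ms) can be sketched as a computation of sample index ranges. The function name `frame_spans` and the 8000 Hz sampling rate are assumptions for illustration; the source does not state a sampling rate, and feature extraction itself (for example MFCC) is outside this sketch.

```python
from typing import List, Tuple


def frame_spans(num_samples: int,
                sample_rate_hz: int = 8000,
                frame_ms: int = 32,
                shift_ms: int = 10) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of each full analysis frame.

    The sampling rate is an assumed value; frame length and shift follow
    the approximate figures given in the description.
    """
    frame_len = sample_rate_hz * frame_ms // 1000
    shift = sample_rate_hz * shift_ms // 1000
    if frame_len <= 0 or shift <= 0:
        raise ValueError("frame length and shift must be positive")
    spans = []
    start = 0
    # Keep only frames that fit entirely inside the signal.
    while start + frame_len <= num_samples:
        spans.append((start, start + frame_len))
        start += shift
    return spans
```

Because the shift is shorter than the frame, consecutive frames overlap, so each feature quantity reflects a window that shares most of its samples with its neighbours.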

The quantization table 122 is a vector table used for quantizing voice data. The quantization table 122 holds a plurality of representative points of the feature quantities of utterance data, and is optimized in advance by the quantization table generation unit 125 using the voice data for adaptive processing of the quantization table 122.

The vector quantization unit 123 is a processing unit that, each time it receives a feature quantity from the acoustic processing unit 121, collates the feature quantity with the quantization table 122 and outputs the quantization result corresponding to the feature quantity, based on the quantization results (for example, quantization numbers) associated with the quantization points. The vector quantization unit 123 outputs information on the quantization sequence, in which the quantization results for the feature quantities are arranged in time series, to the vectorization unit 124.

The vectorization unit 124 is a processing unit that converts each quantization result included in the quantization sequence into a vector. The vectorization unit 124 outputs the vector corresponding to each quantization result of the quantization sequence to the first calculation unit 130. For example, the vectorization unit 124 represents each quantization result as a 40-dimensional One Hot vector. In the One Hot vector of an input quantization result, "1" is set in the dimension corresponding to that quantization result and "0" is set in all other dimensions.
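The combination of vector quantization and One Hot vectorization can be sketched as follows. The function names and the use of squared Euclidean distance for collating a feature quantity with the table are assumptions for illustration; the source does not name a distance measure. With a table of 40 quantization points, the resulting vectors would be 40-dimensional, as in the description.

```python
from typing import List, Sequence


def nearest_index(feature: Sequence[float],
                  table: Sequence[Sequence[float]]) -> int:
    """Return the index (quantization number) of the closest table entry.

    Squared Euclidean distance is an assumed collation criterion.
    """
    best_index, best_dist = 0, float("inf")
    for idx, point in enumerate(table):
        dist = sum((f - p) ** 2 for f, p in zip(feature, point))
        if dist < best_dist:
            best_index, best_dist = idx, dist
    return best_index


def one_hot(index: int, size: int) -> List[int]:
    """Set 1 in the dimension of the quantization result, 0 elsewhere."""
    vec = [0] * size
    vec[index] = 1
    return vec


def quantize_sequence(features: Sequence[Sequence[float]],
                      table: Sequence[Sequence[float]]) -> List[List[int]]:
    """Map a time series of feature quantities to One Hot vectors."""
    return [one_hot(nearest_index(f, table), len(table)) for f in features]
```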

The quantization table generation unit 125 is a processing unit that generates the quantization table 122. Specifically, the quantization table generation unit 125 generates the quantization table 122 using an update conversation data set whose certainty has been calculated as in the first embodiment. For example, the quantization table generation unit 125 updates the quantization table 122 using the update conversation data set with the highest certainty, or the update conversation data sets whose certainty is equal to or higher than a threshold value.
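The two selection rules mentioned above (highest certainty, or certainty at or above a threshold) can be sketched as follows. The function name `select_update_datasets` and the dictionary mapping data set names to certainty values are hypothetical; how the certainty itself is computed is described in the first embodiment and is not reproduced here.

```python
from typing import Dict, List, Optional


def select_update_datasets(certainties: Dict[str, float],
                           threshold: Optional[float] = None) -> List[str]:
    """Choose update conversation data sets by certainty.

    Without a threshold, return only the data set with the highest certainty.
    With a threshold, return every data set whose certainty is >= threshold.
    """
    if not certainties:
        return []
    if threshold is None:
        return [max(certainties, key=certainties.get)]
    return [name for name, value in certainties.items() if value >= threshold]
```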

FIG. 11 is a diagram illustrating the update of the quantization table 122. As shown in FIG. 11, when optimizing the quantization table 122, the quantization table generation unit 125 of the generation unit 120 of the learning device 100 performs acoustic analysis on each piece of voice data for adaptive processing, which is an example of an update conversation data set, and the acoustic processing unit 121 generates the feature quantities. The quantization table generation unit 125 then collates each feature quantity generated from each piece of voice data with the quantization table 122, accumulates the quantization results, and optimizes the quantization table 122 by repeatedly updating it so that the quantization error is minimized.

FIG. 12 is a diagram illustrating the overall flow of adaptive control of the quantization table 122. Two-dimensional feature quantities are assumed here. As shown in FIG. 12, the quantization table generation unit 125 generates initial values of the quantization table 122 from white noise or the like, quantizes the voice data for adaptive processing, and updates each vector of the quantization table 122 using the average of the feature quantities that selected that vector. Repeating this update reduces the quantization error. In other words, the quantization table generation unit 125 repeatedly updates the quantization table 122 so that it follows the distribution of the physical feature quantities of the voice data and the quantization error is minimized.

At this time, the quantization table generation unit 125 updates the quantization table 122 using the conversation data set with the highest certainty, or the conversation data sets whose certainty is equal to or higher than a threshold value. The quantization table generation unit 125 also controls the speed of the update using the coefficient α calculated in the first embodiment.
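One update pass of the kind described above can be sketched as follows: each feature quantity selects its nearest table vector, and each table vector is then moved toward the average of the feature quantities that selected it. How the coefficient α enters the update is not stated in this section, so the linear blend `(1 - alpha) * old + alpha * mean` is purely an assumption, as are the function names and the squared Euclidean distance.

```python
from typing import List, Sequence


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def update_table(table: Sequence[Sequence[float]],
                 features: Sequence[Sequence[float]],
                 alpha: float = 1.0) -> List[List[float]]:
    """Run one update pass; alpha (assumed blend factor) sets the step size."""
    dim = len(table[0])
    sums = [[0.0] * dim for _ in table]
    counts = [0] * len(table)
    for feature in features:
        # Quantize: find the table vector selected by this feature quantity.
        idx = min(range(len(table)), key=lambda k: _sq_dist(feature, table[k]))
        counts[idx] += 1
        for d in range(dim):
            sums[idx][d] += feature[d]
    new_table = []
    for point, total, n in zip(table, sums, counts):
        if n == 0:
            new_table.append(list(point))  # unselected vectors stay unchanged
            continue
        mean = [t / n for t in total]
        new_table.append([(1 - alpha) * p + alpha * m for p, m in zip(point, mean)])
    return new_table


def quantization_error(table: Sequence[Sequence[float]],
                       features: Sequence[Sequence[float]]) -> float:
    """Mean squared distance from each feature quantity to its nearest vector."""
    return sum(min(_sq_dist(f, p) for p in table) for f in features) / len(features)
```

Calling `update_table` repeatedly and stopping once `quantization_error` no longer decreases corresponds to the repeated update described for FIG. 12; with `alpha=1.0` the pass is the same as a k-means centroid step.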

[Effect]
As described above, because the learning device 100 can optimize the quantization table using conversation data sets with high certainty, it can suppress delays in the convergence of the optimization and optimize the quantization table at a suitable convergence speed using appropriate data.

Here, a specific example of the conversation data sets used for optimization (update) is described. FIG. 13 is a diagram illustrating examples of the variance of conversation data sets. FIG. 13 uses two data sets of normal conversations collected at different contact centers. Each normal-conversation data set consists of 30 pieces of conversation data. The average and variance of the feature quantity (MFCC) are calculated for each piece of conversation data, and the variance over the entire data set is then calculated.

FIG. 13(a) shows an example in which the variance in each conversation data set is calculated from the averages of the individual pieces of conversation data, that is, the variance mv(j) of the per-conversation averages in the conversation data set. FIG. 13(b) shows an example in which the variance in each conversation data set is calculated from the variances of the individual pieces of conversation data, that is, the variance vv(j) of the per-conversation variances in the conversation data set. As shown in FIGS. 13(a) and 13(b), the conversation data set of line B has a smaller variance than the conversation data set of line R and is therefore better suited to updating the quantization table.
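The two data set statistics, mv(j) and vv(j), can be sketched as follows. Several points are assumptions made for illustration: that j indexes the feature dimension, that population variance is used rather than sample variance, and that a data set is represented as a list of conversations, each a list of per-frame feature vectors. The function name `dataset_statistics` is hypothetical.

```python
from statistics import fmean, pvariance
from typing import List, Sequence, Tuple


def dataset_statistics(
    dataset: Sequence[Sequence[Sequence[float]]],
) -> Tuple[List[float], List[float]]:
    """Return (mv, vv) for one conversation data set.

    mv[j]: variance, across conversations, of the per-conversation mean
           of feature dimension j.
    vv[j]: variance, across conversations, of the per-conversation variance
           of feature dimension j.
    """
    dim = len(dataset[0][0])
    mv, vv = [], []
    for j in range(dim):
        conv_means = [fmean([frame[j] for frame in conv]) for conv in dataset]
        conv_vars = [pvariance([frame[j] for frame in conv]) for conv in dataset]
        mv.append(pvariance(conv_means))
        vv.append(pvariance(conv_vars))
    return mv, vv
```

Small values of both statistics mean that the conversations in the data set resemble one another, which is the property the description associates with suitability for updating the quantization table.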

The results of updating the quantization table using the conversation data set corresponding to each line in FIG. 13 are described next. FIG. 14 is a diagram illustrating an example in which the quantization table was updated with the conversation data set of line B in FIG. 13, and FIG. 15 is a diagram illustrating an example in which the quantization table was updated with the conversation data set of line R in FIG. 13.

FIG. 14 shows the average and variance of the results of quantizing about 200 conversations (voice sections) each of normal and abnormal conversations for evaluation, using a quantization table updated with the feature quantities of normal conversations from the conversation data set corresponding to line B in FIG. 13. FIG. 15 shows the average and variance of the results of quantizing the same kind of evaluation data, about 200 conversations (voice sections) each of normal and abnormal conversations, using a quantization table updated with the feature quantities of normal conversations from the conversation data set corresponding to line R in FIG. 13.

As shown in FIG. 14(a), with the quantization table optimized using the conversation data set corresponding to line B, the correlation between the average of the quantization results for normal conversations and the average of the quantization results for abnormal conversations was 0.68. As shown in FIG. 14(b), with the same quantization table, the correlation between the variance of the quantization results for normal conversations and the variance of the quantization results for abnormal conversations was 0.15.

Similarly, as shown in FIG. 15(a), with the quantization table optimized using the conversation data set corresponding to line R, the correlation between the average of the quantization results for normal conversations and the average of the quantization results for abnormal conversations was 0.95. As shown in FIG. 15(b), with the same quantization table, the correlation between the variance of the quantization results for normal conversations and the variance of the quantization results for abnormal conversations was 0.79.

Thus, the correlation of the averages is 0.68 for line B and 0.95 for line R, so the quantization table updated using line B produces a larger difference between the distributions of the quantization results for normal and abnormal conversations. Likewise, the correlation of the variances is 0.15 for line B and 0.79 for line R, so the quantization table updated using line B again produces the larger difference. Therefore, performing conversation analysis with a quantization table updated using a conversation data set with high certainty allows normal conversations and abnormal conversations to be classified more accurately.
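The comparison above relies on a correlation value between two series of statistics. A minimal sketch of a Pearson correlation coefficient is given below; that the reported values are Pearson correlations, and that each series holds one statistic per quantization point for normal and for abnormal conversations, are assumptions, since the section does not define the correlation precisely. The function name `pearson` is hypothetical.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (dev_x * dev_y)
```

Under this reading, a value close to 1 means the normal and abnormal series look alike, so a lower value indicates that the quantization table separates the two kinds of conversation better.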

An evaluation of the analysis accuracy for conversation situations confirmed that the accuracy was higher when the conversation data set with the smaller variance (line B) was used to update the quantization table. For example, the analysis accuracy was about 90% when the table was updated with the conversation data set with the smaller variance (line B), but about 86% when it was updated with the conversation data set with the larger variance (line R).

In the second embodiment, an example was described in which a conversation data set with a high "certainty", as described in the first embodiment, is selected for updating the quantization table (for application). However, the present invention is not limited to this, and the order of priority of application can also be determined based on the "certainty".

FIG. 16 is a flowchart showing the flow of processing according to the third embodiment. As shown in FIG. 16, when the start of processing is instructed (S201: Yes), the acoustic analysis unit 61 executes acoustic analysis that calculates the feature quantities of the conversation data for each analysis frame (S202).

Subsequently, the conversation statistics unit 62 calculates the average and variance for each piece of conversation data using the feature quantities of the analysis frames (S203). The overall statistics unit 63 then calculates the average and variance in each update conversation data set using the averages and variances of the individual pieces of conversation data (S204).

After that, the calculation unit 64 calculates the certainty of each conversation data set using the average and variance in each update conversation data set (S205), and determines the order in which the conversation data sets are used for the update based on the certainty of each conversation data set (S206).

FIG. 17 is a diagram illustrating the update process according to the third embodiment. As shown in FIG. 17, suppose that certainties 1 to m have been calculated for update conversation data sets 1 to m, where certainty 1 < ... < certainty m. In this case, the conversation data sets are read in ascending order of certainty, starting from update conversation data set 1, and the quantization table is updated with each in turn. As a result, data with small variance is used later in the update, which can increase the convergence speed.
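The ordering rule of the third embodiment can be sketched in a few lines. The function name `update_order` and the dictionary from data set names to certainty values are hypothetical; the sketch only shows the ascending sort, not the update itself.

```python
from typing import Dict, List


def update_order(certainties: Dict[str, float]) -> List[str]:
    """Return data set names in ascending order of certainty.

    The data set with the lowest certainty is applied first and the one
    with the highest certainty last, as in the description of FIG. 17.
    """
    return sorted(certainties, key=certainties.get)
```

Applying the highest-certainty data set last means the final state of the quantization table is shaped mainly by the most suitable data.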

Although embodiments of the present invention have been described above, the present invention may be implemented in various different forms other than the embodiments described above.

[Application example]
An application example of an analysis system that uses a determination model trained (by machine learning) using the quantization table described in the first to third embodiments is described with reference to FIG. 18. FIG. 18 is a diagram illustrating the application example. As shown in FIG. 18, the analysis system includes a GW (Gateway) 2, a SW (Switch) 3, an IP contact center system 4, an operator terminal 5, and an administrator terminal 6. The operator terminal 5 is equipped with a trouble detection function, and the administrator terminal 6 is equipped with a function for linking with a monitoring application.

The IP contact center system 4 receives a telephone call from a client via the GW 2 and the SW 3 and connects it to the operator terminal 5. The operator terminal 5 analyzes the conversation with the client using the determination model (the LSTM model 110c and the DNN model 110d) learned using the quantization table.

Here, the operator terminal 5 analyzes the conversation online and, when it detects a response trouble (abnormal call), outputs a detection message to the administrator terminal 6. The administrator terminal 6 displays the situation in real time, for example through call monitoring, so that the administrator can check it and immediately support the operator's response.

[Numerical values, etc.]
The numerical examples, threshold values, and the like used in the above embodiments are merely examples and can be changed as desired. It is also possible to use only one of the variance and the average. In the embodiments, the specific conversation situation was described as an "abnormal conversation situation", but the specific conversation situation is not limited to this. For example, the specific conversation situation may be a conversation situation in which a meeting is stagnating, a conversation situation in which trouble is occurring, or a conversation situation that is favorable to the customer. The quantization points are also not limited to two-dimensional vectors, and multidimensional vectors can be used.

[System]
The processing procedures, control procedures, specific names, and information including various data and parameters shown in the above description and drawings can be changed arbitrarily unless otherwise specified. The conversation statistics unit 62 and the overall statistics unit 63 correspond to the calculation unit, and the calculation unit 64 corresponds to the execution unit. The LSTM model 110c and the DNN model 110d are examples of the determination unit. The conversation data is an example of sample data, and the conversation data set is an example of a data set.

Further, each component of each illustrated device is a functional concept and does not necessarily have to be physically configured as illustrated. That is, the specific form of distribution or integration of each device is not limited to the one illustrated. In other words, all or part of each device can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like. For example, the control unit 105 of the learning device 100 may have the same functions as the control unit 205 of the determination device 200, so that it trains the LSTM model 110c and the DNN model 110d and also determines whether or not the voice data includes a specific conversation situation.

Further, all or any part of each processing function performed by each device may be realized by a CPU and a program analyzed and executed by the CPU, or may be realized as hardware by wired logic.

[Hardware]
Next, a hardware configuration example of each device described in the above embodiment will be described. Since the devices have similar hardware configurations, they are described here as an information processing device 500. FIG. 19 is a diagram illustrating a hardware configuration example. As shown in FIG. 19, the information processing device 500 includes a communication device 500a, an HDD (Hard Disk Drive) 500b, a memory 500c, and a processor 500d. The units shown in FIG. 19 are connected to each other by a bus or the like.

The communication device 500a is a network interface card or the like and communicates with other servers. The HDD 500b stores programs and DBs for operating the functions shown in FIG. 2 and the like.

The processor 500d reads a program that executes the same processing as each processing unit shown in FIG. 2 and the like from the HDD 500b or the like and loads it into the memory 500c, thereby running a process that executes each function described with reference to FIG. 2 and the like. Taking the information processing device 50 as an example, this process executes the same functions as the processing units of the information processing device 50. Specifically, the processor 500d reads, from the HDD 500b or the like, a program having the same functions as the acoustic analysis unit 61, the conversation statistics unit 62, the overall statistics unit 63, the calculation unit 64, and the like. The processor 500d then runs a process that executes the same processing as the acoustic analysis unit 61, the conversation statistics unit 62, the overall statistics unit 63, the calculation unit 64, and the like.

In this way, the information processing device 500 operates as an information processing device that executes various processing methods by reading and executing the program. The information processing device 500 can also realize the same functions as those of the above embodiment by reading the program from a recording medium with a medium reading device and executing the read program. The program referred to in this other embodiment is not limited to being executed by the information processing device 500. For example, the present invention can be applied in the same way when another computer or server executes the program, or when they execute the program in cooperation with each other.

This program can be distributed via a network such as the Internet. The program can also be recorded on a computer-readable recording medium such as a hard disk, a flexible disk (FD), a CD-ROM, an MO (Magneto-Optical disk), or a DVD (Digital Versatile Disc), and executed by being read from the recording medium by a computer.

50 Information processing device
51 Communication unit
52 Storage unit
53 Conversation data set for update
54 Accuracy DB
60 Control unit
61 Acoustic analysis unit
62 Conversation statistics unit
63 Overall statistics unit
64 Calculation unit

Claims

A determination program that causes a computer to execute a process comprising:
calculating, for a plurality of data sets each containing a plurality of sample data, a statistic of the feature amounts of the plurality of sample data; and
executing, based on the calculated statistic, selection or priority determination of data sets, among the plurality of data sets, as training data to be used for training a determination model.

The determination program according to claim 1, wherein
the calculating calculates, for each of the plurality of sample data, a variance of the feature amount using the feature amounts for each predetermined unit, and
the executing selects, as the training data, a data set having sample data whose variance, calculated for each of the plurality of sample data, is less than a threshold, or determines the priority in ascending order of the variance.

The determination program according to claim 2, wherein
the calculating calculates a variance of the feature amount for each of the plurality of data sets using the variances calculated for the plurality of sample data contained in each of the plurality of data sets, and
the executing selects, as the training data, a data set whose variance is less than a threshold, or determines the priority in ascending order of the variance.

The determination program according to claim 3, further causing the computer to execute a process of updating a quantization table containing a plurality of quantization points, by calculating a selectivity of each of the plurality of quantization points based on quantized data obtained by quantizing the feature amounts of a plurality of speech data included in a training data set, and updating the plurality of quantization points based on the selectivity, wherein
the updating uses a data set whose variance is less than the threshold as the training data set to calculate the selectivity of each of the plurality of quantization points.

The determination program according to claim 3, further causing the computer to execute a process of updating a quantization table containing a plurality of quantization points, by calculating a selectivity of each of the plurality of quantization points based on quantized data obtained by quantizing the feature amounts of a plurality of speech data included in a training data set, and updating the plurality of quantization points based on the selectivity, wherein
the updating selects the plurality of data sets as the training data set in ascending order of priority and calculates the selectivity of each of the plurality of quantization points.

A determination method in which a computer executes a process comprising:
calculating, for a plurality of data sets each containing a plurality of sample data, a statistic of the feature amounts of the plurality of sample data; and
executing, based on the calculated statistic, selection or priority determination of data sets, among the plurality of data sets, as training data to be used for training a determination model.

An information processing device comprising:
a calculation unit that calculates, for a plurality of data sets each containing a plurality of sample data, a statistic of the feature amounts of the plurality of sample data; and
an execution unit that executes, based on the calculated statistic, selection or priority determination of data sets, among the plurality of data sets, as training data to be used for training a determination model.
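
The variance-based selection and priority determination recited in the claims above can be illustrated with a minimal Python sketch. It is not the implementation of the embodiment. The function names, the toy data, and the threshold value are hypothetical, each feature amount is simplified to a single number per predetermined unit, and the per-data-set variance is assumed to be the mean of the per-sample variances, an aggregation rule the claims do not specify.

```python
from statistics import fmean, pvariance


def sample_variance(features):
    """Variance of the per-unit feature amounts of one sample (e.g. one conversation)."""
    return pvariance(features)


def dataset_variance(dataset):
    """Per-data-set variance.

    Assumption: the per-sample variances are aggregated by their mean;
    the source does not state how the aggregation is done.
    """
    return fmean(sample_variance(sample) for sample in dataset)


def select_datasets(datasets, threshold):
    """Select the data sets whose variance is less than the threshold."""
    return [name for name, ds in datasets.items()
            if dataset_variance(ds) < threshold]


def prioritize_datasets(datasets):
    """Order the data sets by ascending variance (lowest variance first)."""
    return sorted(datasets, key=lambda name: dataset_variance(datasets[name]))


# Hypothetical toy data: each data set is a list of samples,
# and each sample is a list of scalar feature amounts per unit.
datasets = {
    "A": [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]],
    "B": [[0.0, 2.0], [0.0, 4.0]],
    "C": [[1.0, 2.0], [1.0, 2.0]],
}

selected = select_datasets(datasets, threshold=1.0)
priority = prioritize_datasets(datasets)
print(selected)
print(priority)
```

With this toy data, data sets A and C fall below the hypothetical threshold of 1.0, and the priority order places the data set with the smallest variance first. Replacing `pvariance` with `fmean` inside `sample_variance` would give a mean-based variant, in line with the remark in the description that only one of the variance and the mean may be used.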