JP5148532B2

JP5148532B2 - Topic determination device and topic determination method

Info

Publication number: JP5148532B2
Application number: JP2009042344A
Authority: JP
Inventors: 志鵬張; 信彦仲
Original assignee: NTT Docomo Inc
Current assignee: NTT Docomo Inc
Priority date: 2009-02-25
Filing date: 2009-02-25
Publication date: 2013-02-20
Anticipated expiration: 2029-02-25
Also published as: JP2010197706A

Description

The present invention relates to a topic determination device and a topic determination method.

In recent years, fraud committed over the telephone has become a recurring problem. In view of this, Patent Document 1, for example, discloses a technique that uses speech recognition to detect fraud-related keywords in the content of an utterance, thereby preventing scams such as the "ore-ore" ("it's me") fraud.

Patent Document 1: JP 2007-139864 A

Conventional techniques, including Patent Document 1, determine the topic by keyword detection. That is, keywords for identifying a fraud topic (for example, "money", "misappropriated", "audit", and "embezzlement") are set in advance, the number of times those keywords appear in the utterance is counted, and the utterance is judged to be a fraud topic when the count exceeds a fixed number.

Such a conventional method has the problem that when the recognition rate for the keywords falls, the accuracy of topic determination falls with it. For example, assume that the following utterance is made.
Utterance: "I misappropriated (使い込んだ) the company's money. If I don't pay it back, I won't be in time for the audit (監査). At this rate I will be charged with embezzlement (横領罪) and arrested."
Assume that speech recognition of this utterance produces the following misrecognized result.
Speech recognition result: "I made good use of (使いこなした) the company's money. If I don't pay, I won't be in time for the inspection (検査). At this rate I will go into service (奉公) and be arrested."
Misrecognition of this kind is not unusual and can readily occur depending on the accuracy of the speech recognition.

With misrecognition of this kind, preset keywords such as "misappropriated", "audit", and "embezzlement" cannot be detected, and determining the fraud topic becomes difficult. Furthermore, telephone speech has a narrow bandwidth and is strongly affected by noise, so its recognition rate is low. Under such conditions it is very difficult to determine a fraud topic appropriately from keywords alone. Choosing the keywords is itself also difficult. For example, if "money" and "audit" are set as keywords and the mere detection of those keywords in an utterance is used as the criterion for a fraud topic, the accuracy of topic determination becomes low.
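The failure mode described above can be reproduced with a small sketch of keyword-count topic determination. This is only an illustration, not the implementation of Patent Document 1: the function names and the limit of two hits are assumptions, and keywords are matched as plain substrings of the recognized text. The two sample strings are the utterance and its misrecognized result given in this description.

```python
# Hypothetical sketch of conventional keyword-count topic determination.
# Keywords: money, misappropriated, audit, embezzlement.
KEYWORDS = ["お金", "使い込んだ", "監査", "横領"]
HIT_LIMIT = 2  # assumed value; the description only says "a fixed number"


def count_keyword_hits(text, keywords=KEYWORDS):
    """Count how many times the keywords occur in the recognized text."""
    return sum(text.count(keyword) for keyword in keywords)


def is_fraud_by_keywords(text, hit_limit=HIT_LIMIT):
    """Judge the text to be a fraud topic when the hit count exceeds the limit."""
    return count_keyword_hits(text) > hit_limit


spoken = "会社のお金を使い込んだ。払わないと監査に間に合わない。このままでは横領罪になって捕まってしまう。"
recognized = "会社のお金を使いこなした。払わないと検査に間に合わない。このままでは奉公になって捕まってしまう。"

print(count_keyword_hits(spoken), is_fraud_by_keywords(spoken))
print(count_keyword_hits(recognized), is_fraud_by_keywords(recognized))
```

In the misrecognized text only "お金" survives, so the keyword count drops below the limit even though the speaker's intent is unchanged.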

The present invention has been made in view of the above, and its object is to provide a topic determination device and a topic determination method capable of accurately determining a topic such as fraud.

To solve the above problem, the topic determination device of the present invention comprises: data collection means for collecting data specialized for a specific topic; topic-specific language model creation means for creating, using the data collected by the data collection means, a topic-specific language model specialized for the specific topic; language model holding means for holding the topic-specific language model created by the topic-specific language model creation means; acoustic model holding means for holding an acoustic model for performing speech recognition; speech recognition means for performing speech recognition on input speech using the topic-specific language model held in the language model holding means and the acoustic model held in the acoustic model holding means, and for calculating a score for the result of the speech recognition; threshold holding means for holding a threshold serving as a criterion for determining whether the content of the input speech corresponds to the specific topic; and determination means for receiving the score calculated by the speech recognition means and determining that the content of the input speech corresponds to the specific topic when the score is equal to or greater than the threshold held in the threshold holding means.

The topic determination method of the present invention comprises: a data collection step in which data collection means collects data specialized for a specific topic; a topic-specific language model creation step in which topic-specific language model creation means creates, using the data collected by the data collection means, a topic-specific language model specialized for the specific topic; a language model holding step in which language model holding means holds the topic-specific language model created by the topic-specific language model creation means; an acoustic model holding step in which acoustic model holding means holds an acoustic model for performing speech recognition; a speech recognition step in which speech recognition means performs speech recognition on input speech using the topic-specific language model held in the language model holding means and the acoustic model held in the acoustic model holding means, and calculates a score for the result of the speech recognition; a threshold holding step in which threshold holding means holds a threshold serving as a criterion for determining whether the content of the input speech corresponds to the specific topic; and a determination step in which determination means receives the score calculated by the speech recognition means and determines that the content of the input speech corresponds to the specific topic when the score is equal to or greater than the threshold held in the threshold holding means.

According to this topic determination device and topic determination method, speech recognition is performed on the input speech using a topic-specific language model created from data specialized for a specific topic, and the topic is determined by comparing the score for the recognition result with a threshold. This allows more accurate topic determination than a method based on the number of keyword occurrences. For example, an appropriate topic determination is made even when a particular keyword is not recognized.

In the topic determination device of the present invention, the topic-specific language model may be a statistical language model that models the occurrence probability of words.

Using word occurrence probabilities for topic determination allows more accurate determination than the conventional method based on the mere number of word occurrences. An example of a statistical language model that models word occurrence probability is the uni-gram.
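As a rough illustration of scoring with word occurrence probabilities, the sketch below trains a uni-gram model on a toy corpus and scores a word sequence by its average log-probability. The add-one smoothing, the "<unk>" fallback, the English toy data, and all function names are assumptions made for the example; the description does not specify how the model is estimated.

```python
import math
from collections import Counter

UNK = "<unk>"  # placeholder for words not seen in the training data


def train_unigram(sentences):
    """Estimate add-one-smoothed word probabilities from tokenized sentences."""
    counts = Counter(word for sentence in sentences for word in sentence)
    vocab = set(counts) | {UNK}
    total = sum(counts.values())
    return {w: (counts[w] + 1) / (total + len(vocab)) for w in vocab}


def unigram_score(model, words):
    """Average log-probability per word; unseen words fall back to UNK."""
    return sum(math.log(model.get(w, model[UNK])) for w in words) / len(words)


# Toy stand-in for data specialized for the fraud topic.
fraud_corpus = [
    ["guarantor", "debt", "repay"],
    ["debt", "loan", "repay", "money"],
    ["arrested", "settlement", "money"],
]
fraud_lm = train_unigram(fraud_corpus)

print(unigram_score(fraud_lm, ["repay", "debt", "money"]))
print(unigram_score(fraud_lm, ["weather", "lunch", "money"]))
```

The first sequence receives the higher score because all of its words are frequent in the topic-specific data, whereas no single word has to match a preset keyword.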

In the topic determination device of the present invention, the topic-specific language model may be a statistical language model that models the connection probability between words.

Using the connection probability between words for topic determination allows more accurate determination than the conventional method based on the mere number of word occurrences. An example of a statistical language model that models the connection probability between words is the n-gram.
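To illustrate what modeling connection probabilities adds, the sketch below uses a bigram model (the n = 2 case of an n-gram) with add-one smoothing. The smoothing scheme, sentence boundary markers, toy corpus, and function names are assumptions for the example only.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # sentence boundary markers


def train_bigram(sentences):
    """Collect bigram counts, context counts, and the vocabulary size."""
    bigrams, contexts, vocab = Counter(), Counter(), {BOS, EOS}
    for sentence in sentences:
        padded = [BOS, *sentence, EOS]
        vocab.update(sentence)
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts, len(vocab)


def bigram_score(model, words):
    """Average add-one-smoothed log P(cur | prev) over all word transitions."""
    bigrams, contexts, vocab_size = model
    padded = [BOS, *words, EOS]
    pairs = list(zip(padded, padded[1:]))
    log_prob = sum(
        math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
        for prev, cur in pairs
    )
    return log_prob / len(pairs)


model = train_bigram([["repay", "the", "debt"], ["the", "debt", "is", "due"]])
print(bigram_score(model, ["repay", "the", "debt"]))
print(bigram_score(model, ["debt", "the", "repay"]))
```

Both test sequences contain the same words, so a uni-gram model would score them identically; the bigram model prefers the ordering whose word-to-word connections were seen in the training data.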

In the topic determination device of the present invention, the specific topic may be a fraud topic.

The present invention is particularly useful for determining a fraud topic.

In the topic determination device of the present invention, the input speech may be speech uttered using a telephone.

According to the present invention, accuracy is higher than with the conventional topic determination method based on the mere number of word occurrences. The present invention is therefore particularly useful for telephone speech, whose recognition rate is low because the bandwidth is narrow and the influence of noise is large.

Another topic determination device of the present invention comprises: data collection means for collecting data specialized for a specific topic; topic-specific language model creation means for creating, using the data collected by the data collection means, a topic-specific language model specialized for the specific topic; language model holding means for holding the topic-specific language model created by the topic-specific language model creation means and a general-topic language model that is not specialized for the specific topic; acoustic model holding means for holding an acoustic model for performing speech recognition; speech recognition means for performing first speech recognition on input speech using the topic-specific language model held in the language model holding means and the acoustic model held in the acoustic model holding means, performing second speech recognition on the input speech using the general-topic language model held in the language model holding means and the acoustic model held in the acoustic model holding means, and calculating a first score for the result of the first speech recognition and a second score for the result of the second speech recognition; and determination means for receiving the first score and the second score calculated by the speech recognition means and determining that the content of the input speech corresponds to the specific topic when the first score is equal to or greater than the second score.

Another topic determination method of the present invention comprises: a data collection step in which data collection means collects data specialized for a specific topic; a topic-specific language model creation step in which topic-specific language model creation means creates, using the data collected by the data collection means, a topic-specific language model specialized for the specific topic; a language model holding step in which language model holding means holds the topic-specific language model created by the topic-specific language model creation means and a general-topic language model that is not specialized for the specific topic; an acoustic model holding step in which acoustic model holding means holds an acoustic model for performing speech recognition; a speech recognition step in which speech recognition means performs first speech recognition on input speech using the topic-specific language model held in the language model holding means and the acoustic model held in the acoustic model holding means, performs second speech recognition on the input speech using the general-topic language model held in the language model holding means and the acoustic model held in the acoustic model holding means, and calculates a first score for the result of the first speech recognition and a second score for the result of the second speech recognition; and a determination step in which determination means receives the first score and the second score calculated by the speech recognition means and determines that the content of the input speech corresponds to the specific topic when the first score is equal to or greater than the second score.

According to this topic determination device and topic determination method, first speech recognition is performed on the input speech using a topic-specific language model created from data specialized for a specific topic, second speech recognition is performed on the same input speech using a general-topic language model that is not specialized for the specific topic, and the topic is determined by comparing the scores for the two results. This allows more accurate topic determination than a method based on the number of keyword occurrences. For example, an appropriate topic determination is made even when a particular keyword is not recognized. A further advantage is that no threshold needs to be set separately.
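The score-comparison rule can be sketched as follows. The two recognizers are passed in as callables because the description treats speech recognition and score calculation as known techniques; the Recognition type, the function names, and the stub recognizers with fixed scores are hypothetical and stand in for real recognition passes.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Recognition:
    text: str     # recognized word string
    score: float  # score the recognizer assigns to that string


Recognizer = Callable[[bytes], Recognition]


def is_specific_topic(audio: bytes,
                      recognize_topic: Recognizer,
                      recognize_general: Recognizer) -> bool:
    """Return True when the first score is equal to or greater than the second."""
    first = recognize_topic(audio)     # first recognition: topic-specific LM
    second = recognize_general(audio)  # second recognition: general-topic LM
    return first.score >= second.score


# Stub recognizers with made-up scores, for illustration only.
def topic_stub(audio: bytes) -> Recognition:
    return Recognition("stub text", -1.2)


def general_stub(audio: bytes) -> Recognition:
    return Recognition("stub text", -1.9)


print(is_specific_topic(b"", topic_stub, general_stub))
```

Because the decision depends only on which model explains the same audio better, no absolute threshold has to be tuned.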

The topic determination device of the present invention may further comprise keyword holding means for holding keywords specialized for the specific topic, and the determination means may determine that the content of the input speech corresponds to the specific topic when, in addition to the first score being equal to or greater than the second score, the keywords are detected a fixed number of times or more in the result of the first speech recognition or the result of the second speech recognition.

Once topic determination has progressed to some extent through the score comparison, a further topic determination by keyword is performed at that point. That is, a first topic determination based on the language models and a second topic determination based on keywords are performed in combination. Because the keyword-based second determination is performed after the first determination has finished, setting the keywords for the topic becomes comparatively easy. In addition, performing topic determination more than once further increases its accuracy.
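One way to read the two-stage determination is sketched below: the score comparison acts as the first check, and the keyword count in either recognition result acts as the second. The function names, the substring matching, and the sample values are assumptions for illustration.

```python
def count_hits(text, keywords):
    """Count keyword occurrences in a recognized text by substring match."""
    return sum(text.count(keyword) for keyword in keywords)


def judge_with_keywords(first_score, second_score,
                        first_text, second_text,
                        keywords, min_hits):
    """Two-stage judgment: score comparison first, then keyword detection."""
    # First topic determination: language-model score comparison.
    if first_score < second_score:
        return False
    # Second topic determination: keywords detected min_hits times or more
    # in the first result or in the second result.
    hits = max(count_hits(first_text, keywords),
               count_hits(second_text, keywords))
    return hits >= min_hits


print(judge_with_keywords(-1.2, -1.9,
                          "pay money for the settlement", "pay monkey",
                          keywords=["money", "settlement"], min_hits=2))
```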

According to the present invention, it is possible to provide a topic determination device and a topic determination method capable of accurately determining a topic such as fraud.

A schematic configuration diagram of the fraud determination device 100 according to the first embodiment.
A hardware configuration diagram of the fraud determination device 100.
A flowchart showing the operation performed by the fraud determination device 100.
A schematic configuration diagram of the fraud determination device 200 according to the second embodiment.
A flowchart showing the operation performed by the fraud determination device 200.
A schematic configuration diagram of the fraud determination device 300 according to the third embodiment.
A flowchart showing the operation performed by the fraud determination device 300.

Preferred embodiments of the topic determination device and the topic determination method according to the present invention are described in detail below with reference to the accompanying drawings. In the description of the drawings, identical elements are given identical reference numerals, and redundant description is omitted.

<First Embodiment>
(Configuration of fraud determination device 100)
First, the configuration of the fraud determination device 100 according to the first embodiment of the present invention is described with reference to FIG. 1 and FIG. 2. FIG. 1 is a schematic configuration diagram of the fraud determination device 100, and FIG. 2 is its hardware configuration diagram. The fraud determination device 100 performs speech recognition on input speech, for example speech uttered using a telephone (not shown), and determines whether its content relates to a specific topic. The "specific topic" in this embodiment is a topic of telephone fraud such as the "ore-ore" ("it's me") fraud.

As shown in FIG. 2, the fraud determination device 100 is physically configured as an ordinary computer system that includes a CPU 11; main storage devices such as a ROM 12 and a RAM 13; an input device 14 such as a keyboard and mouse; an output device 15 such as a display; a communication module 16 that exchanges data with other devices (not shown) such as a telephone; and an auxiliary storage device 17 such as a hard disk. Each function of the fraud determination device 100 described below is realized by loading predetermined computer software onto hardware such as the CPU 11, the ROM 12, and the RAM 13, thereby operating the input device 14, the output device 15, and the communication module 16 under the control of the CPU 11, and by reading and writing data in the main storage devices 12 and 13 and the auxiliary storage device 17.

As shown in FIG. 1, the fraud determination device 100 functionally comprises a speech input unit 110, a fraud data collection unit 120 (corresponding to the "data collection means" in the claims), a fraud language model creation unit 130 (corresponding to the "topic-specific language model creation means" in the claims), a fraud language model holding unit 140 (corresponding to the "language model holding means" in the claims), an acoustic model holding unit 150 (corresponding to the "acoustic model holding means" in the claims), a speech recognition unit 160 (corresponding to the "speech recognition means" in the claims), a threshold holding unit 170 (corresponding to the "threshold holding means" in the claims), and a determination unit 180 (corresponding to the "determination means" in the claims).

The speech input unit 110 receives the input speech and can be physically implemented as the communication module 16 shown in FIG. 2. When the fraud determination device 100 is configured separately from the telephone, the speech input unit 110 receives the input speech data from the telephone over a wired or wireless network (not shown). When the fraud determination device 100 is built into the telephone as a module, the speech input unit 110 receives the input speech data over a predetermined communication path (not shown) inside the telephone. The speech input unit 110 outputs the received speech data to the speech recognition unit 160.

The fraud data collection unit 120 collects language data specialized for the specific topic, that is, the fraud topic. Examples of the language data collected by the fraud data collection unit 120 are shown below. The fraud data collection unit 120 may update the fraud data as needed to follow changes in fraud techniques.
Examples of language data:
"A friend from my student days asked me, and I ended up becoming the guarantor for his debt."
"My friend could not repay, so I, as the guarantor, have to borrow from a consumer loan company to pay it back."
"Your husband has been caught for groping and indecent behavior on a train. Unless a settlement is reached, the case will go to court."

The fraud language model creation unit 130 uses the data collected by the fraud data collection unit 120 to create a fraud topic-specific language model, a language model specialized for the fraud topic (corresponding to the "topic-specific language model" in the claims). Hereinafter, "fraud topic-specific language model" is abbreviated to "fraud LM" and "language model" to "LM". One example of the fraud LM created by the fraud LM creation unit 130 is a statistical language model that models the occurrence probability of words, such as a uni-gram. Another example is a statistical language model that models the connection probability between words, such as an n-gram. The technique for creating a statistical language model is itself known, as disclosed for example in Reference 1 below, so its description is omitted here.
Reference 1: Kiyohiro Shikano et al., "Speech Recognition System" (音声認識システム), Ohmsha, ISBN/ASIN: 4-274-13228-5

Because the fraud LM created in this way is specialized for the fraud topic, the occurrence probabilities of fraud-related words and the connection probabilities between them are set high. Consequently, if the input speech has fraud content, the speech recognition result for that input speech receives a high score under the fraud LM. Conversely, the occurrence probabilities of words unrelated to fraud and the connection probabilities between them are set low in the fraud LM. Consequently, if the input speech does not have fraud content, the speech recognition result for that input speech receives a low score under the fraud LM.

The fraud LM holding unit 140 holds the fraud LM created by the fraud LM creation unit 130.

The acoustic model holding unit 150 holds an acoustic model for performing speech recognition.

The speech recognition unit 160 performs speech recognition on the input speech received from the speech input unit 110, using the fraud LM held in the fraud LM holding unit 140 and the acoustic model held in the acoustic model holding unit 150. After performing speech recognition, the speech recognition unit 160 calculates a score S for the result and outputs the calculated score S to the determination unit 180. The techniques for performing speech recognition and for calculating the score are themselves known, as disclosed for example in Reference 1 above, so their description is omitted here. The speech recognition unit 160 may discard the recognition result itself and output only the score S derived from the recognized word string to the determination unit 180 as the parameter for the topic determination process. This has the advantage that the recognition result need not be retained. The speech recognition unit 160 may also normalize the score S based on the length of the utterance.
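The description does not state how the score S is normalized by utterance length. One common choice, shown here purely as an assumption, is to divide an accumulated log-score by the number of recognized words so that long and short utterances can be compared against the same threshold. The function name and the sample values are hypothetical.

```python
import math


def normalize_score(total_log_score: float, num_words: int) -> float:
    """Divide an accumulated log-score by the utterance length in words."""
    if num_words <= 0:
        raise ValueError("num_words must be positive")
    return total_log_score / num_words


# Two utterances whose words are equally likely (p = 0.1 each) but which
# differ in length: the raw log-scores differ, the normalized ones do not.
short_raw = 3 * math.log(0.1)
long_raw = 30 * math.log(0.1)

print(short_raw, long_raw)
print(normalize_score(short_raw, 3), normalize_score(long_raw, 30))
```

Without normalization, a long utterance would accumulate a lower log-score than a short one merely because it contains more words.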

The threshold holding unit 170 holds the threshold X that the determination unit 180 needs when performing the topic determination process. The threshold X is the criterion for determining whether the content of the input speech corresponds to the fraud topic.

The determination unit 180 receives the score S calculated by the speech recognition unit 160 and determines that the content of the input speech corresponds to the fraud topic when the score S is equal to or greater than the threshold X held in the threshold holding unit 170.

(Operation of fraud determination device 100)
Next, the operation performed by the fraud determination device 100 (the "topic determination method" in the claims) is described with reference to FIG. 3. FIG. 3 is a flowchart showing the operation performed by the fraud determination device 100. In the following description, it is assumed that the fraud data collection process and the fraud LM creation process by the fraud data collection unit 120 and the fraud LM creation unit 130 (corresponding to the "data collection step" and the "topic-specific language model creation step" in the claims) have already been performed, and that the created fraud LM is already held in the fraud LM holding unit 140 (corresponding to the "language model holding step" in the claims). It is likewise assumed that the acoustic model is already held in the acoustic model holding unit 150 (corresponding to the "acoustic model holding step" in the claims), and that the threshold X serving as the criterion for determining whether the content of the input speech corresponds to the fraud topic is already held in the threshold holding unit 170 (corresponding to the "threshold holding step" in the claims).

First, the speech input unit 110 receives the input speech and outputs it to the speech recognition unit 160 (step S11).

Next, the speech recognition unit 160 performs speech recognition on the input speech received from the speech input unit 110, using the fraud LM held in the fraud LM holding unit 140 and the acoustic model held in the acoustic model holding unit 150 (step S12, corresponding to the "speech recognition step" in the claims).

Next, the speech recognition unit 160 calculates the score S for the speech recognition result of step S12 and outputs the calculated score S to the determination unit 180 (step S13, corresponding to the "speech recognition step" in the claims).

Next, the determination unit 180 receives the score S calculated by the speech recognition unit 160, and when the score S is equal to or greater than the threshold X held in the threshold holding unit 170 (step S14: YES), determines that the content of the input speech corresponds to the fraud topic (step S15, corresponding to the "determination step" in the claims). When the score S is less than the threshold X held in the threshold holding unit 170 (step S14: NO), the process simply ends; that is, the input speech is not determined to be a fraud topic.

In the above flow, suppose that, for example, the following input speech is received in step S11.
Example of input speech: “会社のお金を使い込んだ。払わないと監査に間に合わない。このままでは横領罪になって捕まってしまう。” (“I embezzled the company's money. If I don't pay it back, I won't make it in time for the audit. At this rate I will be charged with embezzlement and arrested.”)
Suppose that the speech recognition processing of step S12 is performed on this input speech and the following result is obtained.
Example of speech recognition result: “会社のお金を使いこなした。払わないと検査に間に合わない。このままでは奉公になって捕まってしまう。” (“I made full use of the company's money. If I don't pay, I won't make it in time for the inspection. At this rate I will go into service and be arrested.”)
The score calculation processing of step S13 is performed on this speech recognition result, and when the score S is, for example, 0.8 and the threshold value X is, for example, 0.7, the input speech is determined in step S15 to be a topic of fraud. In other words, according to the present embodiment, even for telephone speech, whose recognition rate is particularly low because the band is narrow and the influence of noise is large, whether or not the input speech is a topic of fraud is appropriately determined even though keywords such as “使い込んだ” (embezzled), “監査” (audit), and “横領” (embezzlement) were not correctly recognized.
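As an illustration only, the decision of steps S13 to S15 can be sketched in Python as follows. The names `RecognitionResult` and `is_fraud_topic` are hypothetical and do not appear in the specification, and the score is simply assumed to be a number already produced by the recognizer; how the score S is computed is outside this sketch.

```python
from dataclasses import dataclass


@dataclass
class RecognitionResult:
    """Hypothetical container for the output of steps S12 and S13."""
    text: str     # recognized word sequence
    score: float  # score S calculated for the recognition result


def is_fraud_topic(result: RecognitionResult, threshold_x: float) -> bool:
    """Steps S14/S15: fraud topic if and only if S >= X."""
    return result.score >= threshold_x


# Values from the example in the text: S = 0.8, X = 0.7.
example = RecognitionResult(text="(misrecognized text)", score=0.8)
decision = is_fraud_topic(example, threshold_x=0.7)
```

Note that the recognized text plays no part in the decision here, which is why a misrecognized keyword does not by itself change the outcome.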

According to the first embodiment of the present invention described above, speech recognition is performed on the input speech using the fraud LM created from data specific to the particular topic of fraud, and topic determination is performed by comparing the score S for the recognition result with the threshold value X. This enables more accurate topic determination than a method based on the number of keyword occurrences. For example, appropriate topic determination is performed even when a specific keyword is not recognized.

In addition, using word occurrence probabilities for topic determination enables more accurate topic determination than the conventional method based on the mere number of word occurrences. An example of a statistical language model that models word occurrence probabilities is the uni-gram.

Likewise, using connection probabilities between words for topic determination enables more accurate topic determination than the conventional method based on the mere number of word occurrences. An example of a statistical language model that models connection probabilities between words is the n-gram.
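To make the idea of connection probabilities concrete, the following Python sketch trains a tiny bigram model (an n-gram with n = 2) on a topic-specific corpus and on a general corpus and compares how well each explains the same word sequence. Everything here is an assumption made for illustration: the toy corpora, the function names `train_bigram` and `avg_log_prob`, the add-one smoothing, and the use of an average log-probability as the score are not taken from the specification, and the acoustic model is ignored entirely.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count bigrams and their histories from tokenized sentences."""
    history_counts = Counter()
    bigram_counts = Counter()
    for words in sentences:
        padded = ["<s>"] + words  # "<s>" marks the sentence start
        history_counts.update(padded[:-1])
        bigram_counts.update(zip(padded, padded[1:]))
    return history_counts, bigram_counts


def avg_log_prob(words, model, vocab_size):
    """Average add-one-smoothed bigram log-probability per word.

    Assumes `words` is non-empty.
    """
    history_counts, bigram_counts = model
    padded = ["<s>"] + words
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        p = (bigram_counts[(prev, cur)] + 1) / (history_counts[prev] + vocab_size)
        total += math.log(p)
    return total / len(words)


# Toy corpora (hypothetical data, for illustration only).
fraud_corpus = [["transfer", "money", "to", "account"],
                ["pay", "money", "before", "audit"]]
general_corpus = [["nice", "weather", "today"],
                  ["see", "you", "at", "lunch"]]
vocab_size = len({w for s in fraud_corpus + general_corpus for w in s})

fraud_lm = train_bigram(fraud_corpus)
general_lm = train_bigram(general_corpus)

# A sequence that contains one word the fraud corpus never saw.
sentence = ["transfer", "money", "today"]
fraud_score = avg_log_prob(sentence, fraud_lm, vocab_size)
general_score = avg_log_prob(sentence, general_lm, vocab_size)
```

In this toy setting the topic-specific model assigns the higher score because the transitions into "transfer" and "money" were seen in its training data, even though "today" was not. A uni-gram version would drop the history and use only the per-word counts.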

As described above, the present invention achieves higher accuracy than the conventional topic determination method based on the mere number of word occurrences. The present invention is therefore particularly useful for telephone speech, whose recognition rate is low because the band is narrow and the influence of noise is large.

<Second Embodiment>
Next, a second embodiment of the present invention will be described. Description of the parts that overlap with the first embodiment described above is omitted, and the description focuses on the differences from the first embodiment.

FIG. 4 is a schematic configuration diagram of a fraud determination apparatus 200 according to the second embodiment. Compared with the fraud determination apparatus 100 of the first embodiment, the fraud determination apparatus 200 further includes a general language model holding unit 210 (corresponding to the “language model holding means” in the claims; hereinafter referred to as the “general LM holding unit 210”), includes a first speech recognition unit 220 and a second speech recognition unit 230 (each corresponding to the “speech recognition means” in the claims) in place of the speech recognition unit 160, and does not include the threshold holding unit 170.

The general LM holding unit 210 holds a general LM (corresponding to the “general topic language model” in the claims), which is a language model not specialized for the topic of fraud. The general LM may be created within the fraud determination apparatus 200, or one created externally may be input. The technique for creating a language model is itself publicly known, as disclosed for example in Reference Document 1 cited above, so its description is omitted here. Because the general LM is not specialized for the topic of fraud, the occurrence probabilities of fraud-related words and the connection probabilities between them are not especially higher than those of words unrelated to fraud.

The first speech recognition unit 220 performs speech recognition (first speech recognition) on the input speech received from the voice input unit 110, using the fraud LM held in the fraud LM holding unit 140 and the acoustic model held in the acoustic model holding unit 150. After performing the first speech recognition, the first speech recognition unit 220 further calculates a score for the result (first score S1) and outputs the calculated first score S1 to the determination unit 180.

The second speech recognition unit 230 performs speech recognition (second speech recognition) on the input speech received from the voice input unit 110, using the general LM held in the general LM holding unit 210 and the acoustic model held in the acoustic model holding unit 150. After performing the second speech recognition, the second speech recognition unit 230 further calculates a score for the result (second score S2) and outputs the calculated second score S2 to the determination unit 180.

The determination unit 180 receives the first score S1 and the second score S2 calculated by the first speech recognition unit 220 and the second speech recognition unit 230, and determines that the content of the input speech corresponds to the topic of fraud when the first score S1 is equal to or greater than the second score S2.

(Operation of the fraud determination apparatus 200)
Next, the operation performed by the fraud determination apparatus 200 (the “topic determination method” in the claims) will be described with reference to FIG. 5. FIG. 5 is a flowchart showing the operation performed by the fraud determination apparatus 200. In the following description, it is assumed that the fraud data collection processing and the fraud LM creation processing by the fraud data collection unit 120 and the fraud LM creation unit 130 (corresponding to the “data collection step” and the “topic-specific language model creation step” in the claims) have already been performed, that the created fraud LM is already held in the fraud LM holding unit 140, and that a general LM not specialized for the topic of fraud is already held in the general LM holding unit 210 (corresponding to the “language model holding step” in the claims). It is likewise assumed that the acoustic model is already held in the acoustic model holding unit 150 (corresponding to the “acoustic model holding step” in the claims).

First, the voice input unit 110 receives the input speech and outputs it to the first speech recognition unit 220 and the second speech recognition unit 230 (step S21).

Next, the first speech recognition unit 220 performs the first speech recognition on the input speech received from the voice input unit 110, using the fraud LM held in the fraud LM holding unit 140 and the acoustic model held in the acoustic model holding unit 150 (step S22, corresponding to the “speech recognition step” in the claims).

Next, the first speech recognition unit 220 calculates the first score S1 for the result of the speech recognition in step S22 and outputs the calculated first score S1 to the determination unit 180 (step S23, corresponding to the “speech recognition step” in the claims).

Next, the second speech recognition unit 230 performs the second speech recognition on the input speech received from the voice input unit 110, using the general LM held in the general LM holding unit 210 and the acoustic model held in the acoustic model holding unit 150 (step S24, corresponding to the “speech recognition step” in the claims).

Next, the second speech recognition unit 230 calculates the second score S2 for the result of the speech recognition in step S24 and outputs the calculated second score S2 to the determination unit 180 (step S25, corresponding to the “speech recognition step” in the claims).

Next, the determination unit 180 receives the first score S1 and the second score S2 calculated by the first speech recognition unit 220 and the second speech recognition unit 230 and, when the first score S1 is equal to or greater than the second score S2 (step S26: YES), determines that the content of the input speech corresponds to the topic of fraud (step S27, corresponding to the “determination step” in the claims). When the first score S1 is less than the second score S2 (step S26: NO), the processing ends without further action; that is, the input speech is not determined to be a topic of fraud.

In the above flow, suppose that, for example, the following input speech is received in step S21.
Example of input speech: “会社のお金を使い込んだ。払わないと監査に間に合わない。このままでは横領罪になって捕まってしまう。” (“I embezzled the company's money. If I don't pay it back, I won't make it in time for the audit. At this rate I will be charged with embezzlement and arrested.”)
Suppose that the speech recognition processing of step S22 and the speech recognition processing of step S24 are performed on this input speech and the following results are obtained.
Example of S22 result: “カードのお金を使いこなした。払わないと検査に間に合わない。警察局では奉公になって捕まってしまう。” (“I made full use of the card's money. If I don't pay, I won't make it in time for the inspection. At the police bureau I will go into service and be arrested.”)
Example of S24 result: “会社のお金を使いこなした。払わないと検査に間に合わない。このままでは奉公になって捕まってしまう。” (“I made full use of the company's money. If I don't pay, I won't make it in time for the inspection. At this rate I will go into service and be arrested.”)
The score calculation processing of steps S23 and S25 is performed on these speech recognition results, and when the first score S1 is, for example, 0.8 and the second score S2 is, for example, 0.4, the input speech is determined in step S27 to be a topic of fraud. In other words, according to the present embodiment, even for telephone speech, whose recognition rate is particularly low because the band is narrow and the influence of noise is large, whether or not the input speech is a topic of fraud is appropriately determined even though keywords such as “使い込んだ” (embezzled), “監査” (audit), and “横領” (embezzlement) were not correctly recognized.
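The comparison of steps S21 to S27 can be sketched as follows, purely as an illustration. The recognizers are modeled as plain callables that return a recognized text and a score; the type alias `Recognizer`, the function `determine_topic`, and the two stub recognizers are hypothetical stand-ins and not part of the specification.

```python
from typing import Callable, Tuple

# Hypothetical interface: audio in, (recognized text, score) out.
Recognizer = Callable[[bytes], Tuple[str, float]]


def determine_topic(audio: bytes,
                    recognize_with_fraud_lm: Recognizer,
                    recognize_with_general_lm: Recognizer) -> bool:
    """Return True when the input is judged to be a topic of fraud."""
    _, s1 = recognize_with_fraud_lm(audio)    # steps S22 and S23
    _, s2 = recognize_with_general_lm(audio)  # steps S24 and S25
    return s1 >= s2                           # steps S26 and S27


# Stub recognizers returning the scores used in the example in the text.
def fraud_stub(audio: bytes) -> Tuple[str, float]:
    return "(S22 result)", 0.8


def general_stub(audio: bytes) -> Tuple[str, float]:
    return "(S24 result)", 0.4


decision = determine_topic(b"", fraud_stub, general_stub)
```

Because the two scores are compared with each other, no separately tuned threshold appears anywhere in the sketch.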

According to the second embodiment of the present invention described above, the first speech recognition is performed on the input speech using the fraud LM created from data specific to the particular topic of fraud, the second speech recognition is performed on the input speech using a general LM not specialized for the topic of fraud, and topic determination is performed by comparing the scores for the two results. This enables more accurate topic determination than a method based on the number of keyword occurrences. For example, appropriate topic determination is performed even when a specific keyword is not recognized. A further advantage is that no separate threshold value needs to be set.

<Third Embodiment>
Next, a third embodiment of the present invention will be described. Description of the parts that overlap with the first and second embodiments described above is omitted, and the description focuses on the differences from the first and second embodiments.

FIG. 6 is a schematic configuration diagram of a fraud determination apparatus 300 according to the third embodiment. Compared with the fraud determination apparatus 200 of the second embodiment, the fraud determination apparatus 300 further includes a keyword holding unit 310 (corresponding to the “keyword holding means” in the claims). The keyword holding unit 310 holds keywords specialized for a specific topic, for example fraud.

The determination unit 180 of the fraud determination apparatus 300 according to the third embodiment receives the first score S1 and the second score S2 calculated by the first speech recognition unit 220 and the second speech recognition unit 230, and determines that the content of the input speech corresponds to the topic of fraud when, in addition to the first score S1 being equal to or greater than the second score S2, the keyword is detected a certain number of times or more in the result of the first speech recognition or the result of the second speech recognition.

(Operation of the fraud determination apparatus 300)
Next, the operation performed by the fraud determination apparatus 300 (the “topic determination method” in the claims) will be described with reference to FIG. 7. FIG. 7 is a flowchart showing the operation performed by the fraud determination apparatus 300. In the following description, it is assumed that the fraud data collection processing and the fraud LM creation processing by the fraud data collection unit 120 and the fraud LM creation unit 130 (corresponding to the “data collection step” and the “topic-specific language model creation step” in the claims) have already been performed, that the created fraud LM is already held in the fraud LM holding unit 140, and that a general LM not specialized for the topic of fraud is already held in the general LM holding unit 210 (corresponding to the “language model holding step” in the claims). It is likewise assumed that the acoustic model is already held in the acoustic model holding unit 150 (corresponding to the “acoustic model holding step” in the claims).

First, the processing of steps S21 to S25 described in the second embodiment is performed (corresponding to the “speech recognition step” in the claims).

Next, the determination unit 180 receives the first score S1 and the second score S2 calculated by the first speech recognition unit 220 and the second speech recognition unit 230 and, when the first score S1 is equal to or greater than the second score S2 (step S26: YES), provisionally determines that the content of the input speech corresponds to the topic of fraud. This is referred to as the first stage of fraud topic determination.

Next, when the keyword is detected a certain number of times or more in the result of the first speech recognition or the result of the second speech recognition (step S31: YES), the determination unit 180 finally determines that the content of the input speech corresponds to the topic of fraud. This is referred to as the second stage of fraud topic determination. In the present embodiment, the certain number of times is, for example, one.

On the other hand, when the first score S1 is less than the second score S2 (step S26: NO), or when the first score S1 is equal to or greater than the second score S2 but no keyword is found (step S31: NO), the processing ends without further action; that is, the input speech is not determined to be a topic of fraud.
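The two-stage decision of steps S26 and S31 can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the English keywords, and simple substring counting are hypothetical, and the condition "in the result of the first speech recognition or the result of the second speech recognition" is interpreted here as "the count reaches the minimum in at least one of the two results".

```python
from typing import Iterable


def count_keyword_hits(text: str, keywords: Iterable[str]) -> int:
    """Total number of (non-overlapping) keyword occurrences in text."""
    return sum(text.count(keyword) for keyword in keywords)


def is_fraud_topic_two_stage(s1: float, s2: float,
                             first_text: str, second_text: str,
                             keywords: Iterable[str],
                             min_hits: int = 1) -> bool:
    """First stage: score comparison. Second stage: keyword detection."""
    if s1 < s2:                      # step S26: NO
        return False
    keywords = list(keywords)
    hits = max(count_keyword_hits(first_text, keywords),
               count_keyword_hits(second_text, keywords))
    return hits >= min_hits          # step S31


# Hypothetical keywords for the second stage of a fraud call.
KEYWORDS = ("transfer", "account")

decision = is_fraud_topic_two_stage(
    0.8, 0.4,
    "please transfer the money to this account",
    "please transfer the money to this count",
    KEYWORDS,
)
```

The default `min_hits=1` mirrors the example value given for the present embodiment; a deployment would presumably tune it together with the keyword list.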

In the above flow, topic determination is performed in the two stages of step S26 and step S31 for the following reason. In a fraud case such as “ore-ore” (“it's me”) fraud, the person committing the fraud first explains the background. For example, the person impersonates the police and explains that a criminal case involving a member of the victim's family has occurred, together with its progress and outcome. If this is taken as the first stage of the fraudulent act, then through this first stage the person committing the fraud gains the victim's trust and finally moves on to the main subject, namely the actual money transfer. If the stage in which the person committing the fraud talks about the main subject is taken as the second stage of the fraudulent act, the utterances in this second stage tend to be set phrases, for example “Transfer *** yen to account ***!”. The words used in the second stage are often extremely important to the fraudulent act, and their types and number are relatively small. In other words, whatever the content of the background explanation in the first stage, the utterances in the second stage tend to be set phrases that can be predicted in advance. Therefore, unlike keywords for the first stage, keywords for the second stage are not difficult to set in advance. Moreover, using keywords for the second stage further increases the accuracy of topic determination. In the prior art, by contrast, topic determination by keyword was performed without taking the notions of the first stage and the second stage in the present embodiment into account, which is why its accuracy was not high.

According to the third embodiment of the present invention described above, once topic determination has progressed to a certain extent through score comparison, further topic determination by keyword is performed at that point. That is, a first topic determination based on the language models and a second topic determination based on keywords are performed in combination. Because the second topic determination by keyword is performed after the first topic determination has finished, it is relatively easy to set keywords for the topic. In addition, performing topic determination more than once further increases its accuracy.

Although preferred embodiments of the present invention have been described above, it goes without saying that the present invention is not limited to these embodiments.

For example, in the above embodiments, “fraud” is used as an example of the “specific topic”, but the specific topic is not limited to this and may be a topic other than “fraud”.

In addition, in the above embodiments, the uni-gram and the n-gram are described as examples of language models, but the language model is not limited to these, and language models other than the uni-gram and the n-gram may be used as appropriate within the concept of the present invention.

DESCRIPTION OF SYMBOLS: 100, 200, 300 ... fraud determination apparatus; 110 ... voice input unit; 120 ... fraud data collection unit; 130 ... fraud language model creation unit (fraud LM creation unit); 140 ... fraud language model holding unit (fraud LM holding unit); 150 ... acoustic model holding unit; 160 ... speech recognition unit; 170 ... threshold holding unit; 180 ... determination unit; 210 ... general language model holding unit (general LM holding unit); 220 ... first speech recognition unit; 230 ... second speech recognition unit; 310 ... keyword holding unit.

Claims

Data collection means for collecting data specific to a specific topic;
Topic-specific language model creation means for creating, using the data collected by the data collection means, a topic-specific language model specialized for the specific topic;
Language model holding means for holding the topic-specific language model created by the topic-specific language model creation means and a general topic language model not specialized for the specific topic;
Acoustic model holding means for holding an acoustic model for speech recognition;
Speech recognition means for performing first speech recognition on input speech using the topic-specific language model held in the language model holding means and the acoustic model held in the acoustic model holding means, performing second speech recognition on the input speech using the general topic language model held in the language model holding means and the acoustic model held in the acoustic model holding means, and further calculating a first score for the result of the first speech recognition and a second score for the result of the second speech recognition;
Determination means for receiving the first score and the second score calculated by the speech recognition means and determining that the content of the input speech corresponds to the specific topic when the first score is equal to or greater than the second score; and
Keyword holding means for holding a keyword specialized for the specific topic,
Wherein the determination means determines that the content of the input speech corresponds to the specific topic when, after determining that the first score is equal to or greater than the second score, the keyword is detected a certain number of times or more in the result of the first speech recognition or the result of the second speech recognition,
The specific topic is a topic of fraud,
The topic determination device according to claim 1, wherein the input speech is an utterance made using a telephone.

A data collection step in which the data collection means collects data specific to a specific topic;
A topic-specific language model creation step in which topic-specific language model creation means creates, using the data collected by the data collection means, a topic-specific language model specialized for the specific topic;
A language model holding step in which language model holding means holds the topic-specific language model created by the topic-specific language model creation means and a general topic language model not specialized for the specific topic;
An acoustic model holding step in which acoustic model holding means holds an acoustic model for speech recognition;
A speech recognition step in which speech recognition means performs first speech recognition on input speech using the topic-specific language model held in the language model holding means and the acoustic model held in the acoustic model holding means, performs second speech recognition on the input speech using the general topic language model held in the language model holding means and the acoustic model held in the acoustic model holding means, and further calculates a first score for the result of the first speech recognition and a second score for the result of the second speech recognition;
A determination step in which determination means receives the first score and the second score calculated by the speech recognition means and determines that the content of the input speech corresponds to the specific topic when the first score is equal to or greater than the second score; and
A keyword holding step in which keyword holding means holds a keyword specialized for the specific topic,
Wherein, in the determination step, the determination means determines that the content of the input speech corresponds to the specific topic when, after determining that the first score is equal to or greater than the second score, the keyword is detected a predetermined number of times or more in the result of the first speech recognition or the result of the second speech recognition,
The specific topic is a topic of fraud,
A topic determination method, wherein the input speech is an utterance made using a telephone.