JP2010008764A - Speech recognition method, speech recognition system and speech recognition device

Info

Publication number: JP2010008764A
Application number: JP2008168594A
Authority: JP
Inventors: Takako Onishi; 貴子大西; Katsumi Ohashi; 勝己大橋
Original assignee: Omron Corp; Omron Tateisi Electronics Co
Current assignee: Omron Corp
Priority date: 2008-06-27
Filing date: 2008-06-27
Publication date: 2010-01-14

Abstract

PROBLEM TO BE SOLVED: To provide a speech recognition method capable of improving the accuracy of speech recognition. SOLUTION: The speech of an operator and the speech of a customer are recorded in an operator speech recording section 21 and a customer speech recording section 22, respectively (S21). Speech recognition is first performed on the operator's speech recorded by the operator speech recording section 21 (S22). After the operator's speech has been recognized, the priority of each word that appears in the recognition result is set to the maximum value (S23). Then, the priority of each word is set according to a co-occurrence measure calculated by a co-occurrence measure calculation section 26 (S24). Then, speech recognition is performed on the customer's speech recorded by the customer speech recording section 22 (S25). COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a speech recognition method, a speech recognition system, and a speech recognition device, and in particular to a speech recognition method, system, and device that recognize a first voice and a second voice different from the first voice.

In a typical call center, an operator and a customer converse over a telephone line, and the operator provides services such as technical support and product explanations. It is also common for the operator to operate a personal computer (PC) or similar device while providing the service during a call.

In such a call center, recording the content of each conversation between operator and customer is part of the operator's job. The operator may record the conversation after the call ends, recalling the customer's inquiries and the answers given one by one. With this approach, however, accurate recording is difficult, and omissions and errors may occur. Because the operator may also be operating a PC during the call, as noted above, recording the conversation while the call is in progress is very difficult as well.

Techniques for recognizing and outputting the speech of operators and customers are disclosed in, for example, Japanese Patent Laid-Open No. 11-338494 (Patent Document 1) and Japanese Patent No. 3827704 (Patent Document 2). According to Patent Document 1, the content of a call is automatically converted into text by speech recognition technology, which is intended to make the work of recording conversation content more efficient.

Such general speech recognition techniques cannot recognize every word in a conversation correctly, so the output recognition result must be checked and corrected. According to Patent Document 2, the recognition result converted into text is displayed on a PC display or the like, and keywords designated in advance are highlighted. This allows the operator to locate the parts that need correction quickly and correct them, which is likewise intended to make the work of recording conversation content more efficient.
Patent Document 1: JP 11-338494 A. Patent Document 2: Japanese Patent No. 3827704.

The general speech recognition techniques of Patent Documents 1 and 2 select, from a dictionary prepared in advance, several candidate words suggested by the waveform of the recorded speech. The candidate with the highest evaluation value, that is, the word judged most appropriate, is then output as the recognition result.

Higher speech recognition accuracy is preferable, for example because it reduces the number of corrections needed. Because speech waveforms differ from person to person, one way to raise accuracy is to establish an appropriate correspondence between the waveforms of the speech to be recognized and words in advance, that is, to train the recognizer beforehand on the speech it will recognize.

Because the operator is a known individual, the recognizer can be trained on the operator's voice in advance, so recognition accuracy for the operator's speech can be improved. The voices of the unspecified, large number of customers on the other end of the call, however, cannot be learned in advance. Recognition of the customer's speech therefore has to rely on the general speech recognition of Patent Documents 1 and 2, that is, recognition without prior training, and its accuracy is insufficient. This reduces the operator's work efficiency.

An object of the present invention is to provide a speech recognition method capable of improving the accuracy of speech recognition.

Another object of the present invention is to provide a speech recognition system capable of improving the accuracy of speech recognition.

Still another object of the present invention is to provide a speech recognition device capable of improving the accuracy of speech recognition.

Still another object of the present invention is to provide a speech recognition program capable of improving the accuracy of speech recognition.

Still another object of the present invention is to provide a recording medium on which a speech recognition program capable of improving the accuracy of speech recognition is recorded.

The speech recognition method according to the present invention recognizes a first voice and a second voice different from the first voice. It comprises: a data acquisition step of obtaining, in advance, data for speech recognition of the first voice; a learning step of performing learning, using the data obtained in the data acquisition step, to improve the recognition accuracy of the first voice; a first speech recognition step of recognizing the first voice after the learning step; a priority setting step of setting a high priority for words that appear frequently in the first voice recognized in the first speech recognition step; and a second speech recognition step of recognizing the second voice based on the word priorities set in the priority setting step.

With this configuration, the second voice can be recognized using the recognition result of the first voice, which is recognized with high accuracy. The accuracy of the recognition result for the second voice can therefore be improved, and with it the accuracy of speech recognition as a whole.

Preferably, the priority setting step includes a step of setting the priority of a word based on a co-occurrence measure between a word appearing in the first voice recognized in the first speech recognition step and a given word.

More preferably, the priority setting step sets the priority of a word higher as the co-occurrence measure becomes higher.

More preferably, the method includes, before the first speech recognition step, a co-occurrence measure calculation step of calculating in advance the co-occurrence measure between two given words.

In a further preferred embodiment, the first voice is the voice of an operator and the second voice is the voice of a customer.

In another aspect of the present invention, a speech recognition system recognizes a first voice and a second voice different from the first voice. It comprises: data acquisition means for obtaining, in advance, data for speech recognition of the first voice; learning means for performing learning, using the data obtained by the data acquisition means, to improve the recognition accuracy of the first voice; first speech recognition means for recognizing the first voice after the learning by the learning means; priority setting means for setting a high priority for words that appear frequently in the first voice recognized by the first speech recognition means; and second speech recognition means for recognizing the second voice based on the word priorities set by the priority setting means.

In still another aspect of the present invention, a speech recognition device recognizes a first voice and a second voice different from the first voice. It comprises: a data acquisition unit that obtains, in advance, data for speech recognition of the first voice; a learning unit that performs learning, using the data obtained by the data acquisition unit, to improve the recognition accuracy of the first voice; a first speech recognition unit that recognizes the first voice after the learning by the learning unit; a priority setting unit that sets a high priority for words that appear frequently in the first voice recognized by the first speech recognition unit; and a second speech recognition unit that recognizes the second voice based on the word priorities set by the priority setting unit.

In still another aspect of the present invention, a speech recognition program causes a computer to function as: data acquisition means for obtaining, in advance, data for speech recognition of a first voice; learning means for performing learning, using the data obtained by the data acquisition means, to improve the recognition accuracy of the first voice; first speech recognition means for recognizing the first voice after the learning by the learning means; priority setting means for setting a high priority for words that appear frequently in the first voice recognized by the first speech recognition means; and second speech recognition means for recognizing a second voice based on the word priorities set by the priority setting means.

In still another aspect of the present invention, a recording medium is a computer-readable recording medium on which the speech recognition program described above is recorded.

According to the present invention, the second voice can be recognized using the recognition result of the first voice, which is recognized with high accuracy. The accuracy of the recognition result for the second voice can therefore be improved, and with it the accuracy of speech recognition as a whole.

The same holds for the speech recognition system, the speech recognition device, the speech recognition program, and the recording medium on which the speech recognition program is recorded: each allows the second voice to be recognized using the highly accurate recognition result of the first voice, which improves the accuracy of the recognition result for the second voice and of speech recognition as a whole.

Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a system configuration diagram showing the hardware configuration of a speech recognition system according to an embodiment of the present invention. Referring to FIG. 1, the speech recognition system 11 includes a telephone 13 connected to a telephone line 12, a PC 14 that is connected to the telephone 13 and serves as the operator's terminal, and a server 15 that is connected to the PC 14 via a LAN (Local Area Network) cable or the like and stores data, programs, and so on. The PC 14 includes a display 19 that shows text data and the like on its screen, a hard disk for storing data, and a keyboard and mouse (neither shown) that serve as the interface with the operator. The PC 14 is controlled by a control unit (not shown) that is housed inside it and controls the operation of the PC 14 as a whole.

The telephone 13 and the PC 14 are connected by a line 16. The operator speaks through a microphone 18 to a customer who is connected to the telephone line 12 via the telephone 13, and the customer's voice arriving via the telephone 13 is input through a speaker 17. The operator's voice, as the first voice, and the customer's voice, as the second voice different from the first voice, are input to the PC 14 as audio data. The input audio data is stored on the hard disk of the PC 14; in other words, the PC 14 records the operator's voice and the customer's voice. The audio data is also stored on the server 15 as needed.

The configuration of the PC 14 and related components is described next. FIG. 2 is a block diagram of the speech recognition system 11 according to an embodiment of the present invention. Referring to FIGS. 1 and 2, the PC 14 of the speech recognition system 11 includes an operator voice recording unit 21 that records the operator's voice, a customer voice recording unit 22 that records the customer's voice, a speech recognition unit 23 that recognizes the operator's voice and the customer's voice recorded by the operator voice recording unit 21 and the customer voice recording unit 22, respectively, and a speech recognition result output unit 24 that outputs the results recognized by the speech recognition unit 23.

The PC 14 records the operator's voice (the first voice) and the customer's voice (the second voice) separately, using the operator voice recording unit 21 and the customer voice recording unit 22, respectively. The speech recognition unit 23 then performs speech recognition on each recording separately.

The speech recognition unit 23 performs speech recognition using a speech recognition dictionary unit 27. The speech recognition dictionary unit 27 stores, as a speech recognition dictionary, a plurality of words corresponding to speech. The speech recognition dictionary unit 27 is provided on the server 15 connected to the PC 14.

The speech recognition unit 23 extracts the words corresponding to the speech recorded by the operator voice recording unit 21 and the customer voice recording unit 22 from the words stored in the speech recognition dictionary of the speech recognition dictionary unit 27, and passes the result to the speech recognition result output unit 24. The recognition result is output in text form, that is, as text data. Specifically, the resulting text data is shown on the display 19 of the PC 14 or the like.

The PC 14 also includes a word priority setting unit 25 that sets the priorities of the words in the speech recognition dictionary used when the speech recognition unit 23 recognizes speech. The speech recognition unit 23 uses these word priorities during recognition: when several similar words correspond to the speech, they are reflected in the recognition result according to priority, specifically in descending order of priority.
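The role of priority in choosing among similar candidate words can be sketched as follows. This is only an illustration under stated assumptions: the `Candidate` type, the `acoustic_score` field, the `margin` that defines "similar", and the tie-breaking rule are all hypothetical, since the text does not specify how the evaluation value and the priority are combined.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    word: str
    acoustic_score: float  # hypothetical evaluation value; higher is better


def pick_word(candidates, priority, margin=0.05, default_priority=0.0):
    """Among candidates scoring within `margin` of the best one,
    return the word with the highest priority (ties: higher score wins)."""
    best = max(c.acoustic_score for c in candidates)
    similar = [c for c in candidates if best - c.acoustic_score <= margin]
    chosen = max(
        similar,
        key=lambda c: (priority.get(c.word, default_priority), c.acoustic_score),
    )
    return chosen.word


candidates = [
    Candidate("rooter", 0.82),
    Candidate("router", 0.80),
    Candidate("water", 0.40),
]
print(pick_word(candidates, {"router": 1.0, "rooter": 0.1}))
```

With an empty priority table the function falls back to the best-scoring candidate, which corresponds to recognition without any priority information.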

The speech recognition unit 23 is trained on the operator's voice in advance. That is, among the candidate words suggested by the waveform of the operator's speech, the word with the highest evaluation value is recognized as the appropriate word, and it is corrected if it is wrong. This kind of training is possible because the operator is a known individual, and a speech recognition unit 23 trained in this way recognizes the operator's speech with very high accuracy.

Specifically, the operator's voice data is input as the data acquisition step of obtaining, in advance, data for speech recognition of the first voice. The speech recognition training described above is then performed as the learning step of performing learning to improve the recognition accuracy of the first voice.

The PC 14 also includes a co-occurrence measure calculation unit 26 that calculates the co-occurrence measure between two given words from past business records. A business record storage unit 28, which holds the past business records, that is, records of past conversations between operators and customers, is provided on the server 15. The PC 14 configures the word priority setting unit 25 based on the co-occurrence measures calculated by the co-occurrence measure calculation unit 26.

The method by which the co-occurrence measure calculation unit 26 calculates the co-occurrence measure is described next. FIG. 3 is a flowchart showing the operation of the control unit of the PC 14 when calculating the co-occurrence measure. Referring to FIGS. 1 to 3, morphological analysis is first performed on the past business records, which have been converted into text (step S11 in FIG. 3; the word "step" is omitted hereinafter). Morphological analysis is the task of dividing a sentence written in natural language into a sequence of morphemes and determining the part of speech of each, using grammatical knowledge (a collection of grammar rules) and a dictionary (a word list annotated with information such as parts of speech) as information sources. A morpheme is the smallest unit of a language that carries meaning. Various tools for morphological analysis exist, including the free software "ChaSen". Any morphological analysis method may be used here as long as it is a standard one.

Next, for the words that appear in the text of all past business records, the co-occurrence measure between two given words is calculated (S12).
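The counting that underlies S12 can be sketched as below, assuming the business records have already been segmented into words by the morphological analysis of S11. The function name `count_records` and the choice to count records (rather than individual word occurrences) are assumptions made for this illustration.

```python
from collections import Counter
from itertools import combinations


def count_records(records):
    """records: list of token lists, one per business record.

    Returns (single, pair):
      single[w]     = number of records that contain word w
      pair[(a, b)]  = number of records that contain both a and b,
                      with the pair stored in sorted order (a < b)
    """
    single = Counter()
    pair = Counter()
    for tokens in records:
        words = sorted(set(tokens))  # each word counts once per record
        single.update(words)
        pair.update(combinations(words, 2))
    return single, pair


records = [
    ["printer", "paper", "jam"],
    ["printer", "ink"],
    ["paper", "jam", "jam"],
]
single, pair = count_records(records)
print(single["printer"], pair[("jam", "paper")])
```

Storing each pair in sorted order keeps the table symmetric, so a lookup only needs to sort the two words first.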

Any of the following common measures may be used as the co-occurrence measure. Let |X| be the number of occurrences of word X, |Y| the number of occurrences of word Y, |X∪Y| the number of business records in which at least one of the two words appears, and |X∩Y| the number of business records in which both appear. Then the co-occurrence frequency is |X∩Y|, the Jaccard coefficient is |X∩Y| / |X∪Y|, the Simpson coefficient is |X∩Y| / min(|X|, |Y|), and the cosine distance is |X∩Y| / sqrt(|X| |Y|).
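The four measures can be computed from three counts, as in the following minimal sketch. It assumes that `n_x` and `n_y` are the numbers of records containing each word, so that |X∪Y| follows from inclusion-exclusion; the function name and the zero-count handling are choices made for this illustration.

```python
import math


def cooccurrence_measures(n_x, n_y, n_both):
    """n_x, n_y: records containing X and Y; n_both: records containing both."""
    if n_x == 0 or n_y == 0:
        # A word that never appears co-occurs with nothing.
        return {"frequency": 0, "jaccard": 0.0, "simpson": 0.0, "cosine": 0.0}
    n_either = n_x + n_y - n_both  # |X ∪ Y| by inclusion-exclusion
    return {
        "frequency": n_both,
        "jaccard": n_both / n_either,
        "simpson": n_both / min(n_x, n_y),
        "cosine": n_both / math.sqrt(n_x * n_y),
    }


print(cooccurrence_measures(4, 2, 2))
```

The Jaccard, Simpson, and cosine values all lie between 0 and 1, whereas the raw co-occurrence frequency is unbounded and would need normalizing before being used as a weight.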

The co-occurrence measure is calculated in this way. The co-occurrence measures calculated by the co-occurrence measure calculation unit 26 are used to configure the word priority setting unit 25, that is, to set the priorities of words for speech recognition.

The method of recognizing speech with the speech recognition system 11 is described next. FIG. 4 is a flowchart showing the operation of the control unit of the PC 14 when speech is recognized using the speech recognition system 11.

Referring to FIGS. 1 to 4, the operator's voice and the customer's voice are first recorded separately by the operator voice recording unit 21 and the customer voice recording unit 22, respectively (S21 in FIG. 4).

Next, as the first speech recognition step of recognizing the first voice, speech recognition is performed on the operator's voice recorded by the operator voice recording unit 21 (S22). Because the speech recognition unit 23 has been trained on the operator's voice in advance, the accuracy of its recognition result is high.

After recognition of the operator's voice is complete, the priority of each word that appears in the recognition result is set to the maximum value (S23). The same words often appear on both sides of a conversation between an operator and a customer, so this improves the accuracy of the customer speech recognition performed afterward. This step is the priority setting step of setting a high priority for words that appear frequently in the first voice recognized in the first speech recognition step.

Next, priorities are set for all words in the speech recognition dictionary according to the co-occurrence measures calculated by the co-occurrence measure calculation unit 26 (S24). These co-occurrence measures are the ones calculated in the steps shown in FIG. 3 above. The priority is set higher as the co-occurrence measure becomes higher. Specifically, for each word in the speech recognition dictionary, the co-occurrence measures with all words appearing in the operator's recognition result are obtained and averaged. For a word X, with Avr denoting this average, Pmax the maximum value that can be set as a priority, and Pmin the minimum value, the priority of word X is set to, for example, (Pmax − Pmin) × Avr + Pmin.
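Steps S23 and S24 can be sketched together as follows. The function name `set_priorities` and the `cooc` callback are hypothetical, and two points are assumptions not stated in the text: words from the operator's recognition result keep the maximum priority after S24, and the co-occurrence measure is one that lies in [0, 1] (for example the Jaccard coefficient) so that the result stays between Pmin and Pmax.

```python
def set_priorities(dictionary_words, operator_words, cooc, p_min=0.0, p_max=1.0):
    """Return {word: priority} for every word in the recognition dictionary.

    cooc(a, b) -> co-occurrence measure in [0, 1] between words a and b.
    """
    operator_words = set(operator_words)
    priorities = {}
    for word in dictionary_words:
        if word in operator_words:
            # S23: words the operator actually said get the maximum priority.
            priorities[word] = p_max
            continue
        # S24: average the measure over all operator words, then scale
        # linearly into [p_min, p_max].
        scores = [cooc(word, other) for other in operator_words]
        avr = sum(scores) / len(scores) if scores else 0.0
        priorities[word] = (p_max - p_min) * avr + p_min
    return priorities


table = {frozenset(("printer", "ink")): 0.5, frozenset(("paper", "ink")): 0.25}
cooc = lambda a, b: table.get(frozenset((a, b)), 0.0)
print(set_priorities(["printer", "paper", "ink", "pizza"],
                     ["printer", "paper"], cooc, p_min=1, p_max=9))
```

In this example "ink" receives an intermediate priority because it co-occurs with both operator words, while "pizza", which co-occurs with neither, stays at the minimum.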

Speech recognition is then performed on the customer's voice, as the second voice, recorded by the customer voice recording unit 22 (S25). This step is the second speech recognition step of recognizing the second voice based on the word priorities set in the priority setting step. Finally, the recognition results for the operator's voice and the customer's voice are output (S26).
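The ordering of S22 through S26 can be sketched as a small pipeline. Everything here is illustrative: `transcribe_call`, `fake_recognize`, and the representation of audio as lists of alternative words are hypothetical stand-ins for a real recognizer, which the text does not specify.

```python
def transcribe_call(operator_audio, customer_audio, recognize, set_priorities):
    """Recognize the operator first, then use the result to guide the customer pass."""
    operator_words = recognize(operator_audio, None)        # S22
    priorities = set_priorities(operator_words)             # S23, S24
    customer_words = recognize(customer_audio, priorities)  # S25
    return {"operator": operator_words, "customer": customer_words}  # S26


def fake_recognize(audio, priorities=None):
    """Stand-in recognizer: each 'slot' lists alternative words, best guess first;
    the alternative with the highest priority wins (ties keep the first)."""
    priorities = priorities or {}
    return [max(slot, key=lambda w: priorities.get(w, 0.0)) for slot in audio]


result = transcribe_call(
    operator_audio=[["router"], ["reset"]],
    customer_audio=[["rooter", "router"], ["broke"]],
    recognize=fake_recognize,
    set_priorities=lambda words: {w: 1.0 for w in words},
)
print(result["customer"])
```

Passing the recognizer and the priority function as arguments keeps the sketch independent of any particular speech engine.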

In summary, the speech recognition method according to the present invention comprises: a data acquisition step of obtaining, in advance, data for speech recognition of the first voice; a learning step of performing learning, using the data obtained in the data acquisition step, to improve the recognition accuracy of the first voice; a first speech recognition step of recognizing the first voice after the learning step; a priority setting step of setting a high priority for words that appear frequently in the first voice recognized in the first speech recognition step; and a second speech recognition step of recognizing the second voice based on the word priorities set in the priority setting step. The method also includes, before the first speech recognition step, a co-occurrence measure calculation step of calculating in advance the co-occurrence measure between two given words.

With this configuration, the customer's voice, as the second voice, can be recognized using the recognition result of the operator's voice, as the first voice, which is recognized with high accuracy. The accuracy of the recognition result for the customer's voice can therefore be improved, and with it the overall accuracy of speech recognition for both the operator's voice and the customer's voice.

Because word priorities are raised according to the co-occurrence measure before recognition is performed, speech recognition can be carried out more appropriately, that is, more accurately.

In the embodiment above, speech recognition is performed on the whole of the operator's voice or the customer's voice at once. The invention is not limited to this: the operator's voice or the customer's voice may be divided at given points, for example at silent portions, and speech recognition may be performed on each of the resulting segments.
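Dividing a recording at silent portions can be sketched as below. The amplitude `threshold`, the `min_silence` run length, and the representation of audio as a plain list of samples are assumptions for this illustration; a real system would typically work on frame energies rather than single samples.

```python
def split_on_silence(samples, threshold=0.01, min_silence=3):
    """Split `samples` wherever at least `min_silence` consecutive samples
    have an absolute amplitude below `threshold`. Leading, trailing, and
    separating silence is dropped; shorter pauses stay inside a segment."""
    segments, current, pending = [], [], []
    for s in samples:
        if abs(s) < threshold:
            pending.append(s)  # possible silence; decide at the next loud sample
            continue
        if current and len(pending) >= min_silence:
            segments.append(current)  # long pause: close the segment
            current = []
        elif current:
            current.extend(pending)   # short pause: keep it in the segment
        pending = []
        current.append(s)
    if current:
        segments.append(current)
    return segments


audio = [0, 0.5, 0.6, 0, 0.4, 0, 0, 0, 0.7, 0.8, 0, 0]
print(split_on_silence(audio))
```

Each returned segment could then be passed to the recognizer on its own.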

Also, in the embodiment above, the operator's voice is recognized after recording of both the operator's voice and the customer's voice has finished, and the customer's voice is recognized after that. The invention is not limited to this: the speech recognition described above may be performed in real time, or in near real time, while the voices are being recorded during the call.

In the embodiment above, the server holds the speech recognition dictionary unit and the business record storage unit. The invention is not limited to this: the speech recognition dictionary unit and the business record storage unit may instead be stored on the hard disk of the PC.

The PC, acting as a speech recognition device, may also be configured to include: a data acquisition unit that obtains, in advance, data for speech recognition of the first voice; a learning unit that performs learning, using the data obtained by the data acquisition unit, to improve the recognition accuracy of the first voice; a first speech recognition unit that recognizes the first voice after the learning by the learning unit; a priority setting unit that sets a high priority for words that appear frequently in the first voice recognized by the first speech recognition unit; and a second speech recognition unit that recognizes the second voice based on the word priorities set by the priority setting unit.

Speech recognition systems for other operators may also be built on the network. In that case, the server may be shared among them.

A speech recognition program may also be used that causes a computer to function as: data acquisition means for acquiring, in advance, data for speech recognition of the first voice; learning means for performing learning for improving the recognition accuracy of the first voice, using the data acquired by the data acquisition means; first speech recognition means for recognizing the first voice after the learning by the learning means; priority setting means for setting a high priority for words that appear frequently in the first voice recognized by the first speech recognition means; and second speech recognition means for recognizing the second voice based on the word priorities set by the priority setting means.

A computer-readable recording medium on which the above speech recognition program is recorded may also be used.

This embodiment has described speech recognition of an operator's voice and a customer's voice in a call center. However, the invention is not limited to this and is applicable to various applications, such as medical checkups and health guidance by telephone, consulting by telephone, telephone shopping, and questionnaire surveys and opinion polls by telephone.

An embodiment of the present invention has been described above with reference to the drawings, but the invention is not limited to the illustrated embodiment. Various modifications and variations can be made to the illustrated embodiment within the same scope as the present invention or within an equivalent scope.

The speech recognition method, speech recognition system, speech recognition device, speech recognition program, and recording medium according to the present invention are used effectively where both an operator's voice and a customer's voice need to be recognized, as in a call center.

A system configuration diagram showing the hardware configuration of a speech recognition system according to an embodiment of the present invention.
A block diagram of the speech recognition system according to an embodiment of the present invention.
A flowchart showing the operation when the co-occurrence scale is calculated.
A flowchart showing the operation when the operator's voice and the customer's voice are recognized.

Explanation of symbols

11: speech recognition system, 12: telephone line, 13: telephone, 14: personal computer, 15: server, 16: line, 17: speaker, 18: microphone, 19: display, 21: operator voice recording unit, 22: customer voice recording unit, 23: speech recognition unit, 24: speech recognition result output unit, 25: word priority setting unit, 26: co-occurrence scale calculation unit, 27: speech recognition dictionary unit, 28: business record storage unit.

Claims

A voice recognition method for recognizing a first voice and a second voice different from the first voice, the method comprising:
a data acquisition step of acquiring, in advance, data for voice recognition of the first voice;
a learning step of performing learning for improving the recognition accuracy of the first voice, using the data acquired in the data acquisition step;
a first voice recognition step of recognizing the first voice after the learning step;
a priority setting step of setting a high priority for words that appear frequently in the first voice recognized in the first voice recognition step; and
a second voice recognition step of recognizing the second voice based on the word priority set in the priority setting step.

The voice recognition method according to claim 1, wherein the priority setting step includes a step of setting the word priority based on a co-occurrence scale between a word appearing in the first voice recognized in the first voice recognition step and a predetermined word.

The voice recognition method according to claim 2, wherein the priority setting step sets the word priority higher as the co-occurrence scale is higher.

The voice recognition method according to claim 2 or 3, further comprising a co-occurrence scale calculation step of calculating, in advance and before the first voice recognition step, a co-occurrence scale between two predetermined words.

The voice recognition method according to claim 1, wherein the first voice is an operator's voice and the second voice is a customer's voice.

A voice recognition system for recognizing a first voice and a second voice different from the first voice, the system comprising:
data acquisition means for acquiring, in advance, data for voice recognition of the first voice;
learning means for performing learning for improving the recognition accuracy of the first voice, using the data acquired by the data acquisition means;
first voice recognition means for recognizing the first voice after the learning by the learning means;
priority setting means for setting a high priority for words that appear frequently in the first voice recognized by the first voice recognition means; and
second voice recognition means for recognizing the second voice based on the word priority set by the priority setting means.

A voice recognition device for recognizing a first voice and a second voice different from the first voice, the device comprising:
a data acquisition unit that acquires, in advance, data for voice recognition of the first voice;
a learning unit that performs learning for improving the recognition accuracy of the first voice, using the data acquired by the data acquisition unit;
a first voice recognition unit that recognizes the first voice after the learning by the learning unit;
a priority setting unit that sets a high priority for words that appear frequently in the first voice recognized by the first voice recognition unit; and
a second voice recognition unit that recognizes the second voice based on the word priority set by the priority setting unit.

A voice recognition program for causing a computer to function as:
data acquisition means for acquiring, in advance, data for voice recognition of the first voice;
learning means for performing learning for improving the recognition accuracy of the first voice, using the data acquired by the data acquisition means;
first voice recognition means for recognizing the first voice after the learning by the learning means;
priority setting means for setting a high priority for words that appear frequently in the first voice recognized by the first voice recognition means; and
second voice recognition means for recognizing the second voice based on the word priority set by the priority setting means.

A computer-readable recording medium on which the voice recognition program according to claim 8 is recorded.