JP2014178381A

JP2014178381A - Voice recognition device, voice recognition system and voice recognition method

Info

Publication number: JP2014178381A
Application number: JP2013051031A
Authority: JP
Inventors: Ryoji Torii; 陵二鳥居
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2013-03-13
Filing date: 2013-03-13
Publication date: 2014-09-25

Abstract

PROBLEM TO BE SOLVED: To provide a voice recognition device, a voice recognition system and a voice recognition method capable of efficiently improving the voice recognition rate. SOLUTION: The voice recognition device includes a dictionary management information storage part, a dictionary storage part, a voice storage part, a call information storage part, a dictionary selection part and a voice recognition part. The dictionary storage part stores a voice recognition dictionary for each operator and a voice recognition dictionary for each job type. The voice storage part stores the voice of each operator who has handled a user's telephone call. The call information storage part stores call information associating the identification information of each operator who handled a user's call with the identification information of the job type that operator is in charge of. The dictionary selection part selects a dictionary to be used for voice recognition by referring to the dictionary management information storage part on the basis of the call information. The voice recognition part converts the operator's voice read from the voice storage part into text data by referring to the selected dictionary in the dictionary storage part.

Description

Embodiments described herein relate generally to a speech recognition apparatus, a speech recognition system, and a speech recognition method.

Businesses today take many forms, such as manufacturing, retail, and finance, and many of them set up a call center where operators handle customers' telephone calls as part of their customer service.

Call centers use a speech recognition device that records the calls between customers and operators and converts the recorded voice data into text. The text is used to check the quality of operators' responses to customers and to make use of the customers' feedback.

To check the quality of operators' responses, the recorded calls are converted to text by the speech recognition device, and the resulting character strings are displayed on the monitor of a monitoring terminal. A manager such as a supervisor may then search them for keywords such as inappropriate terms, or for words needed to understand customer needs.

A speech recognition device performs recognition with a dictionary, and this dictionary is usually a general-purpose one.

In a call center, customers are usually served not by a single operator but by several operators taking turns, and each operator speaks differently. Call centers also often divide their work according to the content of customer inquiries, with specialized operators handling each type of work.

JP 2011-141349 A

However, conventional speech recognition devices use a general-purpose dictionary. As a result, their recognition rate is poor because of differences in how operators speak and because technical terms used in the business are misrecognized.

The problem to be solved by the present invention is to provide a speech recognition device, a speech recognition system, and a speech recognition method capable of efficiently improving the speech recognition rate.

The speech recognition apparatus according to the embodiment includes a dictionary management information storage unit, a dictionary storage unit, a voice storage unit, a call information generation unit, a dictionary selection unit, and a speech recognition unit. The dictionary management information storage unit stores three items in association with each other: identification information of the operators who handle users' telephone calls, identification information of the business type indicating the user's inquiry destination, and identification information of the dictionary to be used for speech recognition. The dictionary storage unit stores a speech recognition dictionary for each operator and a speech recognition dictionary for each business type. The voice storage unit stores the voice of each operator who handled a user's call. The call information generation unit holds call information associating the identification information of the operator who handled the user's call with the identification information of the business type that operator is in charge of. The dictionary selection unit selects the dictionary to be used for speech recognition by referring to the dictionary management information storage unit on the basis of the call information. The speech recognition unit converts the operator's voice read from the voice storage unit into text data by referring to the dictionary in the dictionary storage unit selected by the dictionary selection unit.

FIG. 1 is a diagram showing the schematic configuration of the speech recognition system of an embodiment.
FIG. 2 is a block diagram showing the configuration of the equipment on the call center side.
FIG. 3 is a diagram showing an example of the dictionary management DB.
FIG. 4 is a flowchart showing the operation of the speech recognition system.

Hereinafter, embodiments will be described in detail with reference to the drawings.
(Embodiment) FIG. 1 is a diagram showing the configuration of a speech recognition system according to an embodiment.

As shown in FIG. 1 and FIG. 2, the speech recognition system of this embodiment includes the following components:
- a user's telephone 1;
- a public switched telephone network 2 (hereinafter "PSTN 2");
- a circuit switching device 3 such as an IP-PBX/PBX;
- an operator telephone 4;
- a computer 5 serving as the operator's operation terminal (hereinafter "operator PC 5");
- a control device 6;
- a Computer Telephony Integration device 7 (hereinafter "CTI device 7");
- a monitoring PC 8 serving as a monitoring terminal;
- a call recording device 9;
- a speech recognition device 10.

The circuit switching device 3 controls incoming and outgoing calls between the PSTN 2 and the call center. The operator telephone 4 is the telephone on which an operator receives calls from customers (also referred to as clients or users) and carries out telephone support work.

The operator PC 5 is an information terminal that supports the operator's telephone support work. It displays user information received from the control device 6 and information the operator enters from the keyboard.

The control device 6 controls the CTI device 7, the call recording device 9, the speech recognition device 10, and other devices, allocating information and issuing instructions to each of them. The control device 6 also has an automatic answering function for users' calls. It prompts the user to operate the telephone, identifies the inquiry destination (business) from that operation, and has the user's call connected to an operator telephone 4 for the identified business.

The automatic answering function is configured in advance with the following settings:
- a response message;
- business type identification information (a business ID) for each telephone button number (tone) associated with that message;
- the identification information (operator IDs) of the operators who handle calls for each business ID, together with the identification information of their telephones.

The function identifies the business ID from the button the user presses (tone transmission) and connects the user's incoming call to an operator telephone 4 that is currently free.
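The routing performed by the automatic answering function reduces to two lookups: first from keypad tone to business ID, then from business ID to a free operator. The sketch below is a minimal version that assumes in-memory tables. The tone assignments, IDs, and the `Router` class are hypothetical, and a real control device would also mark telephones free again when calls end.

```python
from dataclasses import dataclass, field

# Hypothetical configuration: keypad tone -> business ID,
# and business ID -> list of (operator ID, telephone ID).
TONE_TO_BUSINESS = {"1": "B01", "2": "B02"}
OPERATORS_BY_BUSINESS = {
    "B01": [("OP001", "TEL-4a"), ("OP002", "TEL-4b")],
    "B02": [("OP003", "TEL-4c")],
}


@dataclass
class Router:
    busy_phones: set = field(default_factory=set)  # telephones currently on a call

    def route(self, tone: str):
        """Return (business ID, operator ID, telephone ID) for a free operator, or None."""
        business_id = TONE_TO_BUSINESS.get(tone)
        if business_id is None:
            return None  # unknown key: no business identified
        for operator_id, phone_id in OPERATORS_BY_BUSINESS[business_id]:
            if phone_id not in self.busy_phones:
                self.busy_phones.add(phone_id)  # mark this telephone as in use
                return business_id, operator_id, phone_id
        return None  # every operator for this business is busy
```

With this sketch, the first two callers who press "1" reach OP001 and then OP002. A third caller gets `None` until a telephone is freed.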

The CTI device 7 receives from the control device 6 information about calls that operators have handled through the circuit switching device 3, such as the user's telephone number and the business ID. From this it generates call information (a pair of an operator ID and a business ID) (see FIG. 2) and stores it in the storage DB 12.

In other words, the CTI device 7 is a call information generation unit that generates call information associating the operator ID of the operator who handled a user's call with the business ID of the business type that operator is in charge of.
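As a minimal sketch, the call information can be modelled as an immutable record built from the call-start notification. The field names, the dict-shaped notification, and `generate_call_info` are hypothetical. The incoming call ID is included because the description later links call information and voice files by that ID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallInfo:
    """Call information: the operator/business pair, keyed by the incoming call ID."""
    incoming_call_id: str
    operator_id: str
    business_id: str


def generate_call_info(call_start: dict) -> CallInfo:
    """Build call information from a call-start notification sent by the control device.

    Other fields in the notification (e.g. the caller's phone number) are
    received but not kept in the call information.
    """
    return CallInfo(
        incoming_call_id=call_start["incoming_call_id"],
        operator_id=call_start["operator_id"],
        business_id=call_start["business_id"],
    )
```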

The call recording device 9 records (stores) the conversation (call contents) between the user and the operator as separate audio files. The files are linked (associated) with each other by an incoming call ID (not shown). In particular, the call recording device 9 records the voice of the operator who handled the user's call.

The monitoring PC 8 is a terminal used by a manager such as a supervisor. It has the speech recognition device 10 recognize the call contents recorded by the call recording device 9 and displays the resulting text on its monitor. The manager then searches that text for keywords, such as NG words (terms inappropriate for an operator to say) or words needed to understand customer needs.
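The manager's keyword search could be as simple as a substring scan over the recognized text. The sketch below assumes plain-string transcripts and uses a hypothetical `find_keywords` helper. The NG-word list is supplied by the caller; the patent does not define one.

```python
import re
from typing import Dict, List


def find_keywords(transcript: str, keywords: List[str]) -> Dict[str, List[int]]:
    """Return the character offsets at which each keyword occurs in the transcript.

    Matches are literal (re.escape) and non-overlapping; keywords with no hits are omitted.
    """
    hits: Dict[str, List[int]] = {}
    for word in keywords:
        positions = [m.start() for m in re.finditer(re.escape(word), transcript)]
        if positions:
            hits[word] = positions
    return hits
```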

During telephone support work, the call recording device 9 records the call contents as separate operator and customer voice files. It stores these voice files in the storage DB 12 of the speech recognition device 10.

As shown in FIG. 2, the speech recognition device 10 includes:
- a dictionary management database 11 (hereinafter, a database is referred to as a DB);
- a storage DB 12;
- a dictionary selection unit 13;
- a dictionary storage unit 16 storing a plurality of dictionaries 14, one per operator, and a plurality of dictionaries 15, one per business;
- a speech recognition engine 17;
- a speech recognition result DB 18.

As shown in FIG. 3, the dictionary management DB 11 stores an operator ID, a business ID, and a dictionary ID in association with each other. The dictionary ID identifies the dictionary to be used. Dictionary IDs are either operator dictionary IDs or business dictionary IDs, and an entry may be associated with both kinds or with only one.
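The sketch below models the dictionary management DB as an in-memory table. It assumes one row per (operator ID, business ID) pair, with optional operator and business dictionary IDs. The row layout, the IDs, and `lookup` are illustrative assumptions, not the contents of FIG. 3.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DictionaryRow:
    """One row of the dictionary management DB: an operator/business pair and its dictionaries."""
    operator_id: str
    business_id: str
    operator_dictionary_id: Optional[str] = None  # per-operator dictionary, if any
    business_dictionary_id: Optional[str] = None  # per-business dictionary, if any

    def dictionary_ids(self) -> List[str]:
        # A row may carry both dictionary IDs or only one of them.
        return [d for d in (self.operator_dictionary_id, self.business_dictionary_id) if d]


# Hypothetical contents; not taken from FIG. 3.
DICTIONARY_MANAGEMENT_DB = [
    DictionaryRow("OP001", "B01", "D-OP001", "D-B01"),
    DictionaryRow("OP002", "B01", business_dictionary_id="D-B01"),
    DictionaryRow("OP003", "B02", operator_dictionary_id="D-OP003"),
]


def lookup(operator_id: str, business_id: str) -> List[str]:
    """Return the dictionary IDs for an exact (operator ID, business ID) match, else []."""
    for row in DICTIONARY_MANAGEMENT_DB:
        if (row.operator_id, row.business_id) == (operator_id, business_id):
            return row.dictionary_ids()
    return []
```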

That is, the dictionary management DB 11 is a dictionary management information storage unit. It stores the following in association with each other:
- an operator ID (identification information of an operator who handles users' calls);
- a business ID (identification information of the business type indicating the user's inquiry destination);
- a dictionary ID (identification information of the dictionary used for speech recognition).

The storage DB 12 stores the call contents recorded by the call recording device 9 (the operator's voice file and the customer's voice file) as separate pieces of voice data linked by the incoming call ID (not shown). The storage DB 12 also stores call information associating the operator ID of the operator who handled the user's call with the business ID of the business type that operator is in charge of.

The dictionary selection unit 13 reads the call information (the pair of operator ID and business ID) that the CTI device 7 generated and stored in the storage DB 12. On that basis it refers to the dictionary management DB 11, selects the dictionary ID to be used for speech recognition, and outputs (notifies) the selected dictionary ID to the speech recognition engine 17.

The dictionary storage unit 16 stores the per-operator speech recognition dictionaries 14, the per-business speech recognition dictionaries 15, and a general-purpose dictionary (not shown). Each dictionary 14 is a dedicated dictionary tailored to one operator's pronunciation and way of speaking, in which text data and voice data are registered in association with each other. Each dictionary 15 registers text data and voice data for the technical terms of one business type in association with each other.

The speech recognition engine 17 reads the voice file to be recognized, for example an operator's voice file, from the storage DB 12. It recognizes the file by referring to the dictionaries 14 and 15 in the dictionary storage unit 16 identified by the dictionary ID notified (input) from the dictionary selection unit 13. It then stores the recognition result in the speech recognition result DB 18.

When a call between an operator and a user is recognized in real time, the processing is as follows. The speech recognition engine 17 converts the operator's voice recorded by the call recording device 9 into text data by referring to the dictionaries 14 and 15 in the dictionary storage unit 16 selected by the dictionary selection unit 13.

The speech recognition result DB 18 stores each recognition result of the speech recognition engine 17 as text data associated with the voice file from which that text was recognized.

That is, the speech recognition engine 17 works from the dictionary ID that the dictionary selection unit 13 selected from among the plurality of dictionaries. It reads the voice file to be recognized from the storage DB 12 and performs speech recognition by referring to the dictionaries 14 and 15 in the dictionary storage unit 16 that match the notified dictionary ID.
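A toy lexicon lookup can illustrate why the choice of dictionary matters. This is not real acoustic decoding. It assumes that a hypothetical front end has already turned the voice file into pronunciation tokens, and it models each dictionary as a mapping from pronunciation to text. All dictionary IDs and entries are invented for illustration.

```python
from typing import Dict, List

# Hypothetical dictionary storage unit: dictionary ID -> {pronunciation token: text}.
DICTIONARY_STORAGE: Dict[str, Dict[str, str]] = {
    "GENERAL": {"konnichiwa": "hello", "arigatou": "thank you"},
    "D-OP001": {"arigato": "thank you"},               # one operator's clipped pronunciation
    "D-B01": {"teikikeiyaku": "fixed-term contract"},  # a business-specific term
}


def build_lexicon(dictionary_ids: List[str]) -> Dict[str, str]:
    """Merge the notified dictionaries; later IDs override earlier ones on conflicts."""
    lexicon: Dict[str, str] = {}
    for dictionary_id in dictionary_ids:
        lexicon.update(DICTIONARY_STORAGE[dictionary_id])
    return lexicon


def recognize(tokens: List[str], dictionary_ids: List[str]) -> str:
    """Map each pronunciation token to text; tokens missing from every dictionary become '<unk>'."""
    lexicon = build_lexicon(dictionary_ids)
    return " ".join(lexicon.get(token, "<unk>") for token in tokens)
```

For the tokens `["arigato", "teikikeiyaku"]`, the general-purpose dictionary alone yields `"<unk> <unk>"`. The operator and business dictionaries together yield `"thank you fixed-term contract"`.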

The operation of the speech recognition system of this embodiment is described below with reference to FIG. 4, the flowchart of that operation.
In this embodiment, a user calls the main number of the call center's telephone support desk. The call arrives at the circuit switching device 3 through the PSTN 2, and the incoming call information (the user's telephone number) is sent to the control device 6.

The control device 6 plays response messages asking whether the caller is a registered user and what the inquiry concerns. It has the user select the inquiry destination (business) at the center by pressing keys on the numeric keypad of the telephone 1.

The user's selection may establish that the user is a registered user and identify the business ID of the inquiry destination. In that case, the control device 6 instructs the circuit switching device 3 to connect the user's call (incoming call) to the telephone 4 of an operator handling the corresponding business.

The circuit switching device 3 connects the user's call (incoming call) to the operator telephone 4 designated by the control device 6.

When the operator answers on the telephone 4, the control device 6 instructs the call recording device 9 to record the call. The call recording device 9 then starts recording and generates voice files of the call contents linked to the incoming call ID (step S101 in FIG. 4). It stores these files in the storage DB 12.

The CTI device 7 also receives from the control device 6 information on the start of the call between the operator and the user, such as the incoming call ID, operator ID, business ID, and user's telephone number. It generates call information (the pair of operator ID and business ID, linked to the incoming call ID) (step S102) and stores it in the storage DB 12. As a result, the call information (operator ID, business ID) and the voice files, linked by the incoming call ID, are stored in the storage DB 12 at almost the same time.
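The voice files (S101) and the call information (S102) arrive separately but share the incoming call ID, so a consumer can join them on that key. The sketch below uses a hypothetical in-memory `StorageDB`. Treating a call as ready only once both records exist is a design assumption; the description does not specify it.

```python
from dataclasses import dataclass, field


@dataclass
class StorageDB:
    """Toy stand-in for storage DB 12; both kinds of record are keyed by incoming call ID."""
    voice_files: dict = field(default_factory=dict)  # call ID -> {"operator": file, "customer": file}
    call_info: dict = field(default_factory=dict)    # call ID -> (operator ID, business ID)

    def store_voice(self, call_id: str, operator_file: str, customer_file: str) -> None:
        self.voice_files[call_id] = {"operator": operator_file, "customer": customer_file}

    def store_call_info(self, call_id: str, operator_id: str, business_id: str) -> None:
        self.call_info[call_id] = (operator_id, business_id)

    def ready_for_recognition(self) -> list:
        """Call IDs for which both the voice files and the call information have arrived."""
        return sorted(self.voice_files.keys() & self.call_info.keys())
```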

The dictionary selection unit 13 reads the call information (operator ID, business ID) stored in the storage DB 12. Using it as a key, it refers to the dictionary management DB 11 and selects the speech recognition dictionary ID that matches the call information (step S103). It then outputs (notifies) that ID to the speech recognition engine 17.

If the dictionary management DB 11 contains no dictionary ID matching the call information (operator ID, business ID), the dictionary selection unit 13 notifies the speech recognition engine 17 to use the general-purpose dictionary for speech recognition. A match with the call information may be a match on either the operator ID or the business ID alone.
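The sketch below shows one possible reading of the selection and fallback rule. An operator-ID match contributes that operator's dictionary, and a business-ID match contributes that business's dictionary. If nothing matches, the general-purpose dictionary is used. The union-of-partial-matches policy, the row format, and all IDs are assumptions.

```python
from typing import List, Optional, Tuple

GENERAL_DICTIONARY_ID = "GENERAL"  # hypothetical ID of the general-purpose dictionary

# Hypothetical rows: (operator ID, business ID, operator dictionary ID, business dictionary ID).
ROWS: List[Tuple[str, str, Optional[str], Optional[str]]] = [
    ("OP001", "B01", "D-OP001", "D-B01"),
    ("OP002", "B01", None, "D-B01"),
    ("OP003", "B02", "D-OP003", "D-B02"),
]


def select_dictionaries(operator_id: str, business_id: str) -> List[str]:
    """Select dictionary IDs for a call, falling back to the general-purpose dictionary."""
    selected: List[str] = []
    for op, biz, op_dict, biz_dict in ROWS:
        # An operator-ID match contributes that operator's dictionary...
        if op == operator_id and op_dict and op_dict not in selected:
            selected.append(op_dict)
        # ...and a business-ID match contributes that business's dictionary.
        if biz == business_id and biz_dict and biz_dict not in selected:
            selected.append(biz_dict)
    return selected or [GENERAL_DICTIONARY_ID]
```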

The speech recognition engine 17 searches the dictionary storage unit 16 using the notified dictionary ID as a key. It recognizes the target voice file by referring to the corresponding dictionary (step S104).

In the speech recognition device 10, the speech recognition engine 17 outputs the text file produced by recognition to the speech recognition result DB 18 (step S105). The text file is stored there in association with the voice file.
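Once S101 and S102 have produced a voice file and call information, steps S103 to S105 can be combined in one function. The sketch below assumes an exact-match lookup table and a hypothetical `GENERAL` dictionary ID as the fallback. The recognizer is passed in as a callable because the description does not name a specific recognition engine.

```python
from typing import Callable, Dict, List, Tuple


def process_call(
    call_id: str,
    operator_voice: str,   # S101: recorded operator voice file (path or handle)
    operator_id: str,      # S102: call information produced by the CTI device
    business_id: str,
    dictionary_mgmt: Dict[Tuple[str, str], List[str]],
    recognize: Callable[[str, List[str]], str],
    results: Dict[str, Tuple[str, str]],
) -> str:
    # S103: select dictionary IDs from the call information (general-purpose fallback).
    dictionary_ids = dictionary_mgmt.get((operator_id, business_id), ["GENERAL"])
    # S104: recognize the operator's voice with the selected dictionaries.
    text = recognize(operator_voice, dictionary_ids)
    # S105: store the text together with the voice file it was recognized from.
    results[call_id] = (text, operator_voice)
    return text
```

Passing the recognizer in as a parameter lets the same pipeline run against a stub in tests and against a real engine in deployment.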

The monitoring PC 8 can output the contents of the conversation between the user and the operator in real time. It can also access the speech recognition result DB 18 to read out the desired data, display it, and/or play back the voice.

As described above, the first embodiment provides a plurality of speech recognition dictionaries 14 and 15, per operator and per business. The dictionary to be used is selected from among them on the basis of the call information (operator ID and business ID) generated by the CTI device 7, and speech recognition is performed by referring to the selected dictionary. The operator dictionaries 14 let differences in speaking style, such as each operator's vocal habits and intonation, be recognized correctly, which improves the recognition rate for operators' voices. The business-specific terminology dictionaries 15 improve the recognition rate for terms used in each business. As a result, the speech recognition rate of the system as a whole can be improved efficiently.

Although the embodiment of the present invention has been described, this embodiment is presented as an example and is not intended to limit the scope of the invention. The novel embodiment can be implemented in various other forms, and various omissions, replacements, and changes can be made without departing from the gist of the invention. This embodiment and its modifications are included in the scope and gist of the invention, and in the invention described in the claims and its equivalents.

Each component shown in the above embodiment may be implemented by a program installed in storage, such as a hard disk device of a computer. Alternatively, the program may be stored on a computer-readable electronic medium, and a computer may read it from that medium to implement the functions of the present invention. Examples of such electronic media include recording media such as CD-ROMs, flash memory, and removable media. The components may also be distributed and stored across different computers connected via a network, with the functions implemented through communication among those computers.

DESCRIPTION OF SYMBOLS
1 ... Telephone
2 ... Public switched telephone network (PSTN)
3 ... Circuit switching device (IP-PBX/PBX)
4 ... Operator telephone
5 ... Computer (operator PC)
6 ... Control device
9 ... Call recording device
10 ... Speech recognition device
11 ... Dictionary management database (dictionary management DB)
13 ... Dictionary selection unit
14 ... Dictionary for each operator
15 ... Dictionary for each business
16 ... Dictionary storage unit
17 ... Speech recognition engine

Claims

A voice recognition device comprising:
a dictionary management information storage unit that stores, in association with each other, identification information of operators who handle users' telephone calls, identification information of the business type indicating a user's inquiry destination, and identification information of the dictionary used for voice recognition;
a dictionary storage unit in which a voice recognition dictionary for each operator and a voice recognition dictionary for each business type are stored;
a voice storage unit that stores the voice of an operator who handled a user's telephone call;
a call information storage unit that stores call information in which identification information of the operator who handled the call is associated with identification information of the business type handled by that operator;
a dictionary selection unit that selects a dictionary to be used for speech recognition by referring to the dictionary management information storage unit on the basis of the call information; and
a voice recognition unit that converts the operator's voice read from the voice storage unit into text data by referring to the dictionary in the dictionary storage unit selected by the dictionary selection unit.

The voice recognition device according to claim 1, wherein
the dictionary selection unit selects identification information of the dictionary to be used for speech recognition by referring to the dictionary management information storage unit on the basis of the call information, and
the voice recognition unit converts the voice of the operator recorded by the call recording device into text data by referring to the dictionary in the dictionary storage unit that matches the dictionary identification information selected by the dictionary selection unit.

2. The speech recognition apparatus according to claim 1, wherein the dictionary for speech recognition for each operator is a dictionary in which text data and speech data are registered in association with each operator's pronunciation and speaking style.

The speech recognition apparatus according to claim 1, wherein the speech recognition dictionary for each business type is a dictionary in which text data and speech data of technical terms for each business type are linked and registered.

The speech recognition apparatus according to claim 1, wherein
the dictionary storage unit includes a general-purpose dictionary in addition to the dictionary for voice recognition for each operator and the dictionary for voice recognition for each business type, and
the dictionary selection unit selects the general-purpose dictionary as the dictionary used for speech recognition when there is no dictionary in the dictionary storage unit that matches the call information.

A voice recognition system comprising:
a dictionary management information storage unit that stores, in association with each other, identification information of operators who handle users' telephone calls, identification information of the business type indicating a user's inquiry destination, and identification information of the dictionary used for voice recognition;
a dictionary storage unit in which a voice recognition dictionary for each operator and a voice recognition dictionary for each business type are stored;
a call recording device that records the voice of an operator who handles a user's telephone call;
a call information generation unit that generates call information in which identification information of the operator who handled the call is associated with identification information of the business type handled by that operator;
a dictionary selection unit that selects a dictionary to be used for speech recognition by referring to the dictionary management information storage unit on the basis of the call information; and
a voice recognition unit that converts the operator's voice recorded by the call recording device into text data by referring to the dictionary in the dictionary storage unit selected by the dictionary selection unit.

A speech recognition method comprising:
storing, in association with each other, identification information of operators who handle users' telephone calls, identification information of the business type indicating a user's inquiry destination, and the dictionary used for voice recognition;
storing a dictionary for voice recognition for each operator and a dictionary for voice recognition for each business type;
storing or recording the voice of an operator who handled a telephone call;
storing call information that associates the identification information of the operator who handled the user's call with the identification information of the business type that operator is in charge of; and
selecting, from the plurality of stored dictionaries, a dictionary to be used for speech recognition on the basis of the call information, and converting the stored or recorded operator's voice into text data by referring to the selected dictionary.