JPH1125112A

JPH1125112A - Method and device for processing interactive voice, and recording medium

Info

Publication number: JPH1125112A
Application number: JP9180159A
Authority: JP
Inventors: Otoya Shirotsuka; 音也城塚
Original assignee: N T T DATA KK; NTT Data Corp
Current assignee: N T T DATA KK; NTT Data Group Corp
Priority date: 1997-07-04
Filing date: 1997-07-04
Publication date: 1999-01-29

Abstract

PROBLEM TO BE SOLVED: To provide an improved dialogue voice processing apparatus that can efficiently reproduce the information in the required parts of recorded dialogue voice. SOLUTION: A division processing unit 1 divides dialogue voice, input in time-series order, into a plurality of pieces of voice data. An item content recording processing unit 2 records the content of each data item (topic) that serves as an index of the dialogue voice. A data item association processing unit 3 associates each data item with a group of voice data and stores the resulting voice data set for each data item in a data storage unit 4. A dialogue content reproduction unit 6 retrieves from the data storage unit 4 only the voice data set for the required data item and reproduces the content of the dialogue.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a voice data processing technique for, for example, recording dialogue voice and later reproducing it efficiently so that the content of the dialogue can be grasped.

[0002]

2. Description of the Related Art

Support apparatuses are known in service systems that apply interactive voice processing technology. Such an apparatus assists a receptionist who creates a customer response record while talking with a customer. The receptionist enters customer data and builds the record by typing predetermined customer-response data items on a keyboard or by selecting them from a menu. A service representative then reads the customer's request as entered in that record, decides how to respond to the customer, and takes the necessary action.

[0003]

However, the information passed on by the receptionist may be insufficient for dealing with the customer. In that case, the service representative must contact the customer again. This is a nuisance for the customer, who has to repeat to the service representative what was already told to the receptionist.

One conceivable solution is to record every dialogue between customers and the receptionist. When the information passed on is insufficient, the service representative could then review the dialogue afterwards. With this approach, however, the entire dialogue must be listened to, even though the part actually needed is very short. When a dialogue is long, or when many dialogues have been recorded, the service representative therefore needs a long time to listen through the dialogue voice.

[0004] This problem is not limited to a service representative reviewing a customer-receptionist dialogue after the fact. It arises equally whenever recorded dialogue voice is listened to simply to learn the outline of the dialogue.

[0005] SUMMARY OF THE INVENTION

An object of the present invention is to provide an improved dialogue voice processing method that can efficiently reproduce the information in the required parts of recorded dialogue voice. Another object is to provide an apparatus suitable for carrying out this method. A further object is to provide a recording medium for realizing the apparatus on a general-purpose computer.

[0006]

A dialogue voice processing method according to the present invention, which solves the above problem, is a method using a computer. It includes the following steps:
- dividing input dialogue voice into a plurality of pieces of voice data;
- storing each piece of voice data in association with its topic;
- selectively reading out, from the stored voice data, the group of voice data corresponding to a specific topic, and reproducing the dialogue content.

With this method, only the voice data group for the topic of interest is retrieved and played back. This improves retrieval efficiency. It also greatly shortens the time needed to grasp the dialogue content compared with playing back all the voice data in chronological order.
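The topic-wise storage and selective readout described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation. It assumes each piece of voice data already carries a serial number and a topic label. The names `VoiceSegment`, `store_by_topic`, and `retrieve_topic` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class VoiceSegment:
    number: int   # consecutive number assigned when the dialogue was divided
    topic: str    # topic (data item) this piece of voice data belongs to
    audio: bytes  # recorded audio payload (left empty in this sketch)


def store_by_topic(segments):
    """Group voice data into one voice data set per topic, keeping input order."""
    store = defaultdict(list)
    for seg in segments:
        store[seg.topic].append(seg)
    return store


def retrieve_topic(store, topic):
    """Read out only the numbers of the voice data set for one topic."""
    return [seg.number for seg in store.get(topic, [])]


segments = [
    VoiceSegment(0, "Customer name", b""),
    VoiceSegment(1, "Customer name", b""),
    VoiceSegment(2, "Request", b""),
    VoiceSegment(3, "Request", b""),
    VoiceSegment(4, "Contact", b""),
]
store = store_by_topic(segments)
print(retrieve_topic(store, "Request"))  # [2, 3]
```

Retrieving one topic is a single dictionary lookup, so the rest of the recorded dialogue is never touched.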

[0007] Another dialogue voice processing method according to the present invention includes the following steps:
- dividing input dialogue voice into a plurality of pieces of voice data;
- determining, for each piece, whether it contains a keyword predetermined for each topic;
- storing each piece in association with the result of that determination;
- reading out, from the stored voice data, the pieces that contain a keyword, in the order in which they were stored, to reproduce the dialogue content.

With this method, the dialogue is reproduced in time order from the voice data that contain keywords, that is, the voice data estimated to be of relatively high importance. The outline of the dialogue can therefore be grasped.
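The keyword-based summary can be sketched as follows, assuming a speech recognizer has already produced a text transcript for each numbered piece of voice data. The patent performs keyword recognition on the voice itself; naive substring matching on transcripts is used here only as a stand-in. The function names are hypothetical.

```python
def flag_keywords(transcripts, keywords):
    """Map each voice data number to 1 if its transcript contains a keyword, else 0.

    Naive substring matching stands in for keyword recognition on audio.
    """
    return {
        number: int(any(kw in text for kw in keywords))
        for number, text in transcripts.items()
    }


def summary_order(flags):
    """Numbers of keyword-bearing voice data in storage (chronological) order."""
    return sorted(number for number, hit in flags.items() if hit)


transcripts = {
    3: "I want to ask about my PC",
    4: "yes, that's right",
    5: "please tell me the model number",
}
flags = flag_keywords(transcripts, ["PC", "model number", "tell me"])
print(flags)                 # {3: 1, 4: 0, 5: 1}
print(summary_order(flags))  # [3, 5]
```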

[0008] The keywords may also be prioritized in advance so that voice data corresponding to higher-priority keywords is reproduced first.
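One way to realize prioritized playback is sketched below. It assumes each keyword has been given an integer rank, where a smaller rank means more important. It also assumes the keywords found in each piece of voice data are already known. `priority_order` is a hypothetical name. Ties fall back to chronological order.

```python
def priority_order(found_keywords, priority):
    """Order voice data so that pieces matching higher-priority keywords play first.

    found_keywords: {voice data number: set of keywords detected in it}
    priority: {keyword: rank}, where a smaller rank means a higher priority
    Pieces with no ranked keyword are left out; ties keep chronological order.
    """
    ranked = []
    for number, found in found_keywords.items():
        ranks = [priority[kw] for kw in found if kw in priority]
        if ranks:
            # A piece is ranked by the most important keyword it contains.
            ranked.append((min(ranks), number))
    return [number for _, number in sorted(ranked)]


found = {1: {"model number"}, 2: set(), 3: {"PC"}, 4: {"PC", "model number"}}
print(priority_order(found, {"model number": 0, "PC": 1}))  # [1, 4, 3]
```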

[0009] A dialogue voice processing apparatus of a first configuration achieves the other object above. It comprises:
- voice input means for inputting dialogue voice;
- means for dividing the input dialogue voice into predetermined processing units to generate a plurality of pieces of voice data;
- means for storing each generated piece of voice data in association with a data item that serves as an index of the dialogue content.

The apparatus is configured to retrieve, from the stored voice data, the group of voice data corresponding to each data item.

[0010] A dialogue voice processing apparatus of a second configuration comprises:
- voice input means for inputting dialogue voice;
- means for dividing the input dialogue voice into predetermined processing units to generate a plurality of pieces of voice data;
- means for determining whether each piece of voice data contains a keyword predetermined for each topic;
- means for attaching the result of that determination to the voice data and storing it.

The apparatus is configured to retrieve, from the stored voice data, the group of voice data containing the keywords, in the order in which they were stored.

[0011] A dialogue voice processing apparatus of a third configuration comprises:
- voice input means for inputting dialogue voice;
- means for dividing the input dialogue voice into predetermined processing units to generate a plurality of pieces of voice data;
- means for determining whether each piece of voice data contains a keyword predetermined for each topic;
- means for attaching the result of that determination to each piece of voice data and storing each piece in association with a data item that serves as an index of the dialogue content.

The apparatus is configured to retrieve, from the stored voice data, the corresponding group of voice data for each data item or for each keyword.

[0012] Each of these apparatuses preferably further comprises reproduction means for reproducing the dialogue content from the retrieved group of voice data.

[0013] A recording medium of the present invention achieves the other object above. It records, in a form readable by a computer, a program that causes the computer to execute the following processes:
(1) voice input processing for inputting dialogue voice;
(2) processing for dividing the input dialogue voice into predetermined processing units to generate a plurality of pieces of voice data;
(3) processing for determining whether each piece of voice data contains a keyword predetermined for each topic;
(4) processing for attaching the result of the determination to each generated piece of voice data and storing each piece in association with a data item that serves as an index of the dialogue content; and
(5) processing for retrieving the stored voice data for each data item or for each keyword and reproducing the dialogue content.

[0014]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

An embodiment is described in detail below in which the present invention is applied to a dialogue voice processing apparatus. The apparatus handles the dialogue voice exchanged between a customer and a receptionist.

FIG. 1 is a schematic configuration diagram of the dialogue voice processing apparatus of this embodiment. The apparatus has the functions of the following units, all formed when a computer reads and executes a predetermined program:
- a division processing unit 1;
- an item content recording processing unit 2;
- a data item association processing unit 3;
- a data storage unit 4;
- a keyword detection unit 5;
- a dialogue content reproduction unit 6.

The program that forms functional blocks 1 to 6 is normally stored in an internal or external storage device of the computer, and is read and executed as needed. It may instead be stored on a recording medium separable from the computer, such as a CD-ROM or a floppy disk (FD), and installed in the internal or external storage device at the time of use.

[0015] In this embodiment, the dialogue voice is the voice of the customer and the receptionist, input to the apparatus through a telephone or the like. The division processing unit 1 divides the input dialogue voice into voice data of suitable processing units and assigns consecutive numbers to the pieces of voice data. The division can be performed, for example, at silent intervals between utterances that are at least a chosen length. In many cases, one piece of voice data runs from when one speaker starts talking until the other speaker starts talking. Other division methods may of course also be used.
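Silence-based division might look like the following sketch. It assumes the recorded voice has already been reduced to one energy value per short frame. The energy threshold and minimum silence length are illustrative parameters, not values from the source, and `split_on_silence` is a hypothetical name.

```python
def split_on_silence(energies, threshold, min_silence):
    """Divide a frame-energy sequence into utterances at long silent gaps.

    energies: one energy value per short frame of the recorded dialogue
    threshold: frames with energy below this value count as silent
    min_silence: number of consecutive silent frames needed to end a piece
    Returns [(number, start_frame, end_frame)] with end_frame exclusive and
    consecutive numbers starting at 0, as the division unit assigns them.
    """
    pieces = []
    start = None        # first voiced frame of the current piece
    last_voiced = None  # most recent voiced frame of the current piece
    silent_run = 0      # consecutive silent frames seen since last_voiced
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i
            last_voiced = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence:
                # The gap is long enough: close the piece at its last voiced frame.
                pieces.append((start, last_voiced + 1))
                start, last_voiced, silent_run = None, None, 0
    if start is not None:
        pieces.append((start, last_voiced + 1))
    return [(n, s, e) for n, (s, e) in enumerate(pieces)]


frames = [0, 5, 5, 0, 5, 0, 0, 0, 5, 5, 0]
print(split_on_silence(frames, threshold=1, min_silence=3))  # [(0, 1, 5), (1, 8, 10)]
```

A short pause, such as the single silent frame at index 3, stays inside a piece. Only the three-frame gap splits the dialogue.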

[0016] The item content recording processing unit 2 extracts from the input dialogue voice the contents of predetermined customer-response data items (hereinafter simply "data items"). It passes the extraction results to the data item association processing unit 3. The item contents can be extracted from the dialogue voice by automatic speech recognition or the like. For example, the receptionist operating the apparatus asks the customer about each data item. The customer's reply is then processed by speech recognition to obtain the content of that data item for the customer.

Alternatively, automatic speech recognition need not be used. The receptionist may talk with the customer directly and, whenever the conversation contains content relating to a data item, type that content on a keyboard (not shown). This typed text becomes the item content for the item content recording processing unit 2.

[0017] The data item association processing unit 3 performs two associations:
- It associates each data item, which serves as an index of the topics discussed between the customer and the receptionist, with the item content extracted by the item content recording processing unit 2.
- It associates the voice data from the division processing unit 1 (by their assigned numbers) with the data items.

It then stores these in the data storage unit 4 as a series of customer response data. Each data item is associated with every piece of voice data spoken in the following window: after the receptionist's key input (keyboard input; likewise below) for the preceding data item, and before the key input for this data item.
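Under the association rule above, each data item receives the voice data spoken between the previous item's key input and its own. This can be sketched as follows. The sketch assumes each piece of voice data is timestamped with its end time and each key input with the time it was completed. `associate` is a hypothetical name.

```python
from bisect import bisect_left


def associate(pieces, key_inputs):
    """Assign each piece of voice data to the data item keyed in after it.

    pieces: [(number, end_time)] in chronological order
    key_inputs: [(time, data_item)] in chronological order
    A piece goes to the first key input at or after its end time, i.e. to the
    item whose key input follows it and after the previous item's key input.
    Pieces spoken after the last key input remain unassigned.
    """
    times = [t for t, _ in key_inputs]
    result = {item: [] for _, item in key_inputs}
    for number, end_time in pieces:
        idx = bisect_left(times, end_time)  # first key input with time >= end_time
        if idx < len(key_inputs):
            result[key_inputs[idx][1]].append(number)
    return result


pieces = [(0, 1.0), (1, 2.5), (2, 4.0), (3, 6.0), (4, 9.0)]
keys = [(3.0, "Customer name"), (7.0, "Request")]
print(associate(pieces, keys))  # {'Customer name': [0, 1], 'Request': [2, 3]}
```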

[0018] The keyword detection unit 5 holds, for each topic (that is, for each data item), a list of keywords expected to appear in the dialogue. It uses keyword recognition to check whether each divided piece of voice data contains those keywords. It then stores information on their presence or absence in the data storage unit 4, in association with the voice data. The keyword recognition runs as a separate task in a multitasking environment, so it can proceed in parallel with the division of the voice data.
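The parallel execution described above can be approximated with a thread pool, as in this sketch. Detection jobs are submitted as the division step yields each piece of voice data, so detection overlaps with division. Keyword recognition is again replaced by substring matching on assumed transcripts. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def has_keyword(text, keywords):
    """Stand-in for keyword recognition: 1 if any keyword occurs in the transcript."""
    return int(any(kw in text for kw in keywords))


def detect_concurrently(piece_stream, keywords_by_item):
    """Flag keywords in each piece as soon as the division step yields it.

    piece_stream: iterable of (number, data_item, transcript), e.g. a generator
    fed by the division step; keywords_by_item: {data_item: [keywords]}.
    Returns {number: 1 or 0}.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Each submit returns immediately, so division keeps producing pieces
        # while worker threads check the earlier ones.
        futures = {
            number: pool.submit(has_keyword, text, keywords_by_item.get(item, []))
            for number, item, text in piece_stream
        }
        return {number: fut.result() for number, fut in futures.items()}


def divided_pieces():
    """Hypothetical division step yielding transcribed pieces one at a time."""
    yield 3, "Request", "I want to ask about my PC"
    yield 4, "Request", "yes, that's right"
    yield 5, "Contact", "03-2456-7777"


keywords = {"Request": ["PC", "model number"], "Contact": ["phone number"]}
print(detect_concurrently(divided_pieces(), keywords))  # {3: 1, 4: 0, 5: 0}
```

In CPython, threads overlap usefully only when the recognizer releases the GIL, for example during I/O or in native code. For CPU-bound pure-Python recognition, a process pool would be the usual alternative.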

[0019] The dialogue content reproduction unit 6 searches the voice data stored in the data storage unit 4 for the pieces corresponding to a given data item or keyword. It arranges them in a predetermined order and reproduces the dialogue content. Playback is normally acoustic. Alternatively, the voice data may be converted by speech recognition into text, codes, or graphics and shown on a display. A retrieval method such as text search can be used to search the voice data.
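Searching by data item or by keyword is illustrated below. The sketch assumes each stored piece of voice data has a speech-recognition transcript, so a plain case-insensitive text search can stand in for searching the voice data. The record layout and `find_segments` are hypothetical.

```python
def find_segments(records, data_item=None, keyword=None):
    """Select voice data by data item and/or keyword, in chronological order.

    records: list of {"number": int, "item": str, "text": str}, where "text"
    is an assumed speech-recognition transcript of the piece of voice data.
    """
    hits = [
        r["number"]
        for r in records
        if (data_item is None or r["item"] == data_item)
        and (keyword is None or keyword.lower() in r["text"].lower())
    ]
    return sorted(hits)


records = [
    {"number": 2, "item": "Request", "text": "What memory model number fits?"},
    {"number": 0, "item": "Customer name", "text": "This is Isobe"},
    {"number": 7, "item": "Contact", "text": "My number is 03-2456-7777"},
]
print(find_segments(records, data_item="Contact"))  # [7]
print(find_segments(records, keyword="number"))     # [2, 7]
```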

[0020] FIG. 2 shows how the dialogue voice processing apparatus of this embodiment (hereinafter "the apparatus") stores voice data, as described next. The storage processing is assumed to continue from when the customer's telephone line is connected until it is disconnected.

[0021] The apparatus first confirms that the line has not been disconnected (step S1: No). It then waits for key input or voice input (step S2: No, step S3: No).

When voice is input (step S3: Yes), each piece of voice data is recorded in a recording area (not shown) (step S4). This recording is repeated until the voice input ends (step S5: No). When the voice input ends (step S5: Yes), numbers are assigned to the series of voice data divided from the input voice (step S6). The division processing unit 1 performs all of the above.

When there is key input in step S2, the data item association processing unit 3 associates that data item with its voice data (step S7). For example, two pieces of voice data may have been used for a certain data item before its key input is made. The keyword detection unit 5 then performs keyword recognition to detect whether the voice data contain keywords. It stores the detection results in the data storage unit 4 (step S8).

This series of processes is repeated until the line is disconnected.
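The FIG. 2 loop can be simulated over a scripted list of events, as in the following sketch. It rests on several simplifying assumptions. Voice input arrives as already divided, transcribed pieces. Numbering (S6), association (S7), and keyword flagging (S8) are collapsed into a few lines. The event format and `run_session` are hypothetical.

```python
def run_session(events, keywords_by_item):
    """Simulate the FIG. 2 storage loop over a scripted list of events.

    events: ("voice", transcript) | ("key", data_item, content) | ("hangup",)
    keywords_by_item: {data_item: [keywords]}
    Returns {data_item: (content, [(number, keyword_flag), ...])}.
    """
    store = {}
    next_number = 0  # S6: consecutive numbers for divided voice data
    pending = []     # numbered voice data not yet tied to a data item
    for event in events:
        kind = event[0]
        if kind == "hangup":  # S1: Yes, so storage ends
            break
        if kind == "voice":   # S3 to S6: record the piece and number it
            pending.append((next_number, event[1]))
            next_number += 1
        elif kind == "key":   # S2: Yes, so run S7 and S8
            _, item, content = event
            keywords = keywords_by_item.get(item, [])
            # S7 associates the pending pieces with the item;
            # S8 flags which of them contain one of its keywords.
            flagged = [(n, int(any(k in text for k in keywords))) for n, text in pending]
            store[item] = (content, flagged)
            pending = []
    return store


events = [
    ("voice", "My name is Isobe"),
    ("voice", "Isobe"),
    ("key", "Customer name", "Isobe"),
    ("voice", "I want to ask about memory"),
    ("voice", "OK"),
    ("key", "Request", "memory model number inquiry"),
    ("hangup",),
]
result = run_session(events, {"Customer name": ["name"], "Request": ["memory"]})
print(result["Request"])  # ('memory model number inquiry', [(2, 1), (3, 0)])
```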

[0022] The following example makes the above processing concrete. A customer consults the receptionist of the apparatus about a personal computer he or she has purchased. FIG. 3 shows the flow of the dialogue in this example and the content of the voice data at each point.

The division processing unit 1 divides the dialogue between the customer and the receptionist into a plurality of pieces of voice data and assigns them serial numbers "0000" to "0013". The start voice data "0000" of the voice data group corresponding to a data item, that is, of the voice data set, is recorded in the corresponding area of the data storage unit 4.

As the dialogue proceeds, the customer's name becomes known. The receptionist then keys in the customer name "Isobe" for the data item "Customer name". The corresponding voice data set (0001 to 0002) is stored in the corresponding area of the data storage unit 4.

Likewise, the receptionist keys in the following:
- "memory model number inquiry" for the data item "Request";
- the PC model "XX-YY" for the data item "Target hardware";
- the customer's telephone number "03-2456-7777" for the data item "Contact".

The voice data set for each of these data items is stored in the corresponding area of the data storage unit 4.

[0023] FIG. 4 shows an example of the customer response data created from this dialogue. These data represent the correspondence between the data items and their contents. In this embodiment, the key input for a data item serves as its delimiter. When that key input is completed, the most recent voice data uttered before that moment becomes the end voice data of the item's voice data set. This fixes both the range of the voice data set and the data item it corresponds to.

[0024] The keyword detection unit 5 acts once the range of a voice data set and its corresponding data item have been determined. Take the data item "Request" as an example. When its content "memory model number inquiry" has been keyed in, the unit determines that voice data "0003" to "0005" correspond to "Request". It then performs keyword recognition on voice data "0003" to "0005".

This keyword recognition is based on a list of keywords registered in advance. FIG. 5 shows an example of this list. The keywords registered for the data item "Request", for instance, include "PC", "I want to ask", "model number", and "tell me". In this example, the keyword detection unit 5 selects voice data "0003" and "0005" as containing keywords for "Request". It determines that voice data "0004" contains none.

As shown in FIG. 6, these results are stored in the data storage unit 4 as set data consisting of:
- the data item name;
- the item content;
- the corresponding voice data, that is, the numbers of one or more pieces of voice data, the voice data file names, and, for each piece, whether it contains a keyword (1 = present, 0 = absent).
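The set data of FIG. 6 could be represented as below. The field names, the file-naming scheme, and the use of transcripts for keyword matching are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SetData:
    """One set record in the style of FIG. 6 (field names are illustrative)."""
    item_name: str
    item_content: str
    numbers: list = field(default_factory=list)      # voice data numbers
    file_names: list = field(default_factory=list)   # voice data file names
    has_keyword: list = field(default_factory=list)  # 1 = present, 0 = absent


def build_set_data(item_name, item_content, transcripts, keywords):
    """Build the set data for one data item from {number: transcript}."""
    record = SetData(item_name, item_content)
    for number in sorted(transcripts):
        record.numbers.append(number)
        record.file_names.append(f"{number:04d}.wav")  # hypothetical naming scheme
        text = transcripts[number]
        record.has_keyword.append(int(any(kw in text for kw in keywords)))
    return record


rec = build_set_data(
    "Request",
    "memory model number inquiry",
    {3: "I want to ask about my PC", 4: "yes, that's right", 5: "please tell me the model number"},
    ["PC", "I want to ask", "model number", "tell me"],
)
print(rec.file_names, rec.has_keyword)  # ['0003.wav', '0004.wav', '0005.wav'] [1, 0, 1]
```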

[0025] In the example of FIG. 3, the receptionist has keyed in information for four data items. However, some of what the customer said was not keyed in:
- the purchase date ("last September");
- the place of purchase ("XX Denki");
- the memory capacity the customer plans to buy ("change from 8M to 40M").

In addition, the customer's telephone number was keyed in incorrectly. A service representative answering from the receptionist's customer response data would therefore likely lack information or rely on incorrect information. The service representative then proceeds as follows.

[0026] First, the memory model number may differ depending on when the product was manufactured. In that case, the product number and purchase date must be known correctly. The already recorded voice data set for the "Request" item is retrieved from the data storage unit 4 and played back by the dialogue content reproduction unit 6. From the purchase date heard there, the service representative determines when the product was manufactured and hence the correct memory model number.

Furthermore, a call to the telephone number in the "Contact" item of the customer response data does not reach the customer, because the receptionist mistyped it. In this case, the voice data set corresponding to the "Contact" item is selectively retrieved from the data storage unit 4 and played back by the dialogue content reproduction unit 6. This lets the service representative confirm the correct telephone number the customer gave.

[0027] To hear an outline of the dialogue, only the voice data containing keywords are retrieved from the data storage unit 4 and arranged from oldest to newest. The dialogue content reproduction unit 6 then plays them back, so the outline can be grasped quickly.

FIG. 7 is an example of dialogue content created by extracting only the voice data containing keywords from the dialogue of FIG. 3. The underlined words are keywords registered in advance. As FIG. 7 shows, the content is much shorter than the dialogue of FIG. 3. Because all of the necessary conversation content is still covered, the dialogue can nevertheless be understood correctly.

[0028] When the dialogue content is reproduced, the voice data may also be sorted before playback. Alternatively, the keywords may be prioritized according to any chosen rule, and the voice data played back starting from those that correspond to the highest-priority keywords. The listener can then take in the dialogue voice in order of interest, a mode of use suited to the purpose at hand.

[0029] As described above, the dialogue voice processing apparatus of this embodiment divides the dialogue voice exchanged between the customer and the receptionist into a plurality of pieces of voice data. It stores each piece in the voice data set of one of the predetermined data items and allows playback for each data item. When the dialogue content is reviewed afterwards, there is no longer any need to listen from the beginning of the dialogue as before. The effort and time needed to obtain the required information are thus reduced.

[0030] In addition, an outline of the dialogue can be reproduced from the divided voice data that are important to the dialogue content, for example those containing keywords predetermined for each topic. Only these pieces are arranged in time order and played back, which makes it possible to understand the dialogue content concisely.

[0031] This embodiment has been described using the dialogue between a customer and a receptionist as an example. The present invention is of course not limited to this example and can be applied to any dialogue.

[0032]

As is apparent from the above description, the present invention divides dialogue voice into a plurality of pieces of voice data when it is recorded, and stores them as a voice data set for each topic. It is therefore possible to play back only the voice data sets for the parts of the dialogue that may contain information on the topic of interest.

[0033] Furthermore, the present invention can selectively retrieve and play back the voice data containing the required keywords from the stored voice data. The dialogue content can thus be grasped as an overall summary. In particular, the keyword-bearing voice data can be played back in the chronological order of the dialogue. Playback then follows the flow of the original dialogue voice, so the dialogue content can be understood correctly.

[Brief description of the drawings]

FIG. 1 is a schematic functional diagram of a dialogue voice processing apparatus according to an embodiment of the present invention.

FIG. 2 is a diagram explaining the processing procedure of the dialogue voice processing apparatus of the embodiment.

FIG. 3 is a diagram showing the flow of a dialogue between a customer and the receptionist of the dialogue voice processing apparatus, together with the content of the voice data at each point.

FIG. 4 is a diagram explaining the customer response data of the embodiment.

FIG. 5 is a diagram showing an example of the contents of a preset keyword list.

FIG. 6 is an explanatory diagram showing an example of the contents of the set data stored in the data storage part.

FIG. 7 is an explanatory diagram of dialogue voice played back by keyword extraction.

[Explanation of symbols]

1 Division processing part
2 Item content recording processing part
3 Data item association processing part
4 Data storage part
5 Keyword detection part
6 Dialogue content reproduction part

Claims

1. A dialogue voice processing method using a computer device, comprising the steps of dividing an input dialogue voice into a plurality of pieces of voice data, storing the divided voice data as a voice data set for each topic, and selectively retrieving the voice data set corresponding to a specific topic from the stored voice data and playing back the content of the dialogue.

2. The range of each voice data set is determined such that the start voice data is the voice data input after the delimiting information for a topic was previously determined, and the end voice data is the most recently input voice data before the next delimiting information is determined.

3. A dialogue voice processing method using a computer device, comprising the steps of dividing an input dialogue voice into a plurality of pieces of voice data, determining for each piece of voice data whether it contains a keyword predetermined for each topic, storing each piece of voice data in association with the result of that determination, and retrieving the voice data containing the keyword from the stored voice data in storage order and playing back the content of the dialogue.

4. The dialogue voice processing method according to claim 3, wherein priorities are assigned to the keywords in advance, and voice data corresponding to keywords with higher priority is played back preferentially.

5. A dialogue voice processing apparatus comprising voice input means for inputting a dialogue voice, means for dividing the input dialogue voice into predetermined processing units to generate a plurality of pieces of voice data, and means for storing each piece of generated voice data in association with a data item serving as an index of the dialogue content, the apparatus being configured to retrieve the corresponding voice data group for each data item from the plurality of stored voice data.

6. A dialogue voice processing apparatus comprising voice input means for inputting a dialogue voice, means for dividing the input dialogue voice into predetermined processing units to generate a plurality of pieces of voice data, means for determining whether a predetermined keyword is present in each piece of voice data, and means for adding information on the determination result to each piece of voice data and storing it, the apparatus being configured to retrieve the voice data group containing the keyword from the plurality of stored voice data in storage order.

7. A dialogue voice processing apparatus comprising voice input means for inputting a dialogue voice, means for dividing the input dialogue voice into predetermined processing units to generate a plurality of pieces of voice data, means for determining whether a predetermined keyword is present in each piece of voice data, and means for adding information on the determination result to each piece of voice data and storing each piece in association with a data item serving as an index of the dialogue content, the apparatus being configured to retrieve the corresponding voice data group for each data item or each keyword from the plurality of stored voice data.

8. The dialogue voice processing apparatus according to claim 5, further comprising playback means for playing back the content of the dialogue based on the retrieved voice data group.

9. A recording medium on which is recorded, in a form readable by a computer device, a program for causing the computer device to execute a voice input process of inputting a dialogue voice, a process of dividing the input dialogue voice into predetermined processing units to generate a plurality of pieces of voice data, a process of determining whether a predetermined keyword is present in each piece of voice data, a process of adding information on the determination result to each piece of generated voice data and storing each piece in association with a data item serving as an index of the dialogue content, and a process of retrieving the stored voice data for each data item or each keyword and playing back the content of the dialogue.
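The range rule of claim 2 and the priority playback of claim 4 can be illustrated with a minimal Python sketch. It makes two assumptions not stated in the claims: each delimiter event carries the name of the topic it closes (for instance, when the receptionist finishes entering a data item), and a larger number means a higher keyword priority. `split_by_delimiters`, `priority_order`, the event tuples and the priority values are hypothetical.

```python
def split_by_delimiters(events):
    """Apply the claim 2 range rule to an input stream (sketch).

    events: ("voice", seq) or ("delim", topic) tuples in input order.
    ASSUMPTION: a delimiter names the topic of the voice data preceding it.
    Returns (topic, start_seq, end_seq) for each delimited voice data set.
    """
    ranges = []
    pending = []  # voice data input since the previous delimiter was determined
    for kind, value in events:
        if kind == "voice":
            pending.append(value)
        elif kind == "delim":
            if pending:  # start = first piece after previous delimiter, end = latest piece before this one
                ranges.append((value, pending[0], pending[-1]))
            pending = []
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return ranges

def priority_order(tagged, priority):
    """Order pieces so those with higher-priority keywords are played first (claim 4 sketch).

    tagged: (seq, keywords) pairs; priority: {keyword: rank}, larger rank = more important.
    Pieces without any prioritized keyword are left out; ties keep dialogue order.
    """
    scored = []
    for seq, kws in tagged:
        ranks = [priority[k] for k in kws if k in priority]
        if ranks:
            scored.append((-max(ranks), seq))
    return [seq for _, seq in sorted(scored)]

events = [("voice", 0), ("voice", 1), ("delim", "name"),
          ("voice", 2), ("voice", 3), ("voice", 4), ("delim", "request")]
print(split_by_delimiters(events))  # [('name', 0, 1), ('request', 2, 4)]

priority = {"cancel": 3, "address": 2, "delivery": 1}  # hypothetical ranks
tagged = [(0, set()), (1, {"address", "delivery"}), (3, {"cancel"}), (5, {"delivery"})]
print(priority_order(tagged, priority))  # [3, 1, 5]
```

Ordering by the highest-ranked keyword a piece contains is only one reading of "played back preferentially"; a system could equally filter to the top rank only.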