JP2012247912A

JP2012247912A - Speech signal processing apparatus

Info

Publication number: JP2012247912A
Application number: JP2011117999A
Authority: JP
Inventors: Osamu Segawa; 修瀬川
Original assignee: Chubu Electric Power Co Inc
Current assignee: Chubu Electric Power Co Inc
Priority date: 2011-05-26
Filing date: 2011-05-26
Publication date: 2012-12-13
Anticipated expiration: 2031-05-26
Also published as: JP5627109B2

Abstract

PROBLEM TO BE SOLVED: To provide a technique for precisely extracting necessary information associated with a business from a speech signal of a dialog about that business between speakers. SOLUTION: Speech recognition means 13 converts a speech signal of a dialog between speakers into speech recognition information. Conversation unit division means 14 divides the speech recognition information into conversation units. Necessary information clue information imparting means 16 imparts to each conversation unit the necessary information clue information included in that unit. Business determination means 15 determines the business of the dialog between the speakers on the basis of the speech recognition information. Dialog state transition determination means 17 selects a conversation unit including necessary information clue information that serves as a clue for extracting the necessary information corresponding to the business. On the basis of the selected conversation unit and the dialog state determination information stored in a dialog state determination information database 25 corresponding to the business, it determines that the dialog state has transitioned in the order of an information request dialog state, a request acceptance dialog state, an information disclosure dialog state, and an information acceptance dialog state, thereby extracting the necessary information.

Description

The present invention relates to a technique for extracting necessary information relating to a business from a speech signal of a dialog between speakers about that business.

When a customer requests a business (service) from a company (for example, when a customer asks an electric power company for [start of electricity use], [stop of electricity use], or [change of contracted electricity capacity]), the customer telephones the company's service center. The receptionist (operator) at the service center then hears from the customer the content of the requested business and the necessary information required to carry it out, and records both on a computer or on a record sheet. The receptionist therefore has to listen to and record the content of the business and the necessary information, which places a heavy burden on the receptionist.
To reduce this burden, there is a demand for a speech signal processing apparatus that extracts necessary information from the speech signal of a dialog between speakers.
Techniques for extracting (summarizing) necessary information from speech have been proposed. For example, Patent Document 1 discloses a method that performs speech summarization by identifying important information (words and phrases) in a dialog, using as clues feature patterns obtained by acoustic analysis (silence duration, pitch of the voice, similarity of speech waveforms between speakers, and so on). Patent Document 2 discloses a method that performs morphological analysis, syntactic analysis, and semantic analysis on text information obtained as a speech recognition result and identifies important parts based on the results.
However, such conventional speech summarization techniques cannot accurately extract the necessary information relating to a business from a goal-oriented dialog in which the business is clearly defined.

Patent Document 1: JP 2006-58567 A
Patent Document 2: JP-A-8-212228

The present invention has been devised in view of the above, and its object is to provide a technique capable of accurately extracting necessary information relating to a business from a speech signal of a dialog between speakers about that business.

The speech signal processing apparatus of the present application extracts, from a speech signal of a dialog between speakers about a predetermined business, the necessary information relating to that business. The "business" typically corresponds to a service that a customer requests from a company (for example, [start of electricity use], [stop of electricity use], or [change of contracted electricity capacity] requested by a customer from an electric power company). The "speech signal" typically corresponds to a signal output from a microphone or similar device that converts speech into an electrical signal. Because signal processing in the processing means is performed digitally, a signal obtained by digitizing the analog signal output from the microphone or similar device can also be used as the "speech signal".

One invention comprises storage means, processing means, input means, and output means.
The storage means has a necessary information database and a dialog state determination information database.
The necessary information database stores a necessary information list indicating the necessary information relating to a business. "Necessary information" is the information needed to carry out the requested business (for example, a service that a customer requests from a company) and is defined for each business.
The dialog state determination information database stores dialog state determination information for determining the dialog state between the speakers. In the present invention, the necessary information is identified by determining that the dialog state between the speakers has transitioned in the order of an information request dialog state, in which the necessary information is requested, an information disclosure dialog state, in which the necessary information is disclosed, and an information acceptance dialog state, in which the necessary information is accepted. For this purpose, the following are stored as dialog state determination information: information request clue information for determining that the dialog is in the state in which the necessary information has been requested, information disclosure clue information for determining that the dialog is in the state in which the necessary information has been disclosed, and information acceptance clue information for determining that the dialog is in the state in which the necessary information has been accepted.
The processing means has speech recognition means, business determination means, dialog state transition determination means, and management means.
The speech recognition means converts the speech signal of the dialog between the speakers, input from the input means, into speech recognition information that includes text information. Any of various known speech recognition methods can be used for the conversion. For example, a large-vocabulary continuous speech recognition method using an HMM (hidden Markov model) and an N-gram (a probabilistic language model) is used. As the input means, for example, a microphone that converts speech into an electrical signal or an A/D converter that converts an analog signal into a digital signal is used. The A/D conversion can also be performed by the speech recognition means. Reading means that reads a speech signal or the like stored on a storage medium can also be used as the input means. The configuration in which "the speech recognition means converts the speech signal of the dialog between the speakers, input from the input means, into speech recognition information" also covers a configuration in which speech recognition means provided separately from the processing means converts the speech signal into speech recognition information and the speech recognition information is input from the input means.
The business determination means determines the business of the dialog between the speakers based on the speech recognition information. Typically, the business is determined from character strings contained in the speech recognition information. In this specification, anything expressed using letters, numerals, symbols, and the like is referred to as a "character string".
The dialog state transition determination means extracts each item of necessary information indicated in the necessary information list corresponding to the business determined by the business determination means. It does so by determining, based on the speech recognition information, that the dialog state between the speakers has transitioned in the order of the information request dialog state, the information disclosure dialog state, and the information acceptance dialog state. For example, when the necessary information is [contractor name], a personal name indicating the contractor's name is extracted; when it is [contractor address], a place name indicating the contractor's address is extracted; and when it is [contractor telephone number], a numeral indicating the contractor's telephone number is extracted. The transition in this order is determined, for example, from the fact that the speech recognition information contains, in order, the information request clue information, the information disclosure clue information, and the information acceptance clue information stored in the dialog state determination information database.
The management means outputs the necessary information extracted by the dialog state transition determination means from the output means. As the output means, for example, display means or printing means is used. Various output modes can be used for outputting the extracted necessary information.
In the present invention, the necessary information is extracted by determining that the dialog state has transitioned in the order of the information request dialog state, the information disclosure dialog state, and the information acceptance dialog state, so the necessary information can be extracted accurately.
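
The ordered transition described above (information request, then information disclosure, then information acceptance) can be illustrated with a minimal Python sketch. It is not the implementation of the specification: the clue phrases, the names `CLUES`, `STATE_ORDER`, and `extract_disclosed_utterance`, and the use of simple substring matching are all assumptions made for illustration. The sketch also returns the whole disclosing utterance; picking out a personal name, place name, or numeral would additionally require part-of-speech information.

```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical clue phrases. The specification stores such clues in the dialog
# state determination information database; these values are invented here.
CLUES: Dict[str, Tuple[str, ...]] = {
    "information_request": ("may i have", "please tell me"),
    "information_disclosure": ("my name is", "it is"),
    "information_acceptance": ("thank you", "certainly"),
}
STATE_ORDER = (
    "information_request",
    "information_disclosure",
    "information_acceptance",
)


def extract_disclosed_utterance(utterances: Sequence[str]) -> Optional[str]:
    """Return the disclosing utterance if the three states occur in order, else None."""
    position = 0      # index of the dialog state we are currently waiting for
    disclosed = None  # utterance that matched the disclosure clue
    for text in utterances:
        state = STATE_ORDER[position]
        if any(clue in text.lower() for clue in CLUES[state]):
            if state == "information_disclosure":
                disclosed = text
            position += 1
            if position == len(STATE_ORDER):
                # All three states were seen in order: accept the disclosure.
                return disclosed
    # The sequence was incomplete, so nothing is extracted.
    return None
```

In this sketch a disclosure is accepted only when it is preceded by a request clue and followed by an acceptance clue, which mirrors the idea that an isolated name or number in the recognized text is not treated as necessary information.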

Another invention comprises storage means, processing means, input means, and output means.
The storage means has a necessary information database and a dialog state determination information database. These databases can be the same as those of the one invention described above.
The processing means has speech recognition means, business determination means, dialog state transition determination means, and management means. The speech recognition means, dialog state transition determination means, and management means can be the same as those of the one invention described above. In this invention, the business determination means determines the business of the dialog between the speakers based on business identification information input from the input means. The business identification information can be input, for example, by operating an input key provided on the input means or by selecting an input field displayed on an input screen provided on the input means. The business identification information is preferably input together with the speech signal, but may be input at a different time.
In this invention, the necessary information is extracted by determining that the dialog state has transitioned in the order of the information request dialog state, the information disclosure dialog state, and the information acceptance dialog state, so the necessary information can be extracted accurately. In addition, because the business is determined based on the business identification information input from the input means, the business of the dialog between the speakers can be determined easily.
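
The two ways of determining the business described so far (from character strings in the speech recognition information, or from business identification information supplied through the input means) can be contrasted in a short Python sketch. The identifiers, keyword lists, and the contents of each necessary information list below are hypothetical; the text gives [contractor name], [contractor address], and [contractor telephone number] only as examples of necessary information and does not state which business uses which items.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical necessary information lists, keyed by a business identifier.
NECESSARY_INFORMATION: Dict[str, List[str]] = {
    "start_electricity_use": [
        "contractor name",
        "contractor address",
        "contractor telephone number",
    ],
    "stop_electricity_use": ["contractor name", "contractor address"],
}

# Hypothetical keywords for determining the business from recognized text.
BUSINESS_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "start_electricity_use": ("start using electricity",),
    "stop_electricity_use": ("stop using electricity",),
}


def determine_business(
    recognized_text: str, business_id: Optional[str] = None
) -> Optional[str]:
    """Prefer an explicitly input business identifier; otherwise search the text."""
    if business_id is not None:
        # Route of the "other invention": identification information was input.
        return business_id if business_id in NECESSARY_INFORMATION else None
    lowered = recognized_text.lower()
    for candidate, keywords in BUSINESS_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return candidate
    return None
```

Once a business is determined, `NECESSARY_INFORMATION[business]` gives the items that the dialog state transition determination would then try to extract.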

In a different form of the one invention or the other invention, the dialog state determination information database further stores request acceptance clue information for determining that the dialog is in a request acceptance dialog state, in which the request for necessary information has been accepted. The dialog state transition determination means then extracts each item of necessary information indicated in the necessary information list corresponding to the business determined by the business determination means. It does so by determining, based on the speech recognition information, that the dialog state between the speakers has transitioned in the order of the information request dialog state, the request acceptance dialog state, the information disclosure dialog state, and the information acceptance dialog state. This transition is determined from the fact that the speech recognition information contains, in order, the information request clue information, the request acceptance clue information, the information disclosure clue information, and the information acceptance clue information stored in the dialog state determination information database.
In this form, the necessary information is extracted by determining that the dialog state has transitioned in the order of the information request dialog state, the request acceptance dialog state, the information disclosure dialog state, and the information acceptance dialog state, so the necessary information can be extracted accurately.
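
To show how the additional request acceptance dialog state tightens the check, the following Python sketch tests the same dialog against a three-state and a four-state sequence. The clue phrases and the names `CLUES`, `THREE_STATE`, `FOUR_STATE`, and `transitions_in_order` are assumptions introduced for this illustration and are not taken from the specification.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical clue phrases for each dialog state.
CLUES: Dict[str, Tuple[str, ...]] = {
    "information_request": ("may i have",),
    "request_acceptance": ("yes", "sure"),
    "information_disclosure": ("it is",),
    "information_acceptance": ("thank you",),
}

THREE_STATE = (
    "information_request",
    "information_disclosure",
    "information_acceptance",
)
FOUR_STATE = (
    "information_request",
    "request_acceptance",
    "information_disclosure",
    "information_acceptance",
)


def transitions_in_order(utterances: Sequence[str], states: Sequence[str]) -> bool:
    """Return True if clues for the given states appear in order in the dialog."""
    position = 0  # index of the state whose clue we expect next
    for text in utterances:
        if position == len(states):
            break
        lowered = text.lower()
        if any(clue in lowered for clue in CLUES[states[position]]):
            position += 1
    return position == len(states)
```

With these assumed clues, a dialog in which the customer answers without first acknowledging the request satisfies `THREE_STATE` but not `FOUR_STATE`, while a dialog containing an acknowledgment such as "Sure." satisfies both.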

In another different form of the one invention or the other invention, the storage means further has a conversation unit determination information database and a necessary information clue information database, and the processing means further has conversation unit division means and necessary information clue information imparting means.
The conversation unit determination information database stores conversation unit determination information for dividing the speech recognition information into conversation units. A "conversation unit" means a span of the dialog relating to one topic. The conversation unit determination information database typically stores conversation unit determination information that includes conversation unit start point determination information for determining the start point of a conversation unit and conversation unit end point determination information for determining its end point. The necessary information clue information database stores, for each business, a necessary information clue information list indicating the necessary information clue information that serves as a clue for extracting necessary information. The necessary information clue information can be set as appropriate for each item of necessary information, but at least the necessary information clue information corresponding to the character string to be extracted as the necessary information (typically, necessary information clue information having the same part of speech as that character string) is set. For example, when the necessary information is [contractor name], at least necessary information clue information corresponding to a character string indicating a personal name (clue information having the part of speech [person name]) is set. When the necessary information is [contractor address], at least necessary information clue information for a character string indicating an address (clue information having the part of speech [place name]) is set. When the necessary information is [contractor telephone number], at least necessary information clue information for a character string combining digits (clue information having the part of speech [numeral]) is set. The necessary information clue information database and the necessary information database can also be configured as a single database. For example, the necessary information is stored in association with each business, and the necessary information clue information is stored in association with each item of necessary information.
The conversation unit division means divides the speech recognition information into conversation units based on the conversation unit determination information stored in the conversation unit determination information database. Typically, it determines the start point and end point of each conversation unit based on the speech recognition information and on the conversation unit start point determination information and conversation unit end point determination information stored in the database, and treats the span between a start point and the end point that follows it as one conversation unit. For example, a location where conversation unit start point determination information is present is taken as the start point of a conversation unit, and a location where conversation unit end point determination information is present is taken as the end point. When no end point determination information is present between two items of start point determination information, the location immediately before the later start point determination information is taken as the end point of the preceding conversation unit, and the location of the later start point determination information is taken as the start point of the following conversation unit. Likewise, when no start point determination information is present between two items of end point determination information, the location of the earlier end point determination information is taken as the end point of the preceding conversation unit, and the location immediately after it is taken as the start point of the following conversation unit.
The necessary information clue information imparting means imparts necessary information clue information to the conversation units produced by the conversation unit division means. Specifically, if a conversation unit contains necessary information clue information that serves as a clue for extracting necessary information, that clue information is imparted to the conversation unit.
The dialog state transition determination means selects a conversation unit that includes necessary information clue information serving as a clue for extracting the necessary information corresponding to the business. A conversation unit can be selected, for example, by selecting a conversation unit that includes one item of necessary information clue information corresponding to the necessary information, or by selecting a conversation unit that includes a plurality of appropriately chosen items of necessary information clue information. A conversation unit can also be selected while changing, as appropriate, the necessary information clue information that the unit is required to include. In any case, the selected conversation unit includes at least the necessary information clue information corresponding to the character string to be extracted as the necessary information (typically, necessary information clue information having the same part of speech as that character string). For example, when the necessary information is [contractor name], a conversation unit containing a character string indicating a personal name (a character string having the part of speech [person name]) is selected; when it is [contractor address], a conversation unit containing a character string indicating an address (a character string having the part of speech [place name]) is selected; and when it is [contractor telephone number], a conversation unit containing a character string combining digits (a character string having the part of speech [numeral]) is selected. The necessary information is then extracted by determining the transition of the dialog state based on the speech recognition information included in the selected conversation unit.
In this form, dividing the speech recognition information into conversation units allows the necessary information to be extracted more accurately. In addition, imparting necessary information clue information to the conversation units allows the necessary information to be extracted efficiently.
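
The division rules and the clue-based selection of conversation units described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the method of the specification: the start and end clue phrases are invented, each utterance is treated as one "location", text that follows an end point without a new start clue is assumed to open the next unit, and a digit test stands in for a real part-of-speech tagger that would label [numeral] strings.

```python
import re
from typing import List, Sequence


def is_start(text: str) -> bool:
    # Hypothetical conversation unit start point determination information.
    return "may i have" in text.lower()


def is_end(text: str) -> bool:
    # Hypothetical conversation unit end point determination information.
    return "thank you" in text.lower()


def split_into_units(utterances: Sequence[str]) -> List[List[str]]:
    """Divide a dialog into conversation units using start and end clues."""
    units: List[List[str]] = []
    current: List[str] = []
    for text in utterances:
        if is_start(text) and current:
            # A second start with no end in between: the preceding unit ends
            # immediately before this start.
            units.append(current)
            current = []
        current.append(text)
        if is_end(text):
            # An end clue closes the unit; whatever follows opens the next one.
            units.append(current)
            current = []
    if current:
        # Assumption: trailing utterances with no end clue form a final unit.
        units.append(current)
    return units


def units_with_numeral_clue(units: Sequence[Sequence[str]]) -> List[int]:
    """Return indices of units containing digits (stand-in for the [numeral] clue)."""
    return [
        index
        for index, unit in enumerate(units)
        if any(re.search(r"\d", text) for text in unit)
    ]
```

Restricting the dialog state transition check to the units returned by a selector such as `units_with_numeral_clue` is one way to read the efficiency benefit mentioned above: units that cannot contain, say, a telephone number are never examined for that item.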

Still another invention is a program for causing a computer to execute the processing of the processing means described above, or a storage medium on which the program is stored.
Using the program or storage medium of this invention provides the effects described above.

In the speech signal processing apparatus of the present invention, the necessary information relating to a business is extracted from the speech signal of a dialog between speakers about that business by determining the transitions of the dialog state, so the necessary information relating to the business can be extracted accurately.
In the present invention,

A schematic configuration diagram of an embodiment of the present invention.
A diagram showing an example of the conversation unit determination information database.
A diagram showing an example of the necessary information list.
A diagram showing another example of the necessary information list.
A diagram showing still another example of the necessary information list.
A diagram showing an example of the necessary information clue information list.
A diagram showing an example of the dialog state determination information database.
A flowchart explaining the operation of the embodiment of the present invention.
A diagram showing an example of speech recognition information obtained by speech recognition of a speech signal of a dialog between speakers.
A diagram showing an example of speech recognition information obtained by speech recognition of a speech signal of a dialog between speakers.
A diagram showing an overview of the method of extracting necessary information by determining the transition of the dialog state between speakers.
A diagram showing an example of the method of extracting necessary information by determining the transition of the dialog state between speakers.
A diagram showing another example of extracting necessary information by determining the transition of the dialog state between speakers.
A diagram showing still another example of extracting necessary information by determining the transition of the dialog state between speakers.

Embodiments of the speech signal processing apparatus of the present invention are described below with reference to the drawings.
FIG. 1 shows a schematic configuration diagram of a first embodiment of the speech signal processing apparatus of the present invention. The speech signal processing apparatus of the first embodiment is suitable for use as an apparatus that extracts, from the speech signal of a dialog in which one speaker requests a predetermined business from the other speaker, the necessary information required to carry out the requested business. In this specification, "business" means the content that one speaker requests from the other (for example, the content of a service).
The speech signal processing apparatus of this embodiment comprises processing means 10, storage means 20, input means 30, and output means 40.
As the input means 30, various known input means can be used, such as a microphone that converts speech into a speech signal (an electrical signal), reading means that reads a speech signal stored on a storage medium, an input key, and an input screen. A "speech signal" means a signal output from a microphone or similar device that converts speech into an electrical signal. The term also covers a signal obtained by digitizing the signal output from the microphone or similar device.
As the output means 40, various known output means such as display means and printing means can be used.

The storage means 20 includes a speech signal database 21, a conversation unit determination information database 22, a necessary information database 23, a necessary information clue information database 24, and a dialog state determination information database 25.

The speech signal database 21 stores the speech signals, input via the input means 30, of dialogues between speakers about a business. In the present embodiment, as described later, necessary information is extracted from speech recognition information, which includes text information and is created from the speech signal. A speech recognition information database 26 can therefore be used in place of the speech signal database 21. In that case, the speech signal input via the input means 30 is converted into speech recognition information by the speech recognition means 13, and the converted speech recognition information is stored in the speech recognition information database 26. Alternatively, the speech signal input via the input means 30 may be stored in the speech signal database 21, and the speech recognition information created from that stored signal may be stored in the speech recognition information database 26. Speech recognition information input directly via the input means 30 may also be stored in the speech recognition information database 26. In short, it suffices that speech recognition information created from the speech signal is available.

The conversation unit determination information database 22 stores conversation unit determination information used to divide the speech recognition information into conversation units. A "conversation unit" is a span of dialogue concerning a single topic.
FIG. 2 shows an example of the conversation unit determination information database 22. In this example, the database stores two kinds of conversation unit determination information: conversation unit start point determination information, which marks the start of a conversation unit, and conversation unit end point determination information, which marks its end. Start point determination information consists of expressions that often open a conversation unit, for example the calling expression 「もしもし」 ("hello"), the opening greeting 「おはようございます」 ("good morning"), and the topic-changing expressions 「では」 ("well then"), 「それでは」 ("now then"), and 「次に」 ("next"). End point determination information consists of expressions that often close a conversation unit, for example the expression of thanks 「ありがとうございました」 ("thank you very much"), the confirmations 「かしこまりました」 ("certainly") and 「承りました」 ("understood"), and the closing greeting 「失礼いたします」 ("goodbye").

The necessary information database 23 stores the necessary information to be extracted from the speech signal (speech recognition information) of a dialogue between speakers. The necessary information to extract usually differs depending on the business that one speaker requests of the other. The necessary information database 23 therefore stores, for each business, a necessary information list indicating the necessary information to extract. When the same necessary information is to be extracted for several businesses, a common necessary information list may of course be stored in association with those businesses.
FIGS. 3 to 5 show examples of necessary information lists for dialogues between a customer and a receptionist (operator) of an electric power company, in which the customer requests a business (service) from the company. Each list shows slot numbers and the necessary information corresponding to each slot number (the necessary information to be inserted into the slot with that number). The list in FIG. 3 shows the necessary information to extract from the speech signal when the customer requests [use of electricity] (business: [start of electricity use]). The list in FIG. 4 shows the necessary information to extract when the customer requests [stop of electricity] (business: [stop of electricity use]). The list in FIG. 5 shows the necessary information to extract when the customer requests [change of contracted electricity capacity] (business: [change of contracted electricity capacity]).

The necessary information clue information database 24 stores necessary information clue information, which serves as a clue when extracting necessary information from the speech signal (speech recognition information) of a dialogue between speakers. In the present embodiment, the database stores, for each business, a necessary information clue information list that indicates the clue information corresponding to each item of necessary information. Clue information can be set as appropriate for each item of necessary information (that is, for the necessary information corresponding to the business), but at minimum it includes clue information corresponding to the character string to be extracted as the necessary information (typically, clue information having the same part of speech as that character string). For example, when the necessary information is [contractor name], at least clue information corresponding to a character string indicating a person's name (clue information with the part of speech [person name]) is set. When the necessary information is [contractor address], at least clue information for a character string indicating an address (clue information with the part of speech [place name]) is set. When the necessary information is [contractor telephone number], at least clue information for a character string combining digits (clue information with the part of speech [numeral]) is set.
FIG. 6 shows an example of a necessary information clue information list for the case where a customer requests [stop of electricity] (business: [stop of electricity use]). In FIG. 6, clue information written as [part of speech: XX] indicates that the part-of-speech type (part-of-speech information) of a character string is XX (for example, [person name] for a person's name, [place name] for an address, and [numeral] for a combination of digits). All other clue information denotes literal character strings.
In the list of FIG. 6, for example, the necessary information [contractor name] is associated with the clue information [contractor], [customer], [name], and [part of speech: person name]. The necessary information [contractor address] is associated with the clue information [contractor], [customer], [address], and [part of speech: place name]. The necessary information [contractor telephone number] is associated with the clue information [contractor], [customer], [telephone], [mobile], [number], and [part of speech: numeral].
Necessary information clue information is used to select conversation units when extracting necessary information from the speech signal (speech recognition information), as described in detail later.

The dialog state determination information database 25 stores dialog state determination information used to determine the dialog state between speakers. In the present embodiment, four kinds of dialog state determination information are stored: information request clue information 25a, for determining that the dialog is in the state in which necessary information has been requested (the "information request dialog state"); request acceptance clue information 25b, for determining that the dialog is in the state in which the request for necessary information has been accepted (the "request acceptance dialog state"); information disclosure clue information 25c, for determining that the dialog is in the state in which the necessary information has been disclosed (the "information disclosure dialog state"); and information acceptance clue information 25d, for determining that the dialog is in the state in which the necessary information has been received (the "information acceptance dialog state").
FIG. 7 shows an example of the dialog state determination information database 25. In this example, the information request clue information includes expressions that request disclosure of necessary information, such as 「［（必要情報）］をお願いします」 ("[(necessary information)], please"). The request acceptance clue information includes expressions indicating that the request has been accepted, such as 「わかりました」 ("I understand"). The information disclosure clue information includes expressions representing the necessary information, such as "person name", "place name", and "numeral". The information acceptance clue information includes expressions indicating that the disclosed necessary information has been received, such as 「ありがとうございます」 ("thank you") and 「［（必要情報）］ですね」 ("[(necessary information)], correct?").
The placeholder [(necessary information)] in FIG. 7 stands for either a character string containing any one item of the clue information used to extract necessary information (see FIG. 6) or the necessary information itself. For example, in a dialog that requests necessary information (the information request dialog state), character strings such as "customer number" and "address of your new residence" are used. In a dialog that discloses necessary information (the information disclosure dialog state), the customer number itself, the new address itself, and so on are used.

The processing means 10 includes management means 11 and necessary information extraction means 12.
The management means 11 manages the overall processing of the speech signal processing apparatus. For example, it executes the processes of inputting a speech signal via the input means 30, storing the input speech signal in the speech signal database 21, operating the necessary information extraction means 12 to extract necessary information from the speech signal, and outputting the extracted necessary information from the output means 40. The processing of the management means 11 can also be executed by the necessary information extraction means 12, in which case the management means 11 can be omitted.

The necessary information extraction means 12 comprises speech recognition means 13, conversation unit division means 14, business determination means 15, necessary information clue information imparting means 16, and dialog state transition determination means 17. In the present embodiment, the processing of all of these means is executed by a common processing device, but the processing of each means may instead be executed by a separate processing device. The means may also be connected to one another via a communication line such as a LAN, a telephone line, or an Internet line.

The speech recognition means 13 converts the speech signal of the dialogue between speakers, input via the input means 30, into speech recognition information that includes text information. Any of various known speech recognition methods can be used for this conversion, for example a large-vocabulary continuous speech recognition method using HMMs (hidden Markov models) and N-grams (a probabilistic language model).
Here, "the speech signal input via the input means 30" includes not only the speech signal as input via the input means 30 but also a speech signal that was input via the input means 30 and then stored in the speech signal database 21.
In the present embodiment, the speech recognition means 13 outputs speech recognition information annotated with part-of-speech information. When the speech recognition result (speech recognition information) contains a character string that already has part-of-speech information (that is, when the recognition vocabulary dictionary used by the speech recognition means 13 contains part-of-speech information), that part-of-speech information is attached to the character string. When the speech recognition result contains a character string without part-of-speech information, morphological analysis is performed on the text information in the recognition result to attach part-of-speech information to the character string. The part-of-speech information attached to the speech recognition information is used when the necessary information clue information imparting means 16, described later, imparts necessary information clue information to conversation units.
The speech recognition means 13 need not be provided within the processing means 10. For example, the speech signal may be converted into speech recognition information by speech recognition means provided in a computer other than the one containing the processing means 10. In that case, the speech recognition information output from the speech recognition means of the other computer is input via the input means 30 and stored in the speech recognition information database 26. The configuration of the present invention in which "the speech recognition means performs speech recognition on the speech signal of the dialogue between speakers input from the input means and outputs speech recognition information" also covers a configuration in which the speech signal is converted into speech recognition information by speech recognition means provided in a computer other than the one containing the processing means 10, and the converted speech recognition information is input from the input means 30.

The conversation unit division means 14 divides the speech recognition information into conversation units on the basis of the conversation unit determination information stored in the conversation unit determination information database 22. In the present embodiment, the database stores conversation unit determination information comprising conversation unit start point determination information and conversation unit end point determination information. The conversation unit division means 14 determines the start point and end point of each conversation unit from this information and thereby divides the speech recognition information into conversation units. For example, a location where start point determination information occurs is taken as the start point of a conversation unit, and a location where end point determination information occurs is taken as the end point of a conversation unit.
In ordinary conversation, a speaker may utter start point determination information without having uttered end point determination information, or may utter end point determination information twice in succession without uttering start point determination information in between. Therefore, when no end point determination information exists between two items of start point determination information, the location immediately before the later start point determination information is taken as the end point of the preceding conversation unit, and the location of the later start point determination information is taken as the start point of the following conversation unit. Likewise, when no start point determination information exists between two items of end point determination information, the location of the earlier end point determination information is taken as the end point of the preceding conversation unit, and the location immediately after it is taken as the start point of the following conversation unit. When determining start and end points, not only the conversation unit determination information itself but also variations in its form are taken into account.
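The division rules above can be sketched in Python as follows. This is a minimal illustration, not the implementation of the embodiment: it assumes the speech recognition information is already available as a list of utterance strings, that a start cue counts only when it opens an utterance, and that an end cue counts wherever it appears in an utterance. The function name `split_into_units` and these matching rules are assumptions made for the example. The cue strings are the ones listed for FIG. 2.

```python
# Cue expressions taken from the FIG. 2 example in the text.
START_CUES = ("もしもし", "おはようございます", "では", "それでは", "次に")
END_CUES = ("ありがとうございました", "かしこまりました", "承りました", "失礼いたします")


def split_into_units(utterances, start_cues=START_CUES, end_cues=END_CUES):
    """Split a list of utterance strings into conversation units.

    Assumed matching rules (not specified by the source):
    a start cue must open the utterance, an end cue may appear anywhere.
    """
    units, current = [], []
    for utt in utterances:
        # A start cue while a unit is still open: the open unit ends just
        # before this utterance (case "two start cues, no end cue between").
        if current and utt.startswith(tuple(start_cues)):
            units.append(current)
            current = []
        current.append(utt)
        # An end cue closes the unit at this utterance; whatever follows
        # starts the next unit (case "two end cues, no start cue between").
        if any(cue in utt for cue in end_cues):
            units.append(current)
            current = []
    if current:  # trailing utterances without a closing cue
        units.append(current)
    return units
```

With this sketch, an utterance that follows an end cue and carries no start cue simply opens the next unit, which mirrors the rule that the location immediately after an end point becomes the next start point.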

The business determination means 15 determines the business of the dialogue between speakers. In the present embodiment, it does so on the basis of the speech recognition information, by determining which business's associated expressions are contained in the speech recognition information.
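One simple way to realize this kind of determination is keyword matching, sketched below. The three business names come from the FIG. 3 to FIG. 5 examples, but the expressions associated with each business, the function name `determine_business`, and the choice of the business with the most matches are all assumptions for illustration. The source does not list the expressions or specify a tie-breaking rule.

```python
# Business names are from the text; the expressions are hypothetical examples.
BUSINESS_EXPRESSIONS = {
    "電気の使用開始": ["使用開始", "使い始め"],
    "電気の使用停止": ["使用停止", "電気を止め"],
    "電気の契約容量の変更": ["契約容量", "アンペア"],
}


def determine_business(recognized_text, table=BUSINESS_EXPRESSIONS):
    """Return the business whose expressions occur most often, or None.

    Counting occurrences is an assumption; the source only says that the
    business is determined from the expressions contained in the text.
    """
    counts = {
        business: sum(recognized_text.count(expr) for expr in expressions)
        for business, expressions in table.items()
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None
```

Returning `None` when nothing matches keeps the caller from silently picking a necessary information list for the wrong business.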

The necessary information clue information imparting means 16 imparts necessary information clue information to the conversation units produced by the conversation unit division means 14. That is, when a conversation unit contains an item of necessary information clue information, that item is attached to the conversation unit. The clue information used is that indicated in the necessary information clue information list stored for the business in the necessary information clue information database 24.
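The attachment step can be sketched as follows, using the clue list described for FIG. 6. The sketch assumes that each conversation unit is available as a list of `(surface, part_of_speech)` token pairs, which is one possible representation of speech recognition information annotated with part-of-speech information. The function name `attach_clues`, the dictionary layout, and the tag names other than 人名, 地名, and 数詞 are assumptions.

```python
# Clue list for the business [stop of electricity use], as described for FIG. 6.
# "words" are literal strings; "pos" are part-of-speech clues.
CLUE_LIST = {
    "契約者名": {"words": ["契約者", "お客様", "名前"], "pos": ["人名"]},
    "契約者住所": {"words": ["契約者", "お客様", "住所"], "pos": ["地名"]},
    "契約者電話番号": {
        "words": ["契約者", "お客様", "電話", "携帯", "番号"],
        "pos": ["数詞"],
    },
}


def attach_clues(unit_tokens, clue_list=CLUE_LIST):
    """Return the set of clue labels found in one conversation unit.

    unit_tokens: list of (surface, part_of_speech) pairs (assumed format).
    """
    text = "".join(surface for surface, _ in unit_tokens)
    pos_tags = {pos for _, pos in unit_tokens}
    attached = set()
    for info in clue_list.values():
        # Literal-string clues: attach when the string occurs in the unit.
        attached.update(word for word in info["words"] if word in text)
        # Part-of-speech clues: attach when any token carries that tag.
        attached.update(f"品詞:{pos}" for pos in info["pos"] if pos in pos_tags)
    return attached
```

Storing the result as a set per conversation unit makes the later selection step a matter of simple membership tests.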

The dialog state transition determination means 17 extracts necessary information by determining the transitions of the dialog state on the basis of the speech recognition information. In the present embodiment, it includes information request dialog state determination means 17a, request acceptance dialog state determination means 17b, information disclosure dialog state determination means 17c, and information acceptance dialog state determination means 17d. The information request dialog state determination means 17a determines that the dialog state is the information request dialog state on the basis of the speech recognition information and the information request clue information 25a stored in the dialog state determination information database 25. The request acceptance dialog state determination means 17b determines that the dialog state is the request acceptance dialog state on the basis of the speech recognition information and the request acceptance clue information 25b stored in the database. The information disclosure dialog state determination means 17c determines that the dialog state is the information disclosure dialog state on the basis of the speech recognition information and the information disclosure clue information 25c stored in the database. The information acceptance dialog state determination means 17d determines that the dialog state is the information acceptance dialog state on the basis of the speech recognition information and the information acceptance clue information 25d stored in the database. The dialog state transition determination means 17 then extracts the necessary information by determining that the dialog state has transitioned in the order of information request dialog state, request acceptance dialog state, information disclosure dialog state, and information acceptance dialog state.

FIG. 11 outlines how the dialog state transition determination means 17 of the present embodiment determines dialog state transitions from the speech signal (the speech recognition information corresponding to the speech signal) of a dialogue between a customer and a receptionist (operator) at a company's service center. In the present embodiment, the means determines that the dialog state has transitioned in the order indicated by the solid arrows.
First, the receptionist utters information request clue information to the customer, for example 「［（必要情報）］をお願いします。」 ("[(necessary information)], please.").
On confirming the information request clue information from the receptionist, the customer utters request acceptance clue information to the receptionist, for example 「わかりました。」 ("I understand.").
Next, the customer utters information disclosure clue information to the receptionist, for example 「［（必要情報）］です。」 ("It is [(necessary information)].").
On confirming the information disclosure clue information from the customer, the receptionist utters information acceptance clue information to the customer, for example 「ありがとうございます。」 ("Thank you.").
From this sequence, it is determined that, for [(necessary information)], the dialog state has transitioned in the order of information request dialog state, request acceptance dialog state, information disclosure dialog state, and information acceptance dialog state, and [(necessary information)] is extracted.
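The four-state sequence above can be sketched as a small state machine. This is an illustration under stated assumptions rather than the method of the embodiment: each utterance is given as a `(text, tokens)` pair with `tokens` as `(surface, part_of_speech)` pairs, one utterance advances at most one state, and unrelated utterances between states are simply skipped. The cue strings follow the FIG. 7 example, while the function name `extract_by_transition` and its parameters are hypothetical.

```python
REQUEST_CUE = "をお願いします"                       # information request (25a)
REQUEST_ACCEPT_CUE = "わかりました"                  # request acceptance (25b)
INFO_ACCEPT_CUES = ("ありがとうございます", "ですね")  # information acceptance (25d)


def extract_by_transition(utterances, clue_words, target_pos):
    """Return the disclosed string once all four states occur in order.

    utterances: list of (text, tokens), tokens = [(surface, pos), ...]
    clue_words: literal clue strings for the wanted item, e.g. ["名前"]
    target_pos: part of speech of the string to extract, e.g. "人名"
    """
    state = 0          # number of dialog states recognized so far
    candidate = None   # string seen in the information disclosure state
    for text, tokens in utterances:
        if state == 0:
            if REQUEST_CUE in text and any(w in text for w in clue_words):
                state = 1  # information request dialog state
        elif state == 1:
            if REQUEST_ACCEPT_CUE in text:
                state = 2  # request acceptance dialog state
        elif state == 2:
            found = [s for s, pos in tokens if pos == target_pos]
            if found:
                candidate = "".join(found)
                state = 3  # information disclosure dialog state
        elif any(cue in text for cue in INFO_ACCEPT_CUES):
            return candidate  # information acceptance dialog state reached
    return None  # the sequence did not complete
```

Requiring the final acceptance before returning means a name that is mentioned but never acknowledged is not extracted, which is the point of checking the full transition order.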

ここで、本実施の形態では、必要情報の抽出精度を高め、また、処理負担を軽減するために、会話単位分割手段１４および必要情報手掛かり情報付与手段１６が設けられている。
すなわち、会話単位分割手段１４によって、音声認識情報を会話単位に分割する。また、必要情報手掛かり情報付与手段１６によって、必要情報手掛かり情報が会話単位に含まれている場合には、当該必要情報手掛かり情報を当該会話単位に付与する。
対話状態遷移判別手段１７は、用件判別手段１５によって判別された用件に対応する必要情報リストで示されている必要情報を抽出する際には、必要情報を抽出する手掛かりとなる必要情報手掛かり情報を含む会話単位を選択する。必要情報を抽出する手掛かりとなる必要情報手掛かり情報を含む会話単位を選択する方法としては、必要情報に対応する必要情報手掛かり情報を一つ含む会話単位を選択する方法や、適宜選択した複数の必要情報手掛かり情報を含む会話単位を選択する方法を用いることができる。また、会話単位に含まれる必要情報手掛かり情報を適宜変更しながら会話単位を選択する方法を用いることもできる。なお、会話単位を選択する際には、少なくとも、必要情報として抽出される文字列に対応する必要情報手掛かり情報（典型的には、必要情報として抽出される文字列と同じ品詞を有する必要情報手掛かり情報）を含む会話単位を選択する。例えば、必要情報が［契約者名］である場合には、少なくとも、人名を示す文字列（品詞［人名］を有する文字列）を含んでいる会話単位を選択し、必要情報が［契約者住所］である場合には、少なくとも、住所を示す文字列（品詞［地名］を有する文字列）を含んでいる会話単位を選択し、必要情報が［契約者電話番号］である場合には、少なくとも、数字を組み合わせた文字列（品詞［数詞］を有する文字列）を含んでいる会話単位を選択する。そして、選択した会話単位（あるいは、選択した会話単位を含む、選択した会話単位に隣接する複数の会話単位）に対して対話状態の遷移を判別することによって、必要情報を抽出する。
例えば、必要情報手掛かり情報付与手段１６は、用件に対応する必要情報リストによって必要情報［契約者名］が示されている場合、会話単位に、「契約者」、「お客様」、［名前］という文字列や、［品詞：人名］が含まれていれば、必要情報［契約者名］を抽出する際の手掛かりとなる必要情報手掛かり情報［契約者］、［お客様］、［名前］、［品詞：人名］を当該会話単位に付与する。対話状態遷移判別手段１７は、必要情報［契約者名］を抽出する際には、必要情報手掛かり情報［契約者］、［お客様］、［名前］、［品詞：人名］が付与されている（必要情報手掛かり情報「契約者」、「お客様」、「名前」、［品詞：人名］が含まれている）会話単位を選択する。そして、選択した会話単位に含まれている音声認識情報に基づいて、対話状態の遷移を判別することにより、必要情報［契約者名］を抽出する。この場合、［品詞：人名］を抽出する。これにより、必要情報を精度よく、また、効率よく抽出することができる。なお、必要情報手掛かり情報付与手段１６は、品詞情報に基づいて会話単位に必要情報手掛かり情報を付与する際には、音声認識手段１３によって、音声認識情報に付与された品詞情報と、必要情報データベース２３に記憶されている必要情報手掛かり情報リストで示されている必要情報手掛かり情報の品詞情報を比較して必要情報手掛かり情報を付与する。 Here, in the present embodiment, a conversation unit dividing unit 14 and a necessary information clue information adding unit 16 are provided in order to increase the accuracy of extracting necessary information and reduce the processing load.
In other words, the conversation unit dividing means 14 divides the speech recognition information into conversation units. When a conversation unit contains necessary information clue information, the necessary information clue information adding means 16 adds that clue information to the conversation unit.
When extracting the necessary information indicated in the necessary information list corresponding to the business determined by the business determining means 15, the dialog state transition determining means 17 selects a conversation unit that contains the necessary information clue information serving as a clue for extracting that necessary information. Usable selection methods include selecting a conversation unit that contains one piece of necessary information clue information corresponding to the necessary information, and selecting a conversation unit that contains several appropriately chosen pieces of necessary information clue information. A method that selects conversation units while appropriately changing which clue information must be contained can also be used. In every case, the selected conversation unit contains at least the clue information corresponding to the character string to be extracted as the necessary information (typically, clue information having the same part of speech as that character string). For example, when the necessary information is [contractor name], a conversation unit containing at least a character string indicating a person's name (part of speech [person name]) is selected. When the necessary information is [contractor address], a conversation unit containing at least a character string indicating an address (part of speech [place name]) is selected. When the necessary information is [contractor telephone number], a conversation unit containing at least a character string made up of digits (part of speech [numeral]) is selected.
The necessary information is then extracted by determining the transition of the dialog state in the selected conversation unit (or in a group of adjacent conversation units that includes the selected one).
For example, when the necessary information list corresponding to the business indicates the necessary information [contractor name], the necessary information clue information adding means 16 checks, for each conversation unit, whether it contains "contractor", "customer", "name", or a character string with the part of speech [person name], and adds the corresponding clue information [contractor], [customer], [name], [part of speech: person name] to that conversation unit. When extracting [contractor name], the dialog state transition determining means 17 selects a conversation unit to which the clue information [contractor], [customer], [name], and [part of speech: person name] has been added (that is, a unit containing "contractor", "customer", "name", and a character string with the part of speech [person name]). It then extracts [contractor name] by determining the transition of the dialog state from the speech recognition information in the selected unit. In this case, the character string tagged [part of speech: person name] is extracted. The necessary information can thus be extracted accurately and efficiently. When adding clue information based on part-of-speech information, the necessary information clue information adding means 16 compares the part-of-speech information that the speech recognition means 13 attached to the speech recognition information with the part-of-speech information of the clue information in the necessary information clue information list stored in the necessary information database 23, and adds the clue information accordingly.
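
The selection rule described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `ConversationUnit` class, the `REQUIRED_POS` table, the `pos:` tag prefix, and the `select_units` function are hypothetical names that do not appear in the specification. The sketch always requires the part-of-speech clue for the requested item and optionally requires further clues.

```python
from dataclasses import dataclass, field


@dataclass
class ConversationUnit:
    """One conversation unit with the clue information attached to it (hypothetical structure)."""
    unit_id: str
    text: str
    clues: set = field(default_factory=set)


# Assumed mapping from a necessary-information item to the part-of-speech clue
# that a selected unit must contain at minimum.
REQUIRED_POS = {
    "contractor name": "pos:person",
    "contractor address": "pos:place",
    "contractor telephone number": "pos:numeral",
}


def select_units(units, necessary_info, extra_clues=()):
    """Return units holding the mandatory part-of-speech clue plus any extra clues."""
    required = {REQUIRED_POS[necessary_info], *extra_clues}
    return [u for u in units if required <= u.clues]


units = [
    ConversationUnit("S3", "...", {"customer number", "pos:numeral"}),
    ConversationUnit("S4", "...", {"contractor", "name", "pos:person"}),
]
selected = select_units(units, "contractor name", extra_clues=("contractor", "name"))
```

Passing an empty `extra_clues` corresponds to selecting on the part-of-speech clue alone, and passing several clues corresponds to the stricter selection method.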

Next, the operation of the speech signal processing apparatus of this embodiment is described with reference to the flowchart shown in FIG. 8.
In step A1, a speech signal of a dialog between speakers is input via the input means 30. Step A1 is executed by the management means 11.
In step A2, the speech signal input in step A1 is converted into speech recognition information that includes text information. A speech signal input via the input means 30 also covers a speech signal that was input via the input means 30 and then stored in the speech signal database 21. Step A2 is executed by the speech recognition means 13.
In step A3, the speech recognition information created in step A2 is divided into conversation units. The speech recognition information also covers speech recognition information that was converted from a speech signal and then stored in the speech recognition information database 26. Step A3 is executed by the conversation unit dividing means 14.
In step A4, the business of the dialog between the speakers is determined. In this embodiment, the business is determined from the speech recognition information created in step A2. Step A4 is executed by the business determining means 15.
In step A5, necessary information clue information is added to the conversation units obtained in step A3. In this embodiment, it is determined whether each conversation unit contains necessary information clue information serving as a clue for extracting necessary information, and if so, that clue information is added to the conversation unit. The necessary information is indicated in the necessary information list stored in the necessary information database 24 in association with the business determined in step A4. The necessary information clue information is indicated, in association with the necessary information, in the necessary information clue information list stored in the necessary information clue information database 24 in association with the business determined in step A4. Step A5 is executed by the necessary information clue information adding means 16.
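
Step A5 can be illustrated with a short Python sketch. It assumes that the speech recognizer yields (surface string, part of speech) pairs and that the clue list for a business is a small dictionary. The names `CLUE_LIST` and `assign_clues`, the `pos:` prefix, and the English keywords are illustrative assumptions, not part of the specification.

```python
# Assumed clue list for one business: for each necessary-information item,
# keyword clues and one part-of-speech clue.
CLUE_LIST = {
    "customer number": {"keywords": {"customer number"}, "pos": "numeral"},
    "contractor name": {"keywords": {"contractor", "name"}, "pos": "person"},
}


def assign_clues(tokens, clue_list=CLUE_LIST):
    """Return the clue information found in one conversation unit.

    tokens: list of (surface, part_of_speech) pairs for the unit.
    """
    surfaces = {surface for surface, _ in tokens}
    pos_tags = {pos for _, pos in tokens}
    clues = set()
    for spec in clue_list.values():
        # Keyword clues: compared against recognized surface strings.
        clues |= spec["keywords"] & surfaces
        # Part-of-speech clue: compared against the recognizer's tags.
        if spec["pos"] in pos_tags:
            clues.add("pos:" + spec["pos"])
    return clues


# Toy version of conversation unit S3 from the worked example.
s3_tokens = [("customer number", "noun"), ("00112233", "numeral")]
s3_clues = assign_clues(s3_tokens)
```

Exact matching on surface strings is a simplification. A real system would probably need morphological analysis to match inflected forms.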

In step A6, the necessary information list corresponding to the business determined in step A4 is selected from the necessary information database 23.
In step A7, the slot number N is set to "1".
In step A8, one conversation unit is selected from among those containing necessary information clue information that serves as a clue for extracting the necessary information to be inserted into slot number N. The selected unit contains at least the clue information corresponding to the character string to be extracted as the necessary information. For example, when the necessary information is [contractor name], the customer's name, a conversation unit containing at least the clue information [part of speech: person name] is selected. Preferably, a unit containing [part of speech: person name], [contractor] (or [customer]), and [name] is selected. When the necessary information is [contractor address], the customer's address, a conversation unit containing at least the clue information [part of speech: place name] is selected. Preferably, a unit containing [part of speech: place name], [contractor] (or [customer]), and [address] is selected. When the necessary information is [contractor telephone number], the customer's telephone number, a conversation unit containing at least the clue information [part of speech: numeral], which indicates a combination of digits, is selected. Preferably, a unit containing [part of speech: numeral], [contractor] (or [customer]), [telephone] (or [mobile]), and [number] is selected.
Two methods are usable in step A8 for selecting a conversation unit that contains at least the clue information corresponding to the character string to be extracted. One selects a unit that contains only that clue information. The other selects a unit that contains that clue information together with one or more of the other clue information items in the necessary information clue information list. These methods can be combined as appropriate.

In step A9, it is determined whether the speech recognition information in the conversation unit selected in step A8 contains information request clue information 25a corresponding to the necessary information to be inserted into slot number N of the necessary information list selected in step A6, that is, whether the dialog is in the information request dialog state for that necessary information. If the information request clue information 25a is found, that is, if the information request dialog state is detected, the process proceeds to step A10. If it is not found, it is judged that the necessary information cannot be extracted from the speech recognition information in the conversation unit selected in step A8, and the process proceeds to step A16.
In step A10, it is determined whether the speech recognition information in the selected conversation unit contains, following the information request clue information 25a, request acceptance clue information 25b corresponding to the necessary information, that is, whether the dialog has transitioned from the information request dialog state to the request acceptance dialog state. If it has, the process proceeds to step A11. Otherwise, the process proceeds to step A15.
In step A11, it is determined whether the speech recognition information in the selected conversation unit contains, following the information request clue information 25a and the request acceptance clue information 25b, information disclosure clue information 25c corresponding to the necessary information, that is, whether the dialog has transitioned from the information request dialog state through the request acceptance dialog state to the information disclosure dialog state. If it has, the process proceeds to step A12. Otherwise, the process proceeds to step A15.
In step A12, it is determined whether the speech recognition information in the selected conversation unit contains, following the information request clue information 25a, the request acceptance clue information 25b, and the information disclosure clue information 25c, information acceptance clue information corresponding to the necessary information, that is, whether the dialog has transitioned from the information request dialog state through the request acceptance dialog state and the information disclosure dialog state to the information acceptance dialog state. If it has, the process proceeds to step A13. Otherwise, the process proceeds to step A15.

In step A13, the necessary information is extracted and inserted into the slot corresponding to slot number N of the necessary information list selected in step A6. For example, when the necessary information is [contractor name], the character string matching the clue information [part of speech: person name] is extracted. When the necessary information is [contractor address], the character string matching the clue information [part of speech: place name] is extracted. When the necessary information is [contractor telephone number], the character string matching the clue information [part of speech: numeral], which indicates a combination of digits, is extracted.
In step A14, it is determined whether N equals the total number of slots in the necessary information list selected in step A6. If it does, the process ends. Otherwise, the process proceeds to step A17.

In step A15, it is determined whether every occurrence of the information request clue information 25a corresponding to the necessary information for slot number N has been examined in the speech recognition information of the conversation unit selected in step A8. If every occurrence has been examined, the process proceeds to step A16. Otherwise, the process returns to step A9, and the information request clue information 25a in the remaining speech recognition information is examined.
In step A16, it is determined whether every conversation unit containing clue information for the necessary information to be inserted into slot number N has been selected. If not, the process returns to step A8 and another conversation unit containing that clue information is selected. If every such conversation unit has been selected, the process proceeds to step A14.
In step A17, "1" is added to the slot number N, and the process returns to step A8 to repeat the extraction for the necessary information to be inserted into the next slot (N = N + 1) of the necessary information list selected in step A6.
Steps A6 to A17 are executed by the dialog state transition determining means 17.
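
The control flow of steps A6 to A17 amounts to a nested loop over slots and candidate conversation units. The Python sketch below shows only that loop structure. The function name `fill_slots` and the two callbacks `has_clues` and `extract` are hypothetical. The dialog state check of steps A9 to A12 is abstracted into `extract`, which is assumed to return the extracted string or `None`.

```python
def fill_slots(necessary_info_list, units, has_clues, extract):
    """Fill each slot of the necessary information list (steps A6 to A17).

    necessary_info_list: ordered slot names for the determined business (A6).
    has_clues(unit, info): True if the unit carries clue information for info (A8).
    extract(unit, info): extracted string, or None if the state transition
        was not observed in the unit (A9 to A13).
    """
    slots = {}
    for info in necessary_info_list:      # A7, A14, A17: N = 1 .. total slots
        slots[info] = None
        for unit in units:                # A8, A16: try each candidate unit
            if not has_clues(unit, info):
                continue
            value = extract(unit, info)   # A9 to A12: transition check
            if value is not None:         # A13: insert into slot N
                slots[info] = value
                break
    return slots


# Toy data: each unit already knows which value it would yield.
units = [
    {"id": "S3", "clues": {"customer number"}, "value": "00112233"},
    {"id": "S4", "clues": {"contractor name"}, "value": None},
]
result = fill_slots(
    ["customer number", "contractor name"],
    units,
    has_clues=lambda u, info: info in u["clues"],
    extract=lambda u, info: u["value"],
)
```

A slot stays `None` when no candidate unit shows the required transition. That matches the flowchart path from step A16 to step A14 without passing through step A13.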

In the first embodiment, the necessary information is extracted by determining that the dialog state has transitioned in the order of the information request dialog state, the request acceptance dialog state, the information disclosure dialog state, and the information acceptance dialog state. In ordinary dialog, however, the information request dialog state is sometimes followed directly by the information disclosure dialog state. That is, the request acceptance dialog state may be absent.
A second embodiment of the speech signal processing apparatus of the present invention is described below. It extracts the necessary information by determining that the dialog state has transitioned in the order of the information request dialog state, the information disclosure dialog state, and the information acceptance dialog state. Like the first embodiment, the second embodiment is well suited to extracting, from the speech signal of a dialog in which one speaker requests a piece of business from the other, the necessary information needed to carry out the requested business.
Like the first embodiment, the speech signal processing apparatus of the second embodiment has the processing means 10, the storage means 20, the input means 30, and the output means 40. However, because the second embodiment determines only the three-state transition described above, the request acceptance clue information 25b of the dialog state determination information database 25 and the request acceptance dialog state determining means 17b of the dialog state transition determining means 17, both shown in FIG. 1, are omitted. The rest of the configuration is the same as in the first embodiment.

FIG. 11 outlines how the dialog state transition determining means 17 of the second embodiment determines the transition of the dialog state from the speech signal (the speech recognition information corresponding to the speech signal) of a dialog between a customer and a receptionist (operator) at a company's service center. In the second embodiment, it is determined that the dialog state has transitioned in the order indicated by the broken-line arrows.
First, the receptionist utters information request clue information to the customer, for example "[(necessary information)], please."
On hearing the information request clue information from the receptionist, the customer utters information disclosure clue information to the receptionist, for example "It is [(necessary information)]."
On hearing the information disclosure clue information from the customer, the receptionist utters information acceptance clue information to the customer, for example "Thank you."
As a result, it is determined that, for [(necessary information)], the dialog state has transitioned in the order of the information request dialog state, the information disclosure dialog state, and the information acceptance dialog state, and [(necessary information)] is extracted.

Next, the operation of the speech signal processing apparatus of the second embodiment is described with reference to the flowchart shown in FIG. 8. Because the second embodiment does not determine whether the dialog is in the request acceptance dialog state, step A10 is omitted.
That is, when the information request clue information 25a is found in step A9 in the speech recognition information of the selected conversation unit, the process proceeds to step A11, as indicated by the broken-line arrow.
The other processing is the same as in the first embodiment.
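
The difference between the two embodiments can be expressed as a difference in the expected state sequence. The sketch below is a simplified illustration, not the patented procedure itself. It assumes that each dialog state is recognized by a regular expression over one utterance, and that the states must appear in consecutive utterances. The cue phrases, the `CUES` table, and `find_transition` are hypothetical English stand-ins for the clue information 25a to 25c and the information acceptance clue information.

```python
import re

# Assumed cue patterns, one per dialog state.
CUES = {
    "request": re.compile(r"please give me"),
    "accept_request": re.compile(r"^(yes|certainly)"),
    "disclose": re.compile(r"\bit is\b"),
    "accept_info": re.compile(r"thank you"),
}

FIRST_EMBODIMENT = ["request", "accept_request", "disclose", "accept_info"]
SECOND_EMBODIMENT = ["request", "disclose", "accept_info"]  # step A10 omitted


def find_transition(utterances, order):
    """Return the index where the state sequence `order` starts, or None."""
    n = len(order)
    for i in range(len(utterances) - n + 1):
        window = utterances[i:i + n]
        if all(CUES[s].search(u.lower()) for s, u in zip(order, window)):
            return i
    return None


dialogue = [
    "Please give me your customer number.",
    "It is 00112233.",
    "00112233, thank you.",
]
three_state = find_transition(dialogue, SECOND_EMBODIMENT)
four_state = find_transition(dialogue, FIRST_EMBODIMENT)
```

With this toy dialog the three-state sequence is found at the first utterance, while the four-state sequence is not found because no request acceptance utterance is present.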

Next, the operation of this embodiment is described concretely for the case where a customer asks an electric power company to [stop electricity use].
FIGS. 9 and 10 show an example in which the speech signal of a dialog between a customer and a receptionist (operator) at the power company's service center, in which the customer requests [stop electricity use], has been converted into speech recognition information including text information and then arranged into documents C1 to C49.

First, the business of the dialog is determined from the speech recognition information. In this case, for example, the character string "stoppage" (停止) contained in document C3, "I would like to request an electricity stoppage.", matches the clue information [stoppage] in the necessary information clue information list (see FIG. 6) corresponding to the business [stop electricity use]. The business of this dialog (the service the customer is requesting from the power company) is therefore determined to be [stop electricity use]. Various other methods can also be used to determine the business of a dialog from the speech recognition information.
Next, the speech recognition information (the speech recognition information contained in documents C1 to C49) is divided into conversation units S1 to S15 on the basis of the conversation unit start point determination information and the conversation unit end point determination information, which are stored as conversation unit determination information in the conversation unit determination information database 22. In FIGS. 9 and 10, speech recognition information enclosed by a solid line is conversation unit start point determination information, and speech recognition information enclosed by a broken line is conversation unit end point determination information. For example, "Good morning" in document C1 (an opening greeting) and "Now then" in document C6, "Well then" in document C8, and "Next" in document C11 (expressions that change the topic) are identified as conversation unit start point determination information. "Certainly" in document C5 (an expression of confirmation), "Thank you very much" in document C10 (an expression of thanks), and "Goodbye" in document C48 (a closing greeting) are identified as conversation unit end point determination information.
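
The division into conversation units by start point and end point expressions can be sketched as below. The marker lists are English stand-ins chosen for this illustration, and `split_units` is a hypothetical function. The specification stores the actual expressions in the conversation unit determination information database 22. The sketch opens a new unit when a document begins with a start marker and closes the current unit when a document contains an end marker.

```python
# Assumed marker expressions (illustrative English equivalents).
START_MARKERS = ("good morning", "now then", "well then", "next")
END_MARKERS = ("certainly", "thank you", "goodbye")


def split_units(documents):
    """Group consecutive documents (utterances) into conversation units."""
    units, current = [], []
    for doc in documents:
        low = doc.lower()
        # A start marker at the head of a document opens a new unit.
        if current and low.startswith(START_MARKERS):
            units.append(current)
            current = []
        current.append(doc)
        # An end marker anywhere in the document closes the current unit.
        if any(marker in low for marker in END_MARKERS):
            units.append(current)
            current = []
    if current:
        units.append(current)
    return units


docs = [
    "Good morning, this is the reception center.",
    "I would like to request an electricity stoppage.",
    "Certainly.",
    "Well then, please give me your customer number.",
    "00112233.",
    "00112233, thank you very much.",
]
conversation_units = split_units(docs)
```

In this toy input the first three documents form one unit and the last three form another.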

In addition, when a conversation unit contains necessary information clue information that serves as a clue for extracting the necessary information indicated in the necessary information list for the determined business, that clue information is added to the conversation unit. For example, when the business is [stop electricity use], clue information is added for the necessary information indicated in the necessary information list for [stop electricity use] shown in FIG. 4. FIG. 5 shows an example of the necessary information clue information corresponding to the necessary information for the business [stop electricity use].
In FIG. 9, for example, based on the character strings "○○ reception center" and "△△" contained in conversation unit S1, the clue information [reception center] and [part of speech: person name], corresponding to the necessary information [receptionist name], is added to conversation unit S1. Based on the character strings "electricity stoppage" (電気の停止) and "stop the electricity" (電気を止める), the clue information [stoppage] (停止) and [stop] (停め), corresponding to the necessary information [stop date] and [stop time], is also added to conversation unit S1. Based on the character strings "customer number" and "00112233" contained in conversation unit S3, the clue information [customer number] and [part of speech: numeral], which serves as a clue for extracting the necessary information [customer number], is added to conversation unit S3. Based on the character strings "contractor", "name", and "□□□□" contained in conversation unit S4, the clue information [contractor], [name], and [part of speech: person name], which serves as a clue for extracting the necessary information [contractor name], is added to conversation unit S4.
In FIG. 10, for example, based on the character strings "month X", "day Y", "Tuesday", and "12 o'clock" contained in conversation unit S8, the clue information [month], [day], [hour], [day of week], and [part of speech: numeral], which serves as a clue for extracting the necessary information [stop date] and [stop time], is added to conversation unit S8. Based on the character strings "month X", "day Y", "Tuesday", "12 o'clock", and "stop" (停め) contained in conversation unit S9, the clue information [month], [day], [hour], [day of week], [stop], and [part of speech: numeral] is added to conversation unit S9. Based on the character strings "moving", "address", and "DD City EE 4-5-6 FF Mansion 202" contained in conversation unit S12, the clue information [moving], [address], and [part of speech: place name], which serves as a clue for extracting the necessary information [contractor new address], is added to conversation unit S12.

The necessary information indicated in the necessary information list for the determined business is then extracted by determining the transition of the dialog state from the speech recognition information. For example, the necessary information is extracted by determining that the dialog state has transitioned in the order of the information request dialog state, the request acceptance dialog state, the information disclosure dialog state, and the information acceptance dialog state, or in the order of the information request dialog state, the information disclosure dialog state, and the information acceptance dialog state. At this time, a conversation unit containing the clue information for the necessary information to be extracted is selected.
Concrete examples of extracting necessary information by determining the transition of the dialog state are described with reference to FIGS. 12 to 14. FIGS. 12 and 13 are examples in which the dialog state is determined to have transitioned in the order of the information request dialog state, the information disclosure dialog state, and the information acceptance dialog state. FIG. 14 is an example in which the dialog state is determined to have transitioned in the order of the information request dialog state, the request acceptance dialog state, the information disclosure dialog state, and the information acceptance dialog state.

FIG. 12 shows the process of extracting the necessary information [customer number] for the dialog requirement [electricity stoppage]. In FIG. 12, the conversation unit S3, to which the necessary information clue information [customer number] and [part of speech: numeral] is assigned, is selected as the conversation unit including the necessary information clue information that serves as a clue for extracting the necessary information [customer number] of slot number B002 shown in FIG. 4.
The document C8, which indicates the content of the dialog from the receptionist to the customer, includes the expression “Please give me the customer number from the left”, which represents a request for the necessary information [customer number]. As a result, it is determined from the document C8 that the dialog is in the information request dialog state, in which the necessary information [customer number] is requested.
The document C9, which follows the document C8 and indicates the content of the dialog from the customer to the receptionist, includes the character string “00112233”, which corresponds to a character string indicating the necessary information [customer number]. As a result, it is determined from the document C9 that the dialog is in the information disclosure dialog state, in which the necessary information [customer number] is disclosed.
The document C10, which follows the document C9 and indicates the content of the dialog from the receptionist to the customer, includes the expression “00112233, thank you”, which represents confirmation of the necessary information [customer number]. As a result, it is determined from the document C10 that the dialog is in the information acceptance dialog state, in which the necessary information [customer number] is accepted.
As described above, regarding the necessary information [customer number], it is determined that, in the conversation unit S3, the dialog state has transitioned in the order of the information request dialog state, the information disclosure dialog state, and the information acceptance dialog state. The character string “00112233”, which corresponds to the necessary information clue information [part of speech: numeral] indicating the necessary information [customer number], is then extracted as the necessary information [customer number].
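The customer number exchange above can be traced with a short Python sketch. The cue phrases, the speaker labels `"agent"` and `"customer"`, and the function name `extract_customer_number` are hypothetical stand-ins for the dialog state determination information; the description does not give a concrete matching procedure.

```python
import re

# Hypothetical cue phrases standing in for the information request clue
# information and the information acceptance clue information.
REQUEST_CUE = "customer number"
ACCEPT_CUE = "thank you"
# A run of digits, optionally separated by spaces.
DIGITS = re.compile(r"\d[\d ]*\d")


def extract_customer_number(utterances):
    """utterances: list of (speaker, text). Returns the number or None."""
    state = "start"
    candidate = None
    for speaker, text in utterances:
        lowered = text.lower()
        if state == "start" and speaker == "agent" and REQUEST_CUE in lowered:
            state = "requested"  # information request dialog state
        elif state == "requested" and speaker == "customer":
            match = DIGITS.search(text)
            if match:
                candidate = match.group().replace(" ", "")
                state = "disclosed"  # information disclosure dialog state
        elif state == "disclosed" and speaker == "agent" and ACCEPT_CUE in lowered:
            return candidate  # information acceptance dialog state
    return None
```

In this sketch the value is returned only after the acceptance cue is seen, mirroring the point that a digit string alone, without the surrounding request and confirmation, is not treated as the customer number.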

FIG. 13 shows the process of extracting the necessary information [contractor name] for the dialog requirement [electricity stoppage]. In FIG. 13, the conversation unit S4, to which the necessary information clue information [contractor], [name], and [part of speech: personal name] is assigned, is selected as the conversation unit including the necessary information clue information that serves as a clue for extracting the necessary information [contractor name] of slot number B003 shown in FIG. 4.
The document C11, which indicates the content of the dialog from the receptionist to the customer, includes the expression “Please give me the name of the contractor”, which represents a request for the necessary information [contractor name]. As a result, it is determined from the document C11 that the dialog is in the information request dialog state, in which the necessary information [contractor name] is requested.
The document C12, which follows the document C11 and indicates the content of the dialog from the customer to the receptionist, includes the character string “□□□□”, which corresponds to a character string indicating the necessary information [contractor name]. As a result, it is determined from the document C12 that the dialog is in the information disclosure dialog state, in which the necessary information [contractor name] is disclosed.
The document C13, which follows the document C12 and indicates the content of the dialog from the receptionist to the customer, includes the expression “□□□□, thank you”, which represents confirmation of the necessary information [contractor name]. As a result, it is determined from the document C13 that the dialog is in the information acceptance dialog state, in which the necessary information [contractor name] is accepted.
As described above, regarding the necessary information [contractor name], it is determined that, in the conversation unit S4, the dialog state has transitioned in the order of the information request dialog state, the information disclosure dialog state, and the information acceptance dialog state. The character string “□□□□”, which corresponds to the necessary information clue information [part of speech: personal name] indicating the necessary information [contractor name], is then extracted as the necessary information [contractor name].

FIG. 14 shows the process of extracting the necessary information [contractor relocation destination telephone number] for the dialog requirement [electricity stoppage]. In FIG. 14, the conversation unit S13, to which the necessary information clue information [moving], [telephone] (or [mobile phone]), [number], and [part of speech: numeral] is assigned, is selected as the conversation unit including the necessary information clue information that serves as a clue for extracting the necessary information [contractor relocation destination telephone number] of slot number B011 shown in FIG. 4.
The document C37, which indicates the content of the dialog from the receptionist to the customer, includes the expression “Please give me the telephone number of your new address”, which represents a request for the necessary information [contractor relocation destination telephone number]. As a result, it is determined from the document C37 that the dialog is in the information request dialog state, in which the necessary information [contractor relocation destination telephone number] is requested.
The document C38, which follows the document C37 and indicates the content of the dialog from the customer to the receptionist, includes the expression “I understand”, which represents acceptance of the request for the necessary information [contractor relocation destination telephone number]. As a result, it is determined from the document C38 that the dialog is in the request acceptance dialog state, in which the request for the necessary information [contractor relocation destination telephone number] is accepted.
The document C41, which follows the document C39 and indicates the content of the dialog from the customer to the receptionist, includes the character string “09000001111”, which corresponds to a character string indicating the necessary information [contractor relocation destination telephone number]. As a result, it is determined from the document C41 that the dialog is in the information disclosure dialog state, in which the necessary information [contractor relocation destination telephone number] is disclosed.
The document C42, which follows the document C41 and indicates the content of the dialog from the receptionist to the customer, includes the expression “09000001111, thank you”, which represents confirmation of the necessary information [contractor relocation destination telephone number]. As a result, it is determined from the document C42 that the dialog is in the information acceptance dialog state, in which the necessary information [contractor relocation destination telephone number] is accepted.
As described above, regarding the necessary information [contractor relocation destination telephone number], it is determined that, in the conversation unit S13, the dialog state has transitioned in the order of the information request dialog state, the request acceptance dialog state, the information disclosure dialog state, and the information acceptance dialog state. The character string “09000001111”, which corresponds to the necessary information clue information [part of speech: numeral] indicating the necessary information [contractor relocation destination telephone number], is then extracted as the necessary information [contractor relocation destination telephone number].

Note that, for example, the necessary information clue information [month], [day], [hour], [day of the week], [stop], [part of speech: numeral], and [part of speech: place name], which corresponds to the necessary information [usage stop date], [usage stop time], [contractor name], and [contractor address], is assigned to the conversation unit S14, but the dialog state transition described above does not occur within the conversation unit S14. For this reason, the dialog state transition determination means 17 cannot determine the dialog state transition described above from the speech recognition information included in the conversation unit S14. That is, no necessary information is extracted from the conversation unit S14.

As described above, in the embodiment described above, the necessary information is extracted from the speech signal during the dialog between the speakers (the speech recognition information created from the speech signal) by determining that, with respect to the necessary information for the requirement, the dialog state has transitioned in the order of the information request dialog state, the request acceptance dialog state, the information disclosure dialog state, and the information acceptance dialog state, or in the order of the information request dialog state, the information disclosure dialog state, and the information acceptance dialog state. As a result, the necessary information for the requirement of the dialog between the speakers (for example, the information needed to carry out a requirement that one speaker has requested of the other) can be extracted accurately from the speech signal during the dialog. In particular, in the present embodiment, the speech recognition information is divided into conversation units, and the necessary information is extracted by determining the transition of the dialog state within a conversation unit (tracing the progress of the dialog), so that the necessary information can be extracted more accurately.
In addition, when a conversation unit includes necessary information clue information that serves as a clue for extracting necessary information, that clue information is assigned to the conversation unit, and when necessary information is extracted, a conversation unit including the relevant necessary information clue information is selected. Therefore, the necessary information can be extracted more accurately, and the processing load for extracting the necessary information can be reduced.
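The reduction in processing load comes from examining only the conversation units that carry the relevant clues. A minimal sketch of that selection step follows; the function name `select_units`, the dictionary layout, and the rule that a unit must carry every required clue are assumptions, since the description only states that a unit including the clue information is selected.

```python
def select_units(units, required_clues):
    """Return the IDs of units whose assigned clues cover all required clues.

    units: dict mapping a conversation unit ID to its list of clue labels.
    """
    needed = set(required_clues)
    return [uid for uid, clues in units.items() if needed <= set(clues)]


# Hypothetical clue assignments, loosely following FIGS. 12 and 13.
units = {
    "S3": ["customer number", "part of speech: numeral"],
    "S4": ["contractor", "name", "part of speech: personal name"],
}
selected = select_units(units, ["customer number", "part of speech: numeral"])
```

Only the selected units would then be passed to the dialog state transition determination, so units without the relevant clues are never examined.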

The present invention can also be configured as a program for causing a computer to execute the processing of the processing means described above (the management means, speech recognition means, conversation unit dividing means, requirement determination means, necessary information clue information assigning means, dialog state transition determination means, and the like that constitute the processing means), or as a storage medium storing the program.

The present invention is not limited to the configuration of the embodiment described above, and various changes, additions, and deletions are possible.
Although the requirement determination means 15 determines the requirement of the dialog based on the speech recognition information, various other methods can be used to determine the requirement of the dialog. For example, requirement identification information indicating the requirement can be input from the input means 30 by operating an input key provided on the input means 30 or by selecting an input portion of a display screen provided on the input means 30. In this case, in step A4 shown in FIG. 8, the requirement determination means 15 determines the requirement of the dialog based on the requirement identification information input via the input means 30. Using such a requirement determination means 15 simplifies the processing for determining the requirement.
The conversation unit determination information used to determine conversation units can be set as appropriate according to the requirement of the dialog and the like.
The necessary information to be extracted from the speech signal can be set as appropriate according to the requirement of the dialog and the like.
The necessary information clue information that serves as a clue for extracting necessary information can be set as appropriate according to the necessary information.
Although the speech recognition information is divided into conversation units and the necessary information clue information that serves as a clue for extracting necessary information is assigned to the conversation units, the processing of dividing into conversation units and the processing of assigning necessary information clue information to the conversation units can be omitted. Even in this case, the necessary information is extracted by determining the transition of the dialog state, so the necessary information can be extracted accurately.
The dialog state determination information used to determine the dialog state can be set as appropriate according to the requirement of the dialog and the like.
The requirement of the dialog can be set as appropriate according to the content of the requirement that one speaker requests of the other.
As the input means 30 and the output means 40, input means and output means provided in a remote terminal device that can be connected to the processing means 10 via a communication line can be used.

DESCRIPTION OF SYMBOLS
10 Processing means
11 Management means
12 Necessary information extraction means
13 Speech recognition means
14 Conversation unit dividing means
15 Requirement determination means
16 Necessary information clue information assigning means
17 Dialog state transition determination means
17a Information request dialog state determination means
17b Request acceptance dialog state determination means
17c Information disclosure dialog state determination means
17d Information acceptance dialog state determination means
20 Storage means
21 Speech signal database
22 Conversation unit determination information database
23 Necessary information database
24 Necessary information clue information database
25 Dialog state determination information database
25a Information request clue information
25b Request acceptance clue information
25c Information disclosure clue information
25d Information acceptance clue information
26 Speech recognition information database
30 Input means
40 Output means

Claims

A speech signal processing apparatus that extracts necessary information related to the requirement from a speech signal at the time of dialogue between speakers for the requirement,
A storage unit, a processing unit, an input unit, and an output unit;
The storage means includes a necessary information database and a dialog state determination information database,
In the necessary information database, a necessary information list indicating necessary information relating to the requirement is stored corresponding to the requirement,
In the dialog state determination information database, information request clue information for determining the dialog state in which the necessary information is requested, information disclosure clue information for determining the dialog state in which the necessary information is disclosed, and information acceptance clue information for determining the dialog state in which the necessary information is accepted are stored,
The processing means includes speech recognition means, requirement determination means, dialog state transition determination means, and management means,
The speech recognition means recognizes a speech signal at the time of dialogue between speakers input from the input means and outputs speech recognition information,
The requirement determination means determines the requirement of the dialogue between the speakers based on the speech recognition information output from the speech recognition means,
The dialog state transition determination means determines, based on the speech recognition information output from the speech recognition means and the information request clue information, information disclosure clue information, and information acceptance clue information stored in the dialog state determination information database, that the dialog state between the speakers has transitioned in the order of the dialog state in which the necessary information is requested, the dialog state in which the necessary information is disclosed, and the dialog state in which the necessary information is accepted, and thereby extracts the necessary information indicated in the necessary information list stored in the necessary information database in correspondence with the requirement determined by the requirement determination means,
The speech signal processing apparatus, wherein the management means outputs, from the output means, the necessary information extracted by the dialog state transition determination means.

A speech signal processing apparatus that extracts necessary information related to the requirement from a speech signal at the time of dialogue between speakers for the requirement,
A storage unit, a processing unit, an input unit, and an output unit;
The storage means includes a necessary information database and a dialog state determination information database,
In the necessary information database, a necessary information list indicating necessary information relating to the requirement is stored corresponding to the requirement,
In the dialog state determination information database, information request clue information for determining the dialog state in which the necessary information is requested, information disclosure clue information for determining the dialog state in which the necessary information is disclosed, and information acceptance clue information for determining the dialog state in which the necessary information is accepted are stored,
The processing means includes speech recognition means, requirement determination means, dialog state transition determination means, and management means,
The speech recognition means recognizes a speech signal at the time of dialogue between speakers input from the input means and outputs speech recognition information,
The requirement determination means determines the requirement of the dialogue between the speakers based on requirement identification information input from the input means,
The dialog state transition determination means determines, based on the speech recognition information output from the speech recognition means and the information request clue information, information disclosure clue information, and information acceptance clue information stored in the dialog state determination information database, that the dialog state between the speakers has transitioned in the order of the dialog state in which the necessary information is requested, the dialog state in which the necessary information is disclosed, and the dialog state in which the necessary information is accepted, and thereby extracts the necessary information indicated in the necessary information list stored in the necessary information database in correspondence with the requirement determined by the requirement determination means,
The speech signal processing apparatus, wherein the management means outputs, from the output means, the necessary information extracted by the dialog state transition determination means.

The speech signal processing apparatus according to claim 1 or 2,
The dialog state determination information database further stores request acceptance clue information for determining the dialog state in which the request for the necessary information is accepted,
The speech signal processing apparatus, wherein the dialog state transition determination means determines, based on the speech recognition information output from the speech recognition means and the information request clue information, request acceptance clue information, information disclosure clue information, and information acceptance clue information stored in the dialog state determination information database, that the dialog state between the speakers has transitioned in the order of the dialog state in which the necessary information is requested, the dialog state in which the request for the necessary information is accepted, the dialog state in which the necessary information is disclosed, and the dialog state in which the necessary information is accepted, and thereby extracts the necessary information indicated in the necessary information list stored in the necessary information database in correspondence with the requirement determined by the requirement determination means.

The speech signal processing apparatus according to any one of claims 1 to 3,
The storage means further includes a conversation unit determination information database and a necessary information clue information database,
In the conversation unit determination information database, conversation unit determination information for dividing speech recognition information into conversation units is stored,
In the necessary information clue information database, a necessary information clue information list indicating necessary information clue information serving as a clue for extracting necessary information is stored in correspondence with the requirement,
The processing means further includes conversation unit dividing means and necessary information clue information assigning means,
The conversation unit dividing means divides the speech recognition information output from the speech recognition means into conversation units based on the conversation unit determination information stored in the conversation unit determination information database,
The necessary information clue information assigning means, when a conversation unit divided by the conversation unit dividing means includes necessary information clue information indicated in the necessary information clue information list stored in the necessary information clue information database in correspondence with the requirement determined by the requirement determination means, assigns that necessary information clue information to the conversation unit,
The speech signal processing apparatus, wherein the dialog state transition determination means selects a conversation unit including necessary information clue information that serves as a clue for extracting the necessary information, and extracts the necessary information based on the speech recognition information included in the selected conversation unit.

A program for causing a computer to execute the processing of the processing means according to any one of claims 1 to 4.

A storage medium in which a program for causing a computer to execute the processing of the processing means according to claim 1 is recorded.