JP6991409B2

JP6991409B2 - Information processing equipment, programs and information processing methods

Info

Publication number: JP6991409B2
Application number: JP2021550833A
Authority: JP
Inventors: 辰彦斉藤; 勇之相川
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2019-10-02
Filing date: 2019-10-02
Publication date: 2022-01-12
Anticipated expiration: 2039-10-02
Also published as: TW202115713A; WO2021064886A1; JPWO2021064886A1

Description

The present invention relates to an information processing device, a program, and an information processing method.

A call center system needs to acquire various kinds of information, such as a user's name, address, or telephone number, through the exchange between the user (the customer) and an operator. Conventionally, the operator confirms such information by repeating it back and then enters the confirmed information into the call center system by hand, which is very costly.

To address this, Patent Document 1, for example, describes an operator identity verification support system that supports the operator's verification work by using voice recognition to automatically check identity and other confirmation items.

Patent Document 1: Japanese Unexamined Patent Publication No. 2014-197140

However, although the conventional operator identity verification support system performs voice recognition on the utterances of the user and the operator and extracts keywords from the text representing the recognized voice, it is difficult to identify characters or character strings, such as the kanji or spelling of a name or address, from sound alone. It has therefore been difficult to extract the necessary information from an actual call using voice alone.

Accordingly, an object of the present invention is to make it possible to automatically identify desired information from voice that includes an explanation of a character or a character string.

An information processing device according to one aspect of the present invention includes: a voice recognition unit that recognizes spoken voice from voice data including the spoken voice; an explanation part extraction unit that extracts, from the recognized voice, an explanation part, which is a part including a character or a character string and an explanatory expression explaining how to write the character or the character string; an explanatory expression information storage unit that stores explanatory expression information associating the explanatory expression with the character or the character string explained by the explanatory expression; and a unique information determination unit that determines, as unique information, the character or the character string explained by the explanatory expression by referring to the explanatory expression information.

A program according to one aspect of the present invention causes a computer to function as: a voice recognition unit that recognizes spoken voice from voice data including the spoken voice; an explanation part extraction unit that extracts, from the recognized voice, an explanation part, which is a part including a character or a character string and an explanatory expression explaining how to write the character or the character string; an explanatory expression information storage unit that stores explanatory expression information associating the explanatory expression with the character or the character string explained by the explanatory expression; and a unique information determination unit that determines, as unique information, the character or the character string explained by the explanatory expression by referring to the explanatory expression information.

In an information processing method according to one aspect of the present invention, a voice recognition unit recognizes spoken voice from voice data including the spoken voice; an explanation part extraction unit extracts, from the recognized voice, an explanation part, which is a part including a character or a character string and an explanatory expression explaining how to write the character or the character string; and a unique information determination unit determines, as unique information, the character or the character string explained by the explanatory expression by referring to explanatory expression information that associates the explanatory expression with the character or the character string explained by the explanatory expression.

According to one or more aspects of the present invention, desired information can be automatically identified from voice that includes an explanation of a character or a character string.

FIG. 1 is a block diagram schematically showing the configuration of the call data information extraction device according to Embodiment 1.
FIG. 2 is a schematic diagram showing a first example of the explanation extraction rules.
FIG. 3 is a schematic diagram showing a second example of the explanation extraction rules.
FIG. 4 is a schematic diagram showing a first example of the explanatory expression information.
FIG. 5 is a schematic diagram showing a second example of the explanatory expression information.
FIG. 6 is a schematic diagram showing a third example of the explanatory expression information.
FIG. 7 is a schematic diagram showing a fourth example of the explanatory expression information.
FIG. 8 is a schematic diagram showing a fifth example of the explanatory expression information.
FIG. 9 is a hardware configuration diagram of the call data information extraction device according to Embodiment 1.
FIG. 10 is a flowchart showing the operation of the call data information extraction device according to Embodiment 1.
FIG. 11 is a block diagram schematically showing the configuration of the call data information extraction device according to Embodiment 2.
FIG. 12 is a flowchart showing the operation of updating the explanatory expression information stored in the explanation DB in the call data information extraction device according to Embodiment 2.
FIG. 13 is a block diagram schematically showing the configuration of the call data information extraction device according to Embodiment 3.
FIG. 14 is a flowchart showing the operation of determining unique information from an input voice signal in the call data information extraction device according to Embodiment 3.
FIG. 15 is a block diagram schematically showing the configuration of the call data information extraction device according to Embodiment 4.

Embodiment 1.
FIG. 1 is a block diagram schematically showing the configuration of a call data information extraction device 100, which is an information processing device according to Embodiment 1.
The call data information extraction device 100 includes a voice input unit 101, a voice acquisition unit 102, a voice recognition unit 103, an explanation part extraction unit 104, an explanation database (hereinafter referred to as the explanation DB) 105, and a unique information determination unit 106. With this configuration, the call data information extraction device 100 performs voice recognition on utterances and determines unique information based on the explanation DB 105.

The voice input unit 101 receives input of a voice signal representing the input voice of the customer from whom information is to be extracted. The input voice signal is given to the voice acquisition unit 102. Hereinafter, the customer is referred to as the user.

The voice acquisition unit 102 acquires voice data by performing A/D (Analog/Digital) conversion, for example by PCM (Pulse Code Modulation), on the voice signal given from the voice input unit 101. The acquired voice data is given to the voice recognition unit 103.

The input voice represented by the analog voice signal must be limited in advance to the voice of a single speaker. For example, for a call in a call center, the voices of the user and the operator need to be separated into stereo channels. Alternatively, when the voices of multiple speakers are mixed in a monaural signal, they need to be separated into per-speaker voices beforehand by a voice separation technique or the like.
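
As a rough illustration of the stereo case, the sketch below de-interleaves 16-bit stereo PCM samples into one buffer per channel. It assumes interleaved left/right samples with one speaker per channel (for instance the user on the left and the operator on the right); that channel assignment and the function name `split_stereo` are hypothetical and are not specified in this document.

```python
import array


def split_stereo(interleaved: bytes) -> tuple[array.array, array.array]:
    """De-interleave 16-bit stereo PCM into (left, right) sample arrays."""
    samples = array.array("h")          # signed 16-bit samples, native byte order
    samples.frombytes(interleaved)
    if len(samples) % 2:
        raise ValueError("stereo data must contain an even number of samples")
    left = samples[0::2]                # assumed: one speaker on the left channel
    right = samples[1::2]               # assumed: the other speaker on the right
    return left, right
```

Each returned array can then be handled as single-speaker voice data by the later stages.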

Here, the voice input unit 101 is assumed to receive input of a voice signal representing the user's input voice, but Embodiment 1 is not limited to this example. For example, the voice acquisition unit 102 may use a known technique to identify the input voice of each speaker in the input voice signal and generate voice data representing the user's input voice.

The voice recognition unit 103 detects, in the voice data digitized by the voice acquisition unit 102, the voice section corresponding to the spoken voice and performs recognition processing on the voice in that section. It thereby recognizes the spoken voice and generates voice text data, which is text data indicating the utterance content corresponding to the spoken voice. The generated voice text data is given to the explanation part extraction unit 104.

The explanation part extraction unit 104 extracts an explanation part from the user's utterance content indicated by the voice text data given from the voice recognition unit 103. The explanation part is a part that includes a character or a character string and an explanatory expression explaining how to write that character or character string. The explanation part extraction unit 104 then generates explanation part text data indicating the extracted explanation part. The generated explanation part text data is given to the unique information determination unit 106.

The explanation part referred to here is the part of an utterance that gives supplementary information for determining a character or character string, such as kanji or spelling, when it is difficult to determine it from sound alone, as with a name or an address.

For example, when the utterance content indicated by the voice text data matches an explanation extraction rule given by the explanation extraction rule information shown in FIG. 2 or FIG. 3, the explanation part extraction unit 104 may extract the portion designated by that rule as the explanation part.
Here, an explanation extraction rule is a rule describing an expression that is used, in the recognized voice, to explain how to write a character or a character string.

For example, the first row of the explanation extraction rule information shown in FIG. 2 stores the explanation extraction rule "<ENTITY>は<DESCRIPTION>の<ENTITY>" (roughly, "<ENTITY> is the <ENTITY> of <DESCRIPTION>").
When the utterance content is "フクシマは都道府県のフクシマ" ("Fukushima is the Fukushima of the prefecture"), "フクシマ" (Fukushima) corresponds to <ENTITY> and "都道府県" (prefecture) corresponds to <DESCRIPTION>. Here, the portion matching the first "<ENTITY>" is the character or character string being explained, and the portion matching "<DESCRIPTION>の<ENTITY>" is the explanatory expression.
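
One way to picture such a rule is as a regular expression, sketched below. This is only an illustration under stated assumptions: the helper names `compile_rule` and `extract_explanation` are hypothetical, a rule is assumed to contain a single <DESCRIPTION> placeholder, and the explanatory expression is assumed to run from the start of <DESCRIPTION> to the end of the match, as in the FIG. 2 example above. The document does not say how the rules are actually evaluated.

```python
import re


def compile_rule(template: str) -> re.Pattern:
    """Turn a rule such as '<ENTITY>は<DESCRIPTION>の<ENTITY>' into a regex."""
    parts = re.split(r"(<ENTITY>|<DESCRIPTION>)", template)
    seen_entity = False
    pattern = ""
    for part in parts:
        if part == "<ENTITY>":
            # The first <ENTITY> captures; later ones must repeat the same text.
            pattern += "(?P=entity)" if seen_entity else "(?P<entity>.+?)"
            seen_entity = True
        elif part == "<DESCRIPTION>":
            pattern += "(?P<description>.+?)"
        else:
            pattern += re.escape(part)  # literal text such as は or の
    return re.compile(pattern)


def extract_explanation(utterance: str, rule: re.Pattern):
    """Return the matched entity, description, and explanatory expression, or None."""
    m = rule.search(utterance)
    if m is None:
        return None
    return {
        "entity": m.group("entity"),
        "description": m.group("description"),
        # Assumed layout: the expression spans from <DESCRIPTION> to the end of the match.
        "expression": utterance[m.start("description"):m.end()],
    }
```

Applied to the utterance "フクシマは都道府県のフクシマ", the rule from FIG. 2 would yield "フクシマ" as the entity and "都道府県のフクシマ" as the explanatory expression.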

Although the explanation part extraction unit 104 extracts the explanation part using the explanation extraction rule information shown in FIG. 2 or FIG. 3, Embodiment 1 is not limited to this example. For example, the explanation part extraction unit 104 may extract the explanation part by machine learning. For example, it may use a classifier such as an SVM (Support Vector Machine) to classify, sentence by sentence or clause by clause, whether an explanation part is included.

Alternatively, for example, a voice signal representing the operator's input voice may also be input to the voice acquisition unit 102 to generate voice data, and the voice text data recognized from that voice data may also be given to the explanation part extraction unit 104. The explanation part extraction unit 104 may then extract, as the explanation part, what the user says after the operator makes a predetermined utterance such as "どのような漢字ですか？" ("What kanji is it written with?").
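
The operator-prompt variant can be sketched as a simple pass over the recognized turns of a call. The `Turn` structure, the speaker labels, and the `TRIGGER_PHRASES` list below are assumptions made for illustration; the document gives only "どのような漢字ですか？" as an example of a predetermined utterance.

```python
from dataclasses import dataclass

# Hypothetical list of predetermined operator utterances (only the first is from the text).
TRIGGER_PHRASES = ("どのような漢字ですか",)


@dataclass
class Turn:
    speaker: str  # assumed labels: "operator" or "user"
    text: str     # recognized voice text for this turn


def explanations_after_prompt(turns: list[Turn]) -> list[str]:
    """Return user utterances that directly follow an operator trigger phrase."""
    found = []
    awaiting = False
    for turn in turns:
        if turn.speaker == "operator":
            # Arm the extractor only when the operator asked how to write something.
            awaiting = any(phrase in turn.text for phrase in TRIGGER_PHRASES)
        elif turn.speaker == "user" and awaiting:
            found.append(turn.text)
            awaiting = False
    return found
```

Taking only the first user turn after the prompt is a design simplification; a real call might spread the explanation over several turns.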

The explanation DB 105 is an explanatory expression information storage unit that stores explanatory expression information. This information indicates the explanatory expressions that are given as supplementary information for determining characters or character strings, such as kanji or spelling, when they are difficult to determine from sound alone. The explanatory expression information associates each explanatory expression with the character or character string whose writing is explained by that expression.

FIGS. 4 to 8 are schematic diagrams showing examples of the explanatory expression information stored in the explanation DB 105.
FIG. 4 is an example of explanatory expression information for the case where the unique information is a name written with a single kanji. As shown in FIG. 4, one kanji is associated with each explanatory expression.

FIG. 5 is an example of explanatory expression information for the case where the unique information is a name written with multiple kanji. As shown in FIG. 5, multiple kanji are associated with each explanatory expression.

FIG. 6 is an example of explanatory expression information for the case where the unique information is a Chinese name. As shown in FIG. 6, a Chinese name consisting of a single Chinese character is associated with each Chinese explanatory expression.
FIG. 7 is an example of explanatory expression information for the case where the unique information is an English name. As shown in FIG. 7, an English name is associated with each English explanatory expression.

FIG. 8 is an example for the case where the unique information is an address. As shown in FIG. 8, a place name is associated with each explanatory expression.

The unique information determination unit 106 determines, as unique information, the character or character string explained by the explanatory expression. For example, the unique information determination unit 106 determines the character or character string explained by the explanatory expression by referring to the explanatory expression information stored in the explanation DB 105.

Specifically, the unique information determination unit 106 judges whether the explanation part indicated by the explanation part text data given from the explanation part extraction unit 104 contains an explanatory expression indicated by the explanatory expression information stored in the explanation DB 105. If it does, the unique information determination unit 106 identifies the character or character string associated with that explanatory expression and determines it as the unique information. The unique information determination unit 106 may output unique information data indicating the determined unique information to another device (not shown) or to a subsequent processing unit (not shown).

Here, the unique information determination unit 106 may judge whether the explanation part contains an explanatory expression by exact or partial string matching. Alternatively, for example, it may calculate the similarity between an expression contained in the explanation part and the explanatory expression using a known technique and check whether that similarity is equal to or greater than a threshold. In the latter case, the explanation part is judged to contain the explanatory expression when the similarity is equal to or greater than the threshold.
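
The two matching strategies can be sketched as a lookup that tries substring matching first and then falls back to a similarity score. The sketch uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for the unspecified "known technique"; the sample DB entry, the threshold value 0.8, and the function name `determine_unique_info` are all assumptions, and comparing against the whole explanation part is a simplification.

```python
from difflib import SequenceMatcher

# Hypothetical entry in the spirit of FIG. 8: explanatory expression -> written form.
EXPLANATION_DB = {"都道府県のフクシマ": "福島"}


def determine_unique_info(explanation_part: str, db: dict[str, str],
                          threshold: float = 0.8):
    """Return the written form whose explanatory expression matches, or None."""
    # 1) Exact or partial match: the expression appears inside the explanation part.
    for expression, written in db.items():
        if expression in explanation_part:
            return written
    # 2) Fallback: best similarity score, accepted only at or above the threshold.
    best_score, best_written = 0.0, None
    for expression, written in db.items():
        score = SequenceMatcher(None, expression, explanation_part).ratio()
        if score > best_score:
            best_score, best_written = score, written
    return best_written if best_score >= threshold else None
```

The similarity fallback lets a slightly misrecognized expression still resolve to an entry, at the cost of possible false matches when the threshold is set too low.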

FIG. 9 is a hardware configuration diagram of the call data information extraction device 100 according to Embodiment 1.
As shown in FIG. 9, the call data information extraction device 100 can be realized by a computer 10 that includes a memory 11, a processor 12, a voice interface (hereinafter referred to as the voice I/F) 13, a text input interface (hereinafter referred to as the text input I/F) 14, and a network interface (hereinafter referred to as the network I/F) 15.

The memory 11 stores the programs of the voice acquisition unit 102, the voice recognition unit 103, the explanation part extraction unit 104, and the unique information determination unit 106, together with their intermediate data.
The memory 11 also functions as the explanation DB 105 by storing the explanatory expression information.

The processor 12 reads a program from the memory 11 and executes it, thereby functioning as the voice acquisition unit 102, the voice recognition unit 103, the explanation part extraction unit 104, and the unique information determination unit 106. The processor 12 is, for example, a circuit that performs program processing, such as a CPU (Central Processing Unit) or a DSP (Digital Signal Processor).

The voice I/F 13 is an interface that receives input of a voice signal. The voice I/F 13 is also an interface that outputs a response voice signal, which is a signal representing a response voice.

The text input I/F 14 is an interface through which the operator inputs text data.

The network I/F 15 is an interface that communicates with a network (not shown).

In FIG. 9, the necessary programs and data are stored in the memory 11 inside the computer 10. However, the computer 10 may instead read the necessary programs or data from an external memory connected to it, such as a USB (Universal Serial Bus) memory. The computer 10 may also read the necessary programs or data from another device connected to the network via the network I/F 15.

Next, the operation will be described.
FIG. 10 is a flowchart showing the operation of the call data information extraction device 100 according to Embodiment 1.
First, the voice input unit 101 receives input of a voice signal representing the voice spoken by the user (S10).

Next, the voice acquisition unit 102 acquires, from the voice signal, the voice spoken by the user as voice data (S11).

Next, the voice recognition unit 103 performs voice recognition processing to recognize the voice indicated by the voice data and generates voice text data indicating the utterance content, which is the recognized voice (S12).
The voice recognition processing is not limited to pattern recognition, and any known voice recognition processing may be used. Known voice recognition processing is described in, for example, Sadaoki Furui, "音声情報処理" (Speech Information Processing), Morikita Publishing, 1998, pp. 79-132.

Next, the explanation part extraction unit 104 performs processing to extract an explanation part from the utterance content indicated by the voice text data (S13) and judges whether the user's utterance content contains an explanation part (S14). If there is an explanation part (Yes in S14), the explanation part extraction unit 104 gives explanation part text data indicating the extracted explanation part to the unique information determination unit 106, and the processing proceeds to step S15. If there is no explanation part (No in S14), the processing returns to step S13.

In step S15, the unique information determination unit 106 determines the unique information from the explanation part indicated by the explanation part text data by referring to the explanatory expression information stored in the explanation DB 105.

As described above, the call data information extraction device 100 according to Embodiment 1 extracts the utterance portion from the user's voice and determines the unique information by referring to the explanatory expression information. This makes it possible to automatically determine unique information from redundant input voice.
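
The flow of steps S10 to S15 can be sketched as a small pipeline that wires the stages together. The three stages are passed in as plain functions, so the sketch makes no claim about how recognition, extraction, or determination is actually implemented; the name `run_pipeline` is hypothetical, and moving on to the next utterance when no explanation part is found is one interpretation of the "No in S14" branch.

```python
from typing import Callable, Iterable, Optional


def run_pipeline(
    audio_chunks: Iterable[bytes],
    recognize: Callable[[bytes], str],            # stands in for S12
    extract: Callable[[str], Optional[str]],      # stands in for S13
    determine: Callable[[str], Optional[str]],    # stands in for S15
) -> list[str]:
    """Run each utterance through recognition, extraction, and determination."""
    results = []
    for chunk in audio_chunks:         # S10/S11: one utterance of voice data
        text = recognize(chunk)        # S12: voice text data
        part = extract(text)           # S13: explanation part, if any
        if part is None:               # S14: No, so try the next utterance
            continue
        info = determine(part)         # S15: look up the unique information
        if info is not None:
            results.append(info)
    return results
```

Passing the stages as callables keeps the sketch testable with stand-in functions in place of a real recognizer.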

Embodiment 2.
FIG. 11 is a block diagram schematically showing the configuration of a call data information extraction device 200, which is an information processing device according to Embodiment 2.
The call data information extraction device 200 includes a voice input unit 101, a voice acquisition unit 102, a voice recognition unit 103, an explanation part extraction unit 204, an explanation DB 105, a unique information determination unit 106, an input reception unit 207, a slip data generation unit 208, a slip data storage unit 209, and a database update unit (hereinafter referred to as the DB update unit) 210.

The voice input unit 101, the voice acquisition unit 102, the voice recognition unit 103, the explanation DB 105, and the unique information determination unit 106 of the call data information extraction device 200 according to Embodiment 2 are the same as the corresponding units of the call data information extraction device 100 according to Embodiment 1.

Like the explanation part extraction unit 104 in Embodiment 1, the explanation part extraction unit 204 extracts an explanation part from the user's utterance content indicated by the voice text data given from the voice recognition unit 103 and generates explanation part text data indicating the extracted explanation part.
In Embodiment 2, the explanation part extraction unit 204 gives the generated explanation part text data to the unique information determination unit 106 and also stores it in the slip data storage unit 209.

The input reception unit 207 receives text input from the operator. For example, the input reception unit 207 receives input of the character or character string whose writing is explained by the explanatory expression contained in the explanation part extracted by the explanation part extraction unit 204.

The slip data generation unit 208 receives from the operator, via the input reception unit 207, input of the character or character string, such as kanji or spelling, that is the correct answer for an explanatory expression contained in the explanation part indicated by the explanation part text data stored in the slip data storage unit 209. The slip data generation unit 208 then generates slip data indicating the input character or character string and the corresponding explanatory expression, and stores the generated slip data in the slip data storage unit 209.
The slip data storage unit 209 stores the slip data described above.

The DB update unit 210 is an update unit that, when slip data is stored in the slip data storage unit 209, updates the explanatory expression information stored in the explanation DB 105 based on the stored slip data. For example, the DB update unit 210 adds the explanatory expression and the character or character string indicated by the slip data to the explanatory expression information.

The input reception unit 207 described above can be realized by the text input I/F 14 shown in FIG. 9.
The slip data generation unit 208 and the DB update unit 210 can be realized by the processor 12 executing the corresponding programs. These programs are assumed to be stored in the memory 11.
The slip data storage unit 209 can be realized by the memory 11.

Next, the operation will be described.
In Embodiment 2 as well, the operation of determining unique information from the input voice signal is the same as in Embodiment 1.
FIG. 12 is a flowchart showing the operation of updating the explanatory expression information stored in the explanation DB 105 in the call data information extraction device 200 according to Embodiment 2.
First, the voice input unit 101 receives input of a voice signal representing the voice spoken by the user (S20).

Next, the voice acquisition unit 102 acquires, from the voice signal, the voice uttered by the user as voice data (S21).

Next, the voice recognition unit 103 performs voice recognition processing on the voice indicated by the voice data and generates voice text data indicating the recognized utterance content (S22).

Next, the explanatory part extraction unit 104 extracts an explanatory part from the utterance content indicated by the voice text data, generates explanatory part text data indicating the extracted explanatory part, and stores the generated explanatory part text data in the slip data storage unit 209 (S23).

Next, the slip data generation unit 208 receives from the operator, via the input receiving unit 207, the character or character string whose written form is explained by an explanatory expression included in the explanatory part indicated by the explanatory part text data stored in the slip data storage unit 209, and generates slip data indicating the input character or character string and the corresponding explanatory expression (S24). The slip data generation unit 208 then stores the generated slip data in the slip data storage unit 209.

Next, when the slip data is stored in the slip data storage unit 209, the DB update unit 210 updates the explanatory expression information stored in the explanation DB 105 based on the stored slip data (S25).
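Steps S24 and S25 can be pictured with the following minimal sketch. It is an illustration under stated assumptions, not the implementation described in the specification: the class names `SlipData` and `ExplanationDB`, the function `update_db`, and the use of an in-memory dictionary in place of the explanation DB 105 are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SlipData:
    """Hypothetical slip record: an explanatory expression and the operator's answer."""
    explanatory_expression: str  # expression heard in the call
    written_form: str            # character string typed by the operator


class ExplanationDB:
    """Hypothetical in-memory stand-in for the explanation DB 105."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}

    def add(self, expression: str, written_form: str) -> None:
        # Associate the explanatory expression with its written form.
        self._entries[expression] = written_form

    def lookup(self, expression: str) -> Optional[str]:
        # Return the written form, or None if the expression is unknown.
        return self._entries.get(expression)


def update_db(db: ExplanationDB, slip: SlipData) -> None:
    """Corresponds to S25: add the pair recorded in the slip data to the DB."""
    db.add(slip.explanatory_expression, slip.written_form)


db = ExplanationDB()
# S24: the operator enters the correct written form for the expression.
slip = SlipData(explanatory_expression="簡単な方のサイトウ", written_form="斉藤")
update_db(db, slip)
print(db.lookup("簡単な方のサイトウ"))
```

Once the pair is registered, a later call containing the same explanatory expression can be resolved by a plain lookup.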

As described above, according to the second embodiment, the call data information extraction device 200 can automatically update the explanatory expression information stored in the explanation DB 105.
Here, the slip data has fields for user information, such as a name field and an address field, which the operator fills in. That is, by registering an utterance that explains a name in association with the name entered in the name field, the device can infer, the next time a similar explanatory utterance is made, that this correct-answer data is what was spoken.

The explanatory expression information stored in the explanation DB 105 can be updated not only from actual calls but also, for example, from the kanji themselves. That is, since a speaker may explain a kanji by its radical or shape, the DB update unit 210 can also create explanatory expressions automatically from the structure of the kanji.
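As a rough illustration of creating an explanatory expression from the structure of a kanji, the sketch below builds a "radical plus component" phrase from a small decomposition table. The table `KANJI_PARTS`, the function `describe_kanji`, and the phrase pattern "Xへんに Y" are assumptions made for this example; the specification does not state how the structure is obtained or how the expression is worded.

```python
from typing import Dict, Optional, Tuple

# Hypothetical decomposition table: kanji -> (left-hand radical, right-hand component).
KANJI_PARTS: Dict[str, Tuple[str, str]] = {
    "村": ("木", "寸"),
    "林": ("木", "木"),
}


def describe_kanji(kanji: str) -> Optional[str]:
    """Build an explanatory expression such as '木へんに寸' from the kanji's parts."""
    parts = KANJI_PARTS.get(kanji)
    if parts is None:
        return None  # no structural data available for this kanji
    radical, component = parts
    return f"{radical}へんに{component}"


print(describe_kanji("村"))
```

An expression generated this way could then be registered in the explanation DB together with the kanji it describes, in the same way as a pair obtained from slip data.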

Embodiment 3.
FIG. 13 is a block diagram schematically showing the configuration of a call data information extraction device 300, which is an information processing apparatus according to the third embodiment.
The call data information extraction device 300 includes a voice input unit 101, a voice acquisition unit 102, a voice recognition unit 303, an explanatory part extraction unit 204, an explanation DB 105, a unique information determination unit 306, an input receiving unit 207, a slip data generation unit 208, a slip data storage unit 209, a DB update unit 210, and a voice recognition result correction unit 311.

The voice input unit 101, the voice acquisition unit 102, and the explanation DB 105 of the call data information extraction device 300 according to the third embodiment are the same as the voice input unit 101, the voice acquisition unit 102, and the explanation DB 105 of the call data information extraction device 100 according to the first embodiment.
The explanatory part extraction unit 204, the input receiving unit 207, the slip data generation unit 208, the slip data storage unit 209, and the DB update unit 210 of the call data information extraction device 300 according to the third embodiment are the same as the corresponding units of the call data information extraction device 200 according to the second embodiment.

The voice recognition unit 303 generates voice text data in the same manner as the voice recognition unit 103 of the first embodiment.
In the third embodiment, the voice recognition unit 303 gives the generated voice text data to the explanatory part extraction unit 204 and the voice recognition result correction unit 311.

The unique information determination unit 306 determines unique information in the same manner as the unique information determination unit 106 of the first embodiment.
In the third embodiment, the unique information determination unit 306 generates correction data indicating the determined unique information and the explanatory part used to determine it, and gives the correction data to the voice recognition result correction unit 311.

The voice recognition result correction unit 311 corrects the voice text data given by the voice recognition unit 303 by using the correction data given by the unique information determination unit 306. For example, the voice recognition result correction unit 311 corrects the voice text data by replacing the portion that corresponds to the unique information determined by the unique information determination unit 306 with that unique information.

Specifically, the voice recognition result correction unit 311 searches the voice text data for the explanatory part indicated by the correction data, treats the text corresponding to the character or character string included in that explanatory part as the portion corresponding to the unique information, and replaces that portion of the voice text data with the unique information indicated by the correction data.
For example, the voice recognition result correction unit 311 treats the text of the <ENTITY> or <NAME> part, according to the explanation extraction rules shown in FIG. 2 or FIG. 3, as the portion corresponding to the unique information. The voice recognition result correction unit 311 then replaces every part of the voice text data that matches that text with the unique information.
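The correction just described can be sketched as follows. This is a minimal illustration under assumptions: the rule pattern `"<ENTITY>の<NAME>"` encoded in `RULE`, the function name `correct_transcript`, and the sample transcript are hypothetical, and the actual extraction rules of FIG. 2 and FIG. 3 are not reproduced here.

```python
import re

# Hypothetical extraction rule of the form "<ENTITY>の<NAME>".
RULE = re.compile(r"(?P<ENTITY>.+)の(?P<NAME>.+)")


def correct_transcript(transcript: str, explanatory_part: str, unique_info: str) -> str:
    """Replace the recognized name with the determined unique information.

    The transcript is returned unchanged when the explanatory part does not
    occur in it or does not match the assumed rule.
    """
    if explanatory_part not in transcript:
        return transcript
    match = RULE.fullmatch(explanatory_part)
    if match is None:
        return transcript
    # Text of the <NAME> part as the recognizer wrote it (possibly wrong kanji).
    recognized_name = match.group("NAME")
    # Replace every occurrence of that text with the determined unique information.
    return transcript.replace(recognized_name, unique_info)


# The recognizer produced 斎藤, but the explanation identifies the name as 斉藤.
transcript = "名前は斎藤です簡単な方の斎藤です"
corrected = correct_transcript(transcript, "簡単な方の斎藤", "斉藤")
print(corrected)
```

Replacing every matching occurrence, rather than only the one inside the explanatory part, follows the description above, in which all text matching the <NAME> part is substituted.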

The voice recognition result correction unit 311 described above can be realized by the processor 12 executing the corresponding program, which is assumed to be stored in the memory 11.

Next, the operation will be described.
In the third embodiment as well, the operation of updating the explanatory expression information stored in the explanation DB 105 is the same as in the second embodiment.
FIG. 14 is a flowchart showing the operation of determining unique information from an input voice signal in the call data information extraction device 300 according to the third embodiment.

In FIG. 14, processing that is the same as a step of the flowchart shown in FIG. 10 is given the same reference sign as in FIG. 10, and its detailed description is omitted.

The processing of steps S10 to S15 in FIG. 14 is the same as the processing of steps S10 to S15 in FIG. 10.
However, in step S12 of FIG. 14, the voice recognition unit 303 gives the generated voice text data to the explanatory part extraction unit 204 and the voice recognition result correction unit 311.
In addition, in step S15 of FIG. 14, the unique information determination unit 306 generates correction data indicating the determined unique information and the explanatory part used to determine it, and gives the correction data to the voice recognition result correction unit 311. After step S15 of FIG. 14, the processing proceeds to step S36.

In step S36, the voice recognition result correction unit 311 corrects the voice text data given by the voice recognition unit 303 by using the correction data given by the unique information determination unit 306.
The voice recognition result correction unit 311 may output the corrected voice text data to another device (not shown) or to a subsequent processing unit (not shown).

As described above, according to the third embodiment, the voice recognition result can be corrected using the determined unique information.

Embodiment 4.
FIG. 15 is a block diagram schematically showing the configuration of a call data information extraction device 400, which is an information processing apparatus according to the fourth embodiment.
The call data information extraction device 400 includes a voice input unit 101, a voice acquisition unit 102, a voice recognition unit 403, an explanatory part extraction unit 204, an explanation DB 405, a unique information determination unit 306, an input receiving unit 207, a slip data generation unit 408, a slip data storage unit 409, a DB update unit 410, a voice recognition result correction unit 311, a response generation unit 412, and a response output unit 413.

The voice input unit 101 and the voice acquisition unit 102 of the call data information extraction device 400 according to the fourth embodiment are the same as the voice input unit 101 and the voice acquisition unit 102 of the call data information extraction device 100 according to the first embodiment.
The explanatory part extraction unit 204 and the input receiving unit 207 of the call data information extraction device 400 according to the fourth embodiment are the same as the explanatory part extraction unit 204 and the input receiving unit 207 of the call data information extraction device 200 according to the second embodiment.
Further, the unique information determination unit 306 and the voice recognition result correction unit 311 of the call data information extraction device 400 according to the fourth embodiment are the same as the unique information determination unit 306 and the voice recognition result correction unit 311 of the call data information extraction device 300 according to the third embodiment.

The voice recognition unit 403 generates voice text data in the same manner as the voice recognition unit 103 of the first embodiment.
In the fourth embodiment, the voice recognition unit 403 gives the generated voice text data to the explanatory part extraction unit 204, the voice recognition result correction unit 311, and the response generation unit 412.

The explanation DB 405 stores explanatory expression information indicating an explanatory expression, the character or character string whose written form that expression explains, and the reading of that character or character string.

The slip data generation unit 408 receives from the operator, via the input receiving unit 207, the character or character string whose written form is explained by an explanatory expression included in the explanatory part indicated by the explanatory part text data stored in the slip data storage unit 409, together with the reading of that character or character string. It then generates slip data indicating the corresponding explanatory expression, the input character or character string, and its reading, and stores the generated slip data in the slip data storage unit 409.

When slip data is stored in the slip data storage unit 409, the DB update unit 410 updates the explanatory expression information stored in the explanation DB 405 based on the stored slip data. For example, the DB update unit 410 adds the explanatory expression, the character or character string, and its reading indicated by the slip data to the explanatory expression information.

The response generation unit 412 identifies a character or character string included in the voice recognized by the voice recognition unit 403 and, by referring to the explanatory expression information stored in the explanation DB 405, generates, from the explanatory expression that explains the identified character or character string, response data indicating a question that asks how the identified character or character string is written.

Specifically, when the utterance content indicated by the voice text data includes a specific expression such as a name or an address, the response generation unit 412 refers to the explanatory expression information stored in the explanation DB 405 and acquires the explanatory expression corresponding to that specific expression.
The response generation unit 412 then generates a question using the acquired explanatory expression and generates response data indicating that question. Here, the response data represents the question as voice, but it may instead represent the question as an image or text. The generated response data is given to the response output unit 413.

The response output unit 413 outputs the response data given by the response generation unit 412.
For example, when the response data is voice data, the response output unit 413 outputs the question as voice based on that voice data.
When the response data is image data or text data, the response output unit 413 may display the image or text.

According to the fourth embodiment, when the recognized content includes, for example, a name, the device can respond with a question that confirms the kanji or spelling of that name, such as "斉藤は、簡単な方の斉藤ですね？" ("Saito is the simpler Saito, isn't it?") or "中村俊輔はサッカー選手の中村ですね？" ("Shunsuke Nakamura is the soccer player Nakamura, isn't it?").
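The confirmation response can be sketched as a lookup followed by template filling. This is an illustrative sketch only: the record type `Entry`, the list `ENTRIES`, the function `generate_question`, the substring matching on written form or reading, and the question template are all assumptions rather than details given in the specification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Entry:
    """Hypothetical record of the explanation DB 405."""
    expression: str    # explanatory expression, e.g. "簡単な方の"
    written_form: str  # character string it explains, e.g. "斉藤"
    reading: str       # reading of the character string, e.g. "サイトウ"


ENTRIES: List[Entry] = [Entry("簡単な方の", "斉藤", "サイトウ")]


def generate_question(transcript: str, entries: List[Entry]) -> Optional[str]:
    """Return a confirmation question for the first known name in the transcript."""
    for entry in entries:
        # Match on either the written form or the reading of the name.
        if entry.written_form in transcript or entry.reading in transcript:
            return f"{entry.written_form}は、{entry.expression}{entry.written_form}ですね？"
    return None  # nothing to confirm


print(generate_question("名前はサイトウです", ENTRIES))
```

Storing the reading alongside the written form lets the lookup succeed even when the recognizer outputs the name phonetically rather than in kanji.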

The response generation unit 412 described above can be realized by the processor 12 executing the corresponding program, which is assumed to be stored in the memory 11.
The response output unit 413 can be realized by the voice I/F 13. Although not shown, the response output unit 413 may instead be a display I/F for displaying an image or text.

In the first to fourth embodiments described above, the voice input unit 101 receives input of a voice signal and the voice acquisition unit 102 converts the voice signal into digital voice data, but the first to fourth embodiments are not limited to this example. For example, the call data information extraction devices 100 to 400 may acquire digital voice data via a communication unit (not shown) realized by the network I/F 15 shown in FIG. 9 and give that voice data to the voice recognition unit 103. Alternatively, digital voice data may be stored in advance in a storage unit (not shown) realized by the memory 11 shown in FIG. 9, and the call data information extraction devices 100 to 400 may give that voice data to the voice recognition unit 103.

100, 200, 300, 400: call data information extraction device; 101: voice input unit; 102: voice acquisition unit; 103, 303: voice recognition unit; 104, 204: explanatory part extraction unit; 105, 405: explanation DB; 106, 306: unique information determination unit; 207: input receiving unit; 208, 408: slip data generation unit; 209, 409: slip data storage unit; 210, 410: DB update unit; 311: voice recognition result correction unit; 412: response generation unit; 413: response output unit.

Claims

An information processing apparatus comprising:
a voice recognition unit that recognizes uttered voice from voice data containing the uttered voice;
an explanatory part extraction unit that extracts, from the recognized voice, an explanatory part, which is a part including a character or a character string and an explanatory expression explaining how to write the character or the character string;
an explanatory expression information storage unit that stores explanatory expression information associating the explanatory expression with the character or the character string explained by the explanatory expression; and
a unique information determination unit that determines, by referring to the explanatory expression information, the character or the character string explained by the explanatory expression as unique information.

The information processing apparatus according to claim 1, wherein the explanatory part extraction unit extracts, as the explanatory part, a portion of the recognized voice that matches an explanation extraction rule, which is a rule for expressions used to explain how to write the character or the character string.

The information processing apparatus according to claim 1 or 2, further comprising:
an input receiving unit that receives input of the character or the character string explained by the explanatory expression included in the explanatory part; and
an update unit that stores, in the explanatory expression information storage unit as part of the explanatory expression information, the character or the character string input to the input receiving unit in association with the explanatory expression explaining how to write that character or character string.

The information processing apparatus according to any one of claims 1 to 3, wherein
the voice recognition unit generates voice text data, which is text data indicating the recognized voice, and
the information processing apparatus further comprises a voice recognition result correction unit that corrects the voice text data by replacing the portion of the voice text data corresponding to the unique information determined by the unique information determination unit with that unique information.

The information processing apparatus according to any one of claims 1 to 3, further comprising a response generation unit that identifies a character or a character string included in the recognized voice and, by referring to the explanatory expression information, generates, from the explanatory expression explaining the identified character or character string, response data indicating a question that asks how the identified character or character string is written.

A program that causes a computer to function as:
a voice recognition unit that recognizes uttered voice from voice data containing the uttered voice;
an explanatory part extraction unit that extracts, from the recognized voice, an explanatory part, which is a part including a character or a character string and an explanatory expression explaining how to write the character or the character string;
an explanatory expression information storage unit that stores explanatory expression information associating the explanatory expression with the character or the character string explained by the explanatory expression; and
a unique information determination unit that determines, by referring to the explanatory expression information, the character or the character string explained by the explanatory expression as unique information.

An information processing method, wherein:
a voice recognition unit recognizes uttered voice from voice data containing the uttered voice;
an explanatory part extraction unit extracts, from the recognized voice, an explanatory part, which is a part including a character or a character string and an explanatory expression explaining how to write the character or the character string; and
a unique information determination unit determines the character or the character string explained by the explanatory expression as unique information by referring to explanatory expression information that associates the explanatory expression with the character or the character string explained by the explanatory expression.