JP4067483B2

JP4067483B2 - Telephone reception translation system

Info

Publication number: JP4067483B2
Application number: JP2003390388A
Authority: JP
Inventors: 充久菊川; 早苗内田
Original assignee: Fujitsu FSAS Inc
Current assignee: Fujitsu FSAS Inc
Priority date: 2003-11-20
Filing date: 2003-11-20
Publication date: 2008-03-26
Anticipated expiration: 2023-11-20
Also published as: JP2005159395A

Description

The present invention relates to a telephone reception translation system that identifies a call partner, or the language spoken by the call partner, from the voice pattern of a received telephone call, and that, based on the identified language type, performs speech recognition on what the call partner says and translates it into a desired language. In particular, it relates to a telephone reception translation system suited to a call center or the like that receives calls from specific customers.

Various customer response systems based on automatic voice response have been proposed as systems that return answers to customers' questions (see, for example, Patent Document 1). The system described in Patent Document 1 introduces speech recognition so that information obtained by voice can be answered immediately. It also identifies the customer who has accessed the system, searches a database of customer attribute information using the customer name or the like as a key, selects a message suited to that customer, and responds by automatic voice, thereby realizing various services.
Meanwhile, a call center or the like sometimes receives calls from customers who speak a language other than Japanese, or a special dialect of Japanese (hereinafter these are collectively referred to as languages other than Japanese). In such cases there is no choice but to transfer the call to a responder who can speak that language, but it is difficult to secure enough responders who can speak the various languages, and at present such calls cannot be handled adequately.

To make calls with speakers of languages other than Japanese easier, various systems that provide a telephone translation service have been proposed.
For example, Patent Document 2 describes a translation telephone device that refers to a call partner's language and call history stored in advance and automatically sets up use of the translation telephone service. A speech recognition device built into the telephone performs speech recognition, and the character information obtained as the recognition result is supplied to the translation telephone device, which performs the translation.
JP 2003-169147 A
JP 2002-118659 A

A system that translates a language cannot translate unless the language in use is identified, so identifying the language is essential.
Conventional systems that translate the language of a received call generally identify a region from the notified telephone number, determine the language type spoken in that region, and translate accordingly. There are also systems that identify the language type spoken in a region from GPS position information for the call partner's location and translate accordingly.
With the method that determines the language type from the notified telephone number, the telephone number alone cannot determine the language type when the telephone is used not by one specific person but by several people who speak different languages. In addition, when the call partner calls from somewhere other than the usual place (or from another country), for example during a business trip, the language type may be identified incorrectly and the speech may not be translated correctly. Furthermore, when the call partner has caller ID blocking enabled, the telephone number is unknown and the language type cannot be identified from it at all.
The method that uses GPS position information requires that GPS information be transmitted, for example by using a telephone capable of sending GPS position information. With an ordinary telephone it is difficult to identify the call partner's position from GPS information.
The present invention has been made to solve these problems of the prior art. Its object is to provide a system that can identify the language spoken by the call partner and translate it, no matter where the call partner calls from.

In the present invention, the above problem is solved as follows.
A telephone reception translation system that receives a call from a call partner, translates the language spoken by that call partner, and converses with the call partner is provided with: voice pattern storage means that stores the first voice uttered by a call partner, together with the appearance probability and number of appearances of that voice, in association with a personal name; language type storage means that stores, for each language type registered in advance, the voice patterns with which a call partner is expected to respond first and the appearance probability of each such voice pattern; individual identification means that, when a call is received, collates the first voice pattern uttered by the call partner with the voice patterns registered in the voice pattern storage means, in order of appearance probability, and identifies the call partner; language type extraction means that, when the individual identification means has identified an individual, refers to language type information storing the language type corresponding to each personal name and identifies the call partner's language; and translation means that translates the language spoken by the call partner according to the identified language type.
When the individual identification means identifies an individual, the system refers to the language type information storing the language type corresponding to the identified personal name, identifies the call partner's language, and translates the language spoken by the call partner according to the identified language type.
When the call partner cannot be identified, the system compares the first voice pattern uttered by the call partner with the voice patterns stored in the language type storage means, in order of appearance probability, to identify the call partner's language, and registers the identified language type information in the voice pattern storage means in association with the personal name.

The present invention provides the following effects.
(1) Because the language type is determined by identifying the call partner, the language spoken by the call partner can be translated no matter where the call comes from.
(2) Even for a first-time call partner whose language type is unknown, the means for identifying the language type can identify the call partner's language type, so translation is possible.
When the same person calls again, the individual can be identified from the second call onward and the language type identified immediately.
Furthermore, if the individual is identified and speech recognition is performed using a speech recognition dictionary prepared for that individual, speech recognition accuracy can also be improved.

FIG. 1 is a diagram showing the system configuration of an embodiment of the present invention.
In the figure, reference numeral 100 denotes exchanges, which are connected to one another via a public network 101. The call partner's telephone 103b is connected to one exchange 100, and a telephone 103a and a processing device 102, provided for example on the call center side, are connected to another.
The processing device 102 accepts a call from the call partner, identifies the call partner's language type, for example by identifying the call partner as described later, performs speech recognition, and carries out translation.
FIG. 1 shows the case in which the processing device 102 is provided only on the telephone 103a side. When the call partner's language type is, for example, English, the processing device 102 translates what the call partner says, as received via the public network 101, into Japanese or the like and conveys it to the responder's telephone 103a by voice or other means. The processing device 102 also translates the Japanese or the like spoken by the responder into English or the like and conveys it to the call partner's telephone via the public network. The embodiments below describe this case, but the system can also be configured, for example, as follows.
(1) A processing device is also provided on the call partner's telephone 103b side. That processing device translates the English or the like spoken by the call partner into Japanese and conveys it to the responder's telephone 103a via the public network 101. The Japanese or the like spoken by the responder is translated into English by the processing device 102 on the responder side and conveyed to the call partner's telephone via the public network.
(2) A processing device is also provided on the call partner's telephone 103b side. The English or the like spoken by the call partner is sent unchanged to the processing device 102 via the public network 101, and the processing device 102 translates it into Japanese or the like and conveys it to the responder's telephone 103a by voice or other means. The Japanese or the like spoken by the responder is sent unchanged to the call partner side via the public network 101, where the call partner's processing device translates it into English and conveys it to the call partner.

FIG. 2 is a block diagram showing the functional configuration of the processing device according to the first embodiment of the present invention.
The processing device 102 is provided with voice pattern registration means 10, which registers the voice of call partner A in a voice pattern storage unit 11, and individual identification means 12, which collates the voice pattern of call partner A's first utterance, as registered in the voice pattern storage unit 11, with the voice patterns stored in a caller-specific voice pattern table 13 and identifies the call partner's name.
When the individual identification means 12 identifies the call partner's personal name, language identification means 14 refers to a language type information table 17, in which a language type is registered for each personal name, identifies the call partner's language type, and registers the identified language type in a language type information registration unit 15.
When the personal name of call partner A cannot be identified, the language identification means 14 collates the voice pattern of the call partner's first utterance with the voice patterns stored in a language-specific voice pattern table 16, identifies the call partner's language type, and registers the identified language type in the language type information registration unit 15.

Once the language type of call partner A has been identified, speech recognition means 18 refers to a language-specific speech recognition dictionary 19 and, using the recognition dictionary corresponding to the language type registered in the language type information registration unit 15, converts the call partner's speech registered in the voice pattern storage unit 11 into character information.
Translation means 20 refers to a language-specific translation dictionary 22 and, using the dictionary corresponding to the language type registered in the language type information registration unit 15, translates the call partner's speech into a language that the telephone responder B can speak (described below as Japanese).
The speech translated into Japanese is output as voice from display/voice output means 21, or output as character information on the display device of the processing device 102.
When responder B replies to call partner A in Japanese according to this translation result, by voice or by character input, the reply is handled as follows. A voice reply is converted into character information by the speech recognition means. A character-input reply is sent to the translation means 20 as it is. In both cases the reply is translated into call partner A's language and sent to call partner A via the public network 101.
When the language identification means 14 cannot identify the call partner's language, transfer means 23 transfers the received call to a responder C who can handle multiple languages.

FIG. 3 is a flowchart showing the processing of this embodiment, which is described below with reference to the figure.
In FIG. 3, when a call from the call partner arrives, the call partner's voice is captured and stored in the voice pattern storage unit 11 (step S1). The individual identification means 12 then compares the call partner's first utterance with the voice patterns stored in the caller-specific voice pattern table and identifies the call partner (step S2). The caller-specific voice pattern table holds each call partner's personal name, the voice patterns of that call partner's first utterance, and their appearance probabilities. The call partner is identified by comparing the voice patterns registered in this table with the call partner's first utterance (the details of identifying the call partner are described later).
When the call partner's personal name is identified, the language type information table 17 is referred to, the call partner's language type is identified, and the identified language type is registered in the language type information registration unit 15. FIG. 4 shows an example of the language type information table 17. As shown in the figure, the table registers, for each personal name, the language type that person speaks, so identifying the personal name identifies the language type.

If the personal name is not identified, the language identification means 14 compares the voice pattern of the call partner's first utterance with the voice patterns stored in the language-specific voice pattern table 16 and identifies the language type spoken by the call partner (steps S9 and S10).
The language-specific voice pattern table 16 holds the language types and, for each language type, the voice patterns of a call partner's first utterance and their appearance probabilities. The language type is identified by comparing the voice patterns registered in this table with the call partner's first utterance. When the language type is identified, it is registered in the language type information registration unit 15 (the details of identifying the language type are described later).
If the language type is not identified, the call is transferred to a responder C who can handle it (steps S11 and S12).
When the language type is identified, the speech recognition means 18 refers to the language-specific speech recognition dictionary 19 and converts the speech into character information (step S4), and the translation means 20 refers to the language-specific translation dictionary 22 and translates it (step S5).
If the speech recognition means 18 or the translation means 20 cannot process the speech, for example because the call partner's voice is unclear, the call is transferred to the responder C as described above (steps S6 and S12). If translation is possible, the translation result is output as voice or output as character information on the display device.
The processing of steps S4 to S7 is repeated for the speech of call partner A and responder B, and processing ends when the call ends.
The above embodiment uses, as the speech recognition dictionary, a language-specific recognition dictionary shared by all call partners. A speech recognition dictionary may additionally be prepared for each call partner, so that once the call partner is identified, speech recognition is performed with the dictionary for that call partner. This can improve speech recognition accuracy.
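The branching in FIG. 3 can be sketched as follows. This is only an illustrative outline: the function name `route_call`, the callable parameters, and the returned action labels are hypothetical, and the three lookups are passed in as plain callables rather than being real implementations of the means described above. Transferring the call when an identified person has no registered language is an assumption of this sketch, not something the description states.

```python
from typing import Callable, Optional, Tuple


def route_call(
    first_voice: str,
    identify_person: Callable[[str], Optional[str]],
    language_of_person: Callable[[str], Optional[str]],
    identify_language: Callable[[str], Optional[str]],
) -> Tuple[str, Optional[str]]:
    """Decide what to do with a call from its first utterance (FIG. 3 outline)."""
    person = identify_person(first_voice)  # step S2: try to identify the caller
    if person is not None:
        # Caller known: look up the language in the language type information table.
        language = language_of_person(person)
    else:
        # Caller unknown: fall back to language-specific patterns (steps S9, S10).
        language = identify_language(first_voice)
    if language is None:
        # Language not identified: hand over to responder C (steps S11, S12).
        return ("transfer_to_responder_C", None)
    # Language identified: recognition and translation follow (steps S4, S5).
    return ("recognize_and_translate", language)
```

Called with dictionary lookups standing in for the three means, a known caller resolves through the personal-name table, an unknown caller resolves through the language patterns, and anything else is transferred.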

Next, the processing by which the individual identification means identifies the call partner's personal name is described in detail.
FIG. 5 shows an example of the caller-specific voice pattern table 13. As shown in FIG. 5, the table registers, for each personal name, the voice patterns, the appearance probability of each voice pattern, and its number of appearances.
In Japanese, for example, a call partner can be expected to open with an utterance such as the following.
(a) "Moshimoshi." ("Hello.")
(b) "Hai, Yamada desu." ("Yes, this is Yamada.")
(c) "Yamada desu." ("This is Yamada.")
In this embodiment, voice patterns from such responses are registered in units of one "syllable", here meaning a single coherent chunk of speech (for example, "moshimoshi", "hai", or "Yamada desu"), and the voice pattern of the first such unit uttered by the call partner is compared with the registered voice patterns.
A call partner is expected to answer with (a), (b), or (c) depending on the occasion, and the appearance probability indicates how likely each voice pattern is to appear first.
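As a rough illustration of the table in FIG. 5, the rows can be modeled as records that carry a personal name, a voice pattern, its appearance probability, and its number of appearances. The class name `PatternRow`, the field names, the pattern labels, and every value below are made up for this sketch. The figure itself is not reproduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PatternRow:
    name: str           # personal name
    pattern_id: str     # label standing in for a stored voice pattern
    probability: float  # appearance probability of this pattern
    count: int          # number of appearances


# Hypothetical table contents, for illustration only.
TABLE: List[PatternRow] = [
    PatternRow("A", "A1", 0.6, 6),
    PatternRow("A", "A2", 0.4, 4),
    PatternRow("B", "B1", 0.7, 7),
    PatternRow("B", "B2", 0.3, 3),
]


def matching_order(rows: List[PatternRow]) -> List[PatternRow]:
    """Return rows in descending order of appearance probability."""
    return sorted(rows, key=lambda r: r.probability, reverse=True)
```

Ordering the rows this way means the patterns a caller is most likely to open with are compared first.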

FIG. 6 is a flowchart showing the individual identification process.
First, the call partner's first utterance is captured (step S1), and its voice pattern A is compared with the voice patterns in the caller-specific voice pattern table. That is, voice patterns B are taken from the caller-specific voice pattern table in descending order of appearance probability, voice patterns A and B are compared, and the hit rate is stored in pattern hit rate registration means (steps S2 and S3).
FIG. 7 shows an example of what is registered in the pattern hit rate registration means. It stores the hit rates obtained by comparing voice pattern A with each voice pattern B taken from the caller-specific voice pattern table in descending order of appearance probability.
When the comparison is finished, the hit rates registered in the pattern hit rate registration means are read out and searched for any that exceed a fixed value (steps S4 and S5).
If any hit rate exceeds the fixed value, the entry with the highest hit rate is identified as the call partner (steps S6 and S7). If none does, the individual is regarded as not identified and processing ends (step S8). In the example shown in FIG. 7, the hit rate with voice pattern B1 of personal name B is high, so the call partner is identified as B.
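The steps of FIG. 6 can be sketched as below. The description does not say how a hit rate is computed, so this sketch substitutes cosine similarity between small feature vectors purely as a placeholder. The names `Entry`, `hit_rate`, and `identify_person` and the threshold of 0.8 are likewise assumptions.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Entry:
    name: str                 # personal name
    pattern: Sequence[float]  # stand-in feature vector for a registered voice pattern
    probability: float        # appearance probability


def hit_rate(a: Sequence[float], b: Sequence[float]) -> float:
    """Placeholder hit rate: cosine similarity (not specified by the source)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_person(first_pattern: Sequence[float], table: List[Entry],
                    threshold: float = 0.8) -> Optional[str]:
    # Steps S2-S3: compare in descending order of appearance probability
    # and record each hit rate.
    ordered = sorted(table, key=lambda e: e.probability, reverse=True)
    hits = [(hit_rate(first_pattern, e.pattern), e.name) for e in ordered]
    # Steps S4-S5: keep only hit rates above the fixed value.
    candidates = [h for h in hits if h[0] > threshold]
    if not candidates:
        return None  # step S8: individual not identified
    # Steps S6-S7: the highest hit rate identifies the caller.
    return max(candidates, key=lambda h: h[0])[1]
```

Returning `None` corresponds to the case where processing ends without identifying the individual, after which language identification from the first utterance takes over.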

Next, the language identification processing performed by the language identification means is described in detail.
FIG. 8 shows an example of the language-specific voice pattern table 16. As in FIG. 5, the table registers, for each language type, its voice patterns and the appearance probability of each voice pattern.
Voice patterns in the various languages with which a call partner is expected to respond first are registered in advance in the language-specific voice pattern table, several per language and each one "syllable" long, in the same way as in the caller-specific voice pattern table.
Then, as when identifying an individual, the voice pattern of the call partner's first utterance is collated with the pre-registered voice patterns of the various languages in order of appearance probability, and the language type with the highest match rate is selected.
FIG. 9 is a flowchart showing the language identification process.
First, the call partner's first utterance is captured (step S1), and its voice pattern A is compared with the voice patterns in the language-specific voice pattern table. That is, as in the individual identification process, voice patterns B are taken from the language-specific voice pattern table in descending order of appearance probability, voice patterns A and B are compared, and the hit rate is stored in language voice pattern hit rate registration means (steps S2 and S3).
When the comparison is finished, the hit rates registered in the language voice pattern hit rate registration means are read out and searched for any that exceed a fixed value (steps S4 and S5).
If any hit rate exceeds the fixed value, the entry with the highest hit rate is identified as the language spoken by the call partner (steps S6 and S7). If none does, the language is regarded as not identified and processing ends (step S8).

The appearance probabilities in the caller-specific voice pattern table are updated each time a call is received from a call partner. When a call partner opens with a voice that is not registered in the table, or when a call comes from a new call partner, that voice is additionally registered in the table. The updating of appearance probabilities and the addition of voice patterns are described below with reference to (i) to (xi) of FIG. 10.
(i) Suppose the call partner answers, for example, as in (a) to (b) as the first utterance.
(ii) When the first voice pattern is obtained, it is provisionally registered. In case (a), for example, the voice pattern x of "hai" is provisionally registered. In case (b), the voice pattern x of "Hallo" is provisionally registered.
(iii) The provisionally registered voice pattern x is collated with the voice patterns registered in the caller-specific voice pattern table shown in FIG. 5.
As shown in FIG. 10, voice pattern x is collated against the caller-specific voice pattern table of FIG. 5 in order from the voice patterns with the highest appearance probability to those with the lowest.
(iv) If this collation finds a voice pattern whose hit rate exceeds a predetermined threshold, the call partner is taken, as described above, to be the person whose personal name corresponds to that voice pattern, and its appearance probability is updated. For example, if the hit rate between voice pattern B1 in FIG. 5 and voice pattern x exceeds the predetermined threshold, the call partner is taken to be "Mr. B", the number of appearances of voice pattern B1 is incremented by 1, and the appearance probability of Mr. B's voice pattern B1 is recalculated and updated. At this point the existing voice pattern B1 may be replaced with voice pattern x, which keeps the stored voice pattern up to date.
(v) Once the call partner has been identified in this way, the language type information table 17 is referred to as described above, the call partner's language type is identified, and the identified language type is registered in the language type information registration unit 15.

(vi) On the other hand, if no voice pattern has a hit rate exceeding the threshold, the voice pattern x is compared, as described above, with the language-specific voice patterns registered in the language-specific voice pattern table in order to specify the language spoken by the call partner.
(vii) If the language spoken by the call partner can be specified, the specified language type is registered in the language type information registration unit 15.
(viii) If the language spoken by the call partner cannot be specified, the call is transferred to a responder who can handle multiple languages.
(ix) If the language spoken by the call partner has been specified and the responder learns the call partner's personal name through the conversation, it is checked whether that personal name is already registered in the voice pattern table by call partner.
(x) If the call partner is already registered in the voice pattern table by call partner, the voice pattern x is newly registered in the column of the corresponding personal name, and the appearance probabilities of the voice patterns in that column are updated. For example, as shown in FIG. 10, when the call partner is C, the voice pattern x is added to the voice patterns of call partner C as C4, and its appearance probability is calculated and registered. The appearance probabilities of the voice patterns C1 to C3 are also updated.
(xi) If the call partner is not yet registered in the voice pattern table by call partner, the voice pattern x is registered in that table in association with the specified call partner's name, as shown in FIG. 10. The appearance probability is also registered (in this case, the appearance probability is 100%).
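Steps (iv), (x), and (xi) can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Pattern`, `CallerPatternTable`, and `close_fraction` are hypothetical, voice patterns are stood in for by plain feature vectors, and the appearance probability is assumed to be a pattern's number of appearances divided by the total for that personal name (which is consistent with the 100% value in step (xi), but is not stated as a formula in the text).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence

HitRate = Callable[[Sequence[float], Sequence[float]], float]


@dataclass
class Pattern:
    features: List[float]      # stand-in for a stored voice pattern
    count: int = 1             # number of appearances
    probability: float = 1.0   # appearance probability within the caller's row


@dataclass
class CallerPatternTable:
    rows: Dict[str, List[Pattern]] = field(default_factory=dict)

    def _update_probabilities(self, name: str) -> None:
        # Assumption: probability = count / total count for this personal name.
        total = sum(p.count for p in self.rows[name])
        for p in self.rows[name]:
            p.probability = p.count / total

    def identify(self, x: Sequence[float], hit_rate: HitRate,
                 threshold: float) -> Optional[str]:
        # Collate in descending order of appearance probability (step iv).
        candidates = [(p, name) for name, pats in self.rows.items() for p in pats]
        candidates.sort(key=lambda c: c[0].probability, reverse=True)
        for pattern, name in candidates:
            if hit_rate(pattern.features, x) > threshold:
                pattern.count += 1
                self._update_probabilities(name)
                return name
        return None  # no hit: fall through to language identification (step vi)

    def register(self, name: str, x: Sequence[float]) -> None:
        # Steps (x) and (xi): add x under the name and recompute the row.
        self.rows.setdefault(name, []).append(Pattern(features=list(x)))
        self._update_probabilities(name)


def close_fraction(a: Sequence[float], b: Sequence[float], tol: float = 0.1) -> float:
    # Hypothetical hit rate: fraction of positions that agree within tol.
    if not a or not b:
        return 0.0
    hits = sum(1 for u, v in zip(a, b) if abs(u - v) <= tol)
    return hits / max(len(a), len(b))
```

Calling `register("C", x)` for a new name yields a single pattern with probability 1.0, and a later successful `identify` raises the matched pattern's count and renormalizes only that caller's row.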

The language-specific voice pattern table registers in advance, for each language type, the voice patterns that a call partner is expected to utter first, together with their appearance probabilities. Unlike the voice pattern table by call partner, it is not assumed that the appearance probabilities and other entries of this table are updated each time a call partner is specified. However, as with the voice pattern table by call partner and as described with reference to FIG. 10, when the language type has been specified and that language type is already registered in the language-specific voice pattern table, the voice pattern may be newly registered in the column of the corresponding language type, and the appearance probabilities of the voice patterns in that column may be updated.
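The language identification fallback and the optional table update described above might be sketched as follows. This is an illustration under assumptions, not the patent's method: `identify_language`, `close_fraction`, and the `learn` flag are hypothetical names, voice patterns are represented as feature vectors, and appearance probabilities are assumed to be derived from per-language counts.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# language type -> list of (voice pattern features, number of appearances)
LanguageTable = Dict[str, List[Tuple[List[float], int]]]
HitRate = Callable[[Sequence[float], Sequence[float]], float]


def close_fraction(a: Sequence[float], b: Sequence[float], tol: float = 0.1) -> float:
    # Hypothetical hit rate: fraction of positions that agree within tol.
    if not a or not b:
        return 0.0
    hits = sum(1 for u, v in zip(a, b) if abs(u - v) <= tol)
    return hits / max(len(a), len(b))


def identify_language(table: LanguageTable, x: Sequence[float], hit_rate: HitRate,
                      threshold: float, learn: bool = False) -> Optional[str]:
    # Assumption: appearance probability = count / total count for that language.
    entries = []
    for lang, patterns in table.items():
        total = sum(count for _, count in patterns)
        for features, count in patterns:
            entries.append((count / total, lang, features))
    # Compare in descending order of appearance probability.
    entries.sort(key=lambda e: e[0], reverse=True)
    for _, lang, features in entries:
        if hit_rate(features, x) > threshold:
            if learn:
                # Optional variant: register x under the identified language.
                # Probabilities are recomputed from counts on the next call.
                table[lang].append((list(x), 1))
            return lang
    return None  # unidentified: the call goes to a multilingual responder
```

With `learn=False` the table stays fixed, matching the default assumption in the text; `learn=True` corresponds to the optional variant in which the language-specific table also grows.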

FIG. 1 is a diagram showing the system configuration of an embodiment of the present invention.
FIG. 2 is a functional block diagram of the processing device of the embodiment of the present invention.
FIG. 3 is a flowchart showing the overall processing of the embodiment of the present invention.
FIG. 4 is a diagram showing an example of the language type information table.
FIG. 5 is a diagram showing an example of the voice pattern table by call partner.
FIG. 6 is a flowchart showing the individual identification process.
FIG. 7 is a diagram showing a registration example of the pattern hit rate registration means.
FIG. 8 is a diagram showing an example of the language-specific voice pattern table.
FIG. 9 is a flowchart showing the language identification process.
FIG. 10 is a diagram explaining the updating of appearance probabilities and the additional registration of voice patterns in the voice pattern table by call partner.

Explanation of symbols

10 Voice pattern registration means
11 Voice pattern storage unit
12 Individual identification means
13 Voice pattern table by call partner
14 Language identification means
15 Language type information registration unit
16 Language-specific voice pattern table
17 Language type information table
18 Speech recognition means
19 Language-specific speech recognition dictionary
20 Translation means
21 Display/voice output means
22 Language-specific translation dictionary
23 Telephone transfer means
100a Exchange
100b Exchange
101 Public network
102 Processing device
103a Telephone
103b Telephone

Claims

A telephone reception translation system that receives a call from a customer, translates the language spoken by the call partner of the received call, and converses with the call partner, the system comprising:
voice pattern storage means for storing the first voice uttered by a call partner, together with the appearance probability and the number of appearances of that voice, in association with a personal name;
language type storage means for storing, in association with pre-registered language types, a voice pattern for each language type with which a call partner is expected to respond first, together with the appearance probability of that voice pattern;
individual specifying means for, when a call is received, collating the first voice pattern uttered by the call partner against the voice patterns registered in the voice pattern storage means in order of appearance probability, and thereby specifying the call partner;
language type extracting means for, when an individual is specified by the individual specifying means, specifying the language of the call partner by referring to language type information that stores the language type corresponding to the specified personal name;
translation means for translating the language spoken by the call partner according to the specified language type;
language type specifying means for, when the call partner cannot be specified, specifying the language of the call partner by comparing the first voice pattern uttered by the call partner with the voice patterns stored in the language type storage means in order of appearance probability; and
language type registration means for registering, in the voice pattern storage means, the language type information spoken by the specified call partner in association with the personal name.

A telephone reception translation processing program that receives a call from a customer and causes a computer to execute telephone reception translation processing for translating the language spoken by the call partner of the received call, the program causing the computer to execute:
a process of, when a call is received, referring to voice pattern storage means that stores the first voice uttered by a call partner, together with the appearance probability and the number of appearances of that voice, in association with a personal name, and collating the voice patterns stored in that storage means with the first voice pattern uttered by the call partner to specify the call partner;
a process of, when an individual is specified by the above process, extracting a language type by referring to language type information that stores the language type corresponding to the specified personal name, thereby specifying the language of the call partner;
a process of translating the language spoken by the call partner according to the specified language type;
a language type specifying process of, when the call partner cannot be specified, referring to language type storage means that stores, in association with pre-registered language types, a voice pattern for each language type with which a call partner is expected to respond first, together with the appearance probability of that voice pattern, and comparing the first voice pattern uttered by the call partner with the voice patterns stored in the language type storage means in order of appearance probability to specify the language of the call partner; and
a process of registering, in the voice pattern storage means, the language type information spoken by the specified call partner in association with the personal name.