JP2002252705A

JP2002252705A - Method and device for detecting talker id

Info

Publication number: JP2002252705A
Application number: JP2001050871A
Authority: JP
Inventors: Tetsuya Muroi; 哲也室井
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2001-02-26
Filing date: 2001-02-26
Publication date: 2002-09-06

Abstract

PROBLEM TO BE SOLVED: To provide a method and device for detecting a talker ID that, when recognizing and identifying the talker making a phone call, limit the number of recognition objects before recognizing the talker's voice and identifying the talker. SOLUTION: Upon receipt of a phone call, the caller number is detected (S1). Using a data storage table in which a telephone number, a talker ID, and voice data for identifying the talker are recorded as a set, the detected caller number is compared with the telephone numbers in the table (S2), and the sets whose telephone number matches the detected caller number are extracted as objects. If a plurality of objects is extracted, their voice data for identifying the talker is collated with the caller's phone voice (S4), the talker ID with the highest similarity is selected, and the talker is identified.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to a method and an apparatus for detecting a speaker ID, and more particularly to equipment that identifies the speaker ID of a caller in an answering machine, an automatic telephone answering system, a call center, and the like.

[0002]

[Prior art] As a method of identifying the speaker ID of a caller, a method that uses a detecting device to detect the caller's telephone number, as described for example in Japanese Patent Application Laid-Open No. Hei 10-285286, is widely known. However, detecting the telephone number alone is sometimes not enough to identify the speaker. For example, with a company or home telephone, several people place calls from the same telephone number, so it is not possible to tell which of them is calling. Conversely, when a company has several telephone lines, a different line may be used each time a call is made, so the calls cannot be determined to come from the same speaker.

[0003] On the other hand, a method is also known, described in Japanese Patent Application Laid-Open No. Hei 10-322450, that identifies the speaker ID by recognizing the voice of the caller, the voice of the person receiving the call (such as a company reception operator), or both. However, when the customer database from which the speaker ID must be obtained is very large, as at a company reception desk, the number of recognition targets becomes very large and the likelihood of misrecognition is high. As a result, an accurate speaker ID cannot be obtained, customer information such as past orders, repair requests, and inquiries cannot be retrieved accurately, and the caller and the recipient cannot hold a smooth conversation.

[0004]

[Problem to be solved by the invention] The present invention has been made in view of the above points, and its object is to acquire the speaker ID accurately by narrowing down the recognition targets before recognizing the voice.

[0005]

[Means for solving the problem] The invention of claim 1 is characterized in that a caller number is detected when a telephone call is received; the detected caller number is compared with the telephone numbers in a data holding table in which a telephone number, a speaker ID, and voice data for identifying the speaker are held as a set; the data holding table entries having a telephone number that matches the detected caller number are taken as candidates; the voice data for identifying the speaker is collated with the caller's telephone voice; and the speaker ID with the highest similarity is obtained.

[0006] The invention of claim 2 is characterized in that a caller number is detected when a telephone call is received; the detected caller number is compared with the telephone numbers in a data holding table in which a telephone number, a speaker ID, and voice data for identifying the speaker are held as a set; the data holding table entries having a telephone number that partially matches the detected caller number from the leading digits are taken as candidates; the voice data for identifying the speaker is collated with the caller's telephone voice; and the speaker ID with the highest similarity is obtained.

[0007] The invention of claim 3 is characterized in that a caller number is detected when a telephone call is received; the detected caller number is compared with the telephone numbers in a data holding table in which a telephone number, a speaker ID, voice data for identifying the speaker, and ID data for identifying the speaker are held as a set; the data holding table entries having a telephone number that matches the detected caller number are taken as candidates; the ID data for identifying the speaker is collated with the recipient's response voice to the speaker; and the speaker ID with the highest similarity is obtained.

[0008] The invention of claim 4 is characterized in that a caller number is detected when a telephone call is received; the detected caller number is compared with the telephone numbers in a data holding table in which a telephone number, a speaker ID, voice data for identifying the speaker, and ID data for identifying the speaker are held as a set; the data holding table entries having a telephone number that partially matches the detected caller number from the leading digits are taken as candidates; the ID data for identifying the speaker is collated with the recipient's response voice to the speaker; and the speaker ID with the highest similarity is obtained.

[0009] The invention of claim 5 is characterized by comprising caller number detecting means for detecting a caller number when a telephone call is received, and data holding means for holding a data holding table in which a telephone number, a speaker ID, and voice data for identifying the speaker are held as a set, wherein the caller number detected by the caller number detecting means is compared with the telephone numbers in the data holding table; the data holding table entries having a telephone number that matches the detected caller number are taken as candidates; the voice data for identifying the speaker is collated with the caller's telephone voice; and the speaker ID with the highest similarity is obtained.

[0010] The invention of claim 6 is characterized by comprising caller number detecting means for detecting a caller number when a telephone call is received, and data holding means for holding a data holding table in which a telephone number, a speaker ID, and voice data for identifying the speaker are held as a set, wherein the caller number detected by the caller number detecting means is compared with the telephone numbers in the data holding table; the data holding table entries having a telephone number that partially matches the detected caller number from the leading digits are taken as candidates; the voice data for identifying the speaker is collated with the data of the caller's telephone voice; and the speaker ID with the highest similarity is obtained.

[0011] The invention of claim 7 is characterized by comprising caller number detecting means for detecting a caller number when a telephone call is received, and data holding means for holding a data holding table in which a telephone number, a speaker ID, voice data for identifying the speaker, and ID data for identifying the speaker are held as a set, wherein the caller number detected by the caller number detecting means is compared with the telephone numbers in the data holding table; the data holding table entries having a telephone number that matches the detected caller number are taken as candidates; the ID data for identifying the speaker is collated with the recipient's response voice to the speaker; and the speaker ID with the highest similarity is obtained.

[0012] The invention of claim 8 is characterized by comprising caller number detecting means for detecting a caller number when a telephone call is received, and data holding means for holding a data holding table in which a telephone number, a speaker ID, voice data for identifying the speaker, and ID data for identifying the speaker are held as a set, wherein the caller number detected by the caller number detecting means is compared with the telephone numbers in the data holding table; the data holding table entries having a telephone number that partially matches the detected caller number from the leading digits are taken as candidates; the ID data for identifying the speaker is collated with the recipient's response voice to the speaker; and the speaker ID with the highest similarity is obtained.

[0013] The invention of claim 9 is a computer-readable recording medium on which is recorded a program for causing a computer to execute the speaker ID detection method according to claim 1.

[0014] The invention of claim 10 is a computer-readable recording medium on which is recorded a program for causing a computer to execute the speaker ID detection method according to claim 2.

[0015] The invention of claim 11 is a computer-readable recording medium on which is recorded a program for causing a computer to execute the speaker ID detection method according to claim 3.

[0016] The invention of claim 12 is a computer-readable recording medium on which is recorded a program for causing a computer to execute the speaker ID detection method according to claim 4.

[0017]

[Embodiments of the invention] (Embodiment 1) FIG. 1 is a block diagram showing a configuration example of a speaker ID detection apparatus to which the present invention is applied. In the figure, 1 is a caller number detecting means, 2 is a data holding means, and 3 is a collating means. For a call placed from outside, for example over the public network, the caller number detecting means 1 detects the caller number. Means for detecting a caller number have already been commercialized and are widely used, so their description is omitted here.

[0018] FIG. 2 shows an example of the data holding table of the present invention. In the figure, 2a is the data holding table recorded in the data holding means 2. The data holding table 2a holds, as one set, a speaker ID for identifying the calling speaker (such as a customer number), a telephone number, and voice data for identifying the speaker.
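The table described above can be pictured as a list of records. The following is a minimal Python sketch under stated assumptions: the field names, the sample telephone numbers, and the three-element feature vectors are illustrative and do not come from the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    """One set in the data holding table (field names are hypothetical)."""
    speaker_id: str          # e.g. a customer number
    phone_number: str        # kept as a string so leading zeros survive
    voice_data: List[float]  # speaker-identifying parameters


# Two speakers share one number; a third has a number of their own.
TABLE = [
    Record("C001", "0123456789", [0.9, 0.1, 0.3]),
    Record("C002", "0123456789", [0.2, 0.8, 0.5]),
    Record("C003", "0455550000", [0.4, 0.4, 0.7]),
]


def records_for(number: str, table: List[Record]) -> List[Record]:
    """Return every set whose telephone number equals the given number."""
    return [r for r in table if r.phone_number == number]
```

Storing the telephone number as a string rather than an integer is a design choice of this sketch: it keeps the leading zero and makes digit-wise prefix comparison straightforward.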

[0019] As the information for identifying the speaker, it suffices to prepare the data (parameters) used in the widely known speaker identification techniques called speaker recognition or speaker verification. Details are omitted, but examples include the text-independent long-term spectrum, and parameters that measure a speaker-specific phrase such as "Open Sesame" together with the speaker's voice characteristics.
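As a rough illustration of the long-term spectrum idea, the sketch below averages per-frame magnitude spectra into a single vector and compares two such vectors. It assumes the frame spectra have already been computed elsewhere, and the use of cosine similarity is an assumption of this sketch; the patent does not specify a similarity measure.

```python
import math
from typing import List, Sequence


def long_term_spectrum(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average per-frame magnitude spectra into one text-independent vector."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two spectra (assumed measure; 1.0 means same shape)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

Because the average is taken over many frames, the result depends little on which words were spoken, which is why such a parameter can be matched against arbitrary telephone speech.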

[0020] FIG. 3 is a flowchart illustrating an example of the speaker ID detection processing to which the present invention is applied. First, the number of the caller is acquired (step S1). Using the data holding table 2a shown in FIG. 2, the data sets whose telephone number matches the detected number are found and taken as candidates (step S2). It is then determined whether there is exactly one candidate (step S3). If there is only one candidate (YES), the speaker ID is uniquely determined and the processing ends. If there are several data sets with the matching telephone number (NO), all of them are taken as candidates and collated with the telephone voice (step S4), the candidate with the highest similarity is found, and its speaker ID is output (step S5).
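Steps S1 to S5 can be sketched in Python as follows. This is only an illustration under assumptions: the dictionary keys, the sample table, and the placeholder similarity (negative squared distance between feature vectors) are hypothetical, and returning `None` when no number matches is a choice of this sketch, since the flowchart does not cover that case.

```python
from typing import Dict, List, Optional, Sequence


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Placeholder similarity: negative squared distance (higher is closer)."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))


def detect_speaker_id(caller_number: str,
                      call_voice: Sequence[float],
                      table: List[Dict]) -> Optional[str]:
    # S1 is assumed done: caller_number is the detected caller number.
    # S2: keep the data sets whose telephone number matches exactly.
    candidates = [row for row in table if row["phone"] == caller_number]
    if not candidates:
        return None  # assumption: case not described in the flowchart
    # S3: a single candidate decides the speaker ID without voice collation.
    if len(candidates) == 1:
        return candidates[0]["speaker_id"]
    # S4: collate the telephone voice with each candidate's voice data.
    best = max(candidates, key=lambda row: similarity(row["voice"], call_voice))
    # S5: output the speaker ID of the most similar candidate.
    return best["speaker_id"]


TABLE = [
    {"speaker_id": "C001", "phone": "0123456789", "voice": [0.9, 0.1]},
    {"speaker_id": "C002", "phone": "0123456789", "voice": [0.2, 0.8]},
    {"speaker_id": "C003", "phone": "0455550000", "voice": [0.5, 0.5]},
]
```

Voice collation runs only over the rows that share the caller number, which is the narrowing of recognition targets that the invention relies on.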

[0021] In another embodiment of the speaker ID detection flow shown in FIG. 3, the detected telephone number is compared with the telephone numbers in the data holding table 2a, and the data sets that partially match from the leading digits of the telephone number are found. For example, if the number of digits for the partial match is six and the caller number is 0123456789, the data sets whose telephone number is 012345xxxx (x being any digit) are found, all of them are taken as candidates and collated with the telephone voice, the candidate with the highest similarity is found, and its speaker ID is output.
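The leading-digit match in this example can be sketched as a simple string prefix comparison. The function name, dictionary keys, and sample rows below are hypothetical.

```python
from typing import Dict, List


def prefix_candidates(caller_number: str, table: List[Dict], digits: int = 6) -> List[Dict]:
    """Return the data sets whose number shares the first `digits` digits with the caller number."""
    prefix = caller_number[:digits]
    return [row for row in table if row["phone"].startswith(prefix)]


TABLE = [
    {"speaker_id": "C001", "phone": "0123451111"},
    {"speaker_id": "C002", "phone": "0123452222"},
    {"speaker_id": "C003", "phone": "0123460000"},
]
```

With the caller number 0123456789 and six digits, the first two rows (012345xxxx) become candidates and the third (012346xxxx) is excluded.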

[0022] In yet another embodiment of the speaker ID detection flow shown in FIG. 3, the caller's name or the like is registered in the data holding table 2a as ID information for identifying the caller. At collation time, this name is recognized from the recipient's voice. Because the recipient is someone such as a company telephone operator, the speaker can be specified in advance, so a speech recognition method with a higher recognition rate, such as a speaker-dependent or speaker-adaptive method, can be used instead of a speaker-independent method, in which the speaker cannot be specified. In addition, considering the naturalness of the conversation between the caller and the recipient, it is unnatural for the recipient to utter the caller's name solely for the sake of recognition; it is usually preferable to recognize the caller's name (the XX part) from a repeat-back utterance such as "Yes, Mr./Ms. XX, is that right?". For this reason, a suitable recognition method is either continuous speech recognition using a grammar that follows the expected repeat-back patterns, or a method such as word spotting that extracts the XX part.

[0023] The procedure in this embodiment is as follows. First, the data sets that match the caller's telephone number are found. If there are several such data sets, they are taken as candidates. The recipient's repeat-back utterance, "Yes, Mr./Ms. XX, is that right?", is then collated with the candidates' names, and the speaker ID of the candidate with the highest similarity is output. Alternatively, the data sets that partially match the caller's telephone number from the leading digits can be taken as candidates, after which the recipient's repeat-back is collated with the candidates' names and the speaker ID with the highest similarity is output.
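The name-matching step can be illustrated on text rather than audio. The sketch below assumes that some speech recognizer has already turned the recipient's repeat-back into a transcript. It then pulls out the XX part of the phrase 「はい、XXさんですね」 with a regular expression and scores it against the candidate names using `difflib`. The patent collates speech; the string comparison here is a stand-in chosen for this sketch, and all names are hypothetical.

```python
import re
from difflib import SequenceMatcher
from typing import Dict, List, Optional

# Assumed repeat-back pattern: "はい、XXさんですね" (roughly "Yes, Mr./Ms. XX, is that right?").
ECHO_PATTERN = re.compile(r"はい、(?P<name>.+?)さんですね")


def spot_name(transcript: str) -> Optional[str]:
    """Extract the XX part of the repeat-back, or None if the pattern is absent."""
    match = ECHO_PATTERN.search(transcript)
    return match.group("name") if match else None


def best_candidate(transcript: str, candidates: List[Dict]) -> Optional[str]:
    """Return the speaker ID whose registered name is most similar to the spotted name."""
    name = spot_name(transcript)
    if name is None or not candidates:
        return None

    def score(row: Dict) -> float:
        return SequenceMatcher(None, name, row["name"]).ratio()

    return max(candidates, key=score)["speaker_id"]


CANDIDATES = [
    {"speaker_id": "C001", "name": "やまだ"},
    {"speaker_id": "C002", "name": "すずき"},
]
```

Because only the few candidates that share the caller number are compared, even a slightly misrecognized name (for example やまた for やまだ) still selects the intended entry.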

[0024] (Embodiment 2) FIG. 4 is a block diagram showing a configuration example of a speaker ID detection apparatus to which the present invention is applied. In the figure, 4 is a recipient voice input means. This embodiment relates to an apparatus with which a company's telephone reception desk, such as a call center, handles calls arriving from the public network or the like.

[0025] For a call arriving from the public network or the like, the caller number detecting means 1 first detects the caller number. The caller number detecting means 1 is widely known, so its description is omitted here. The detected caller number is compared with the telephone number of each data set in the data holding table 2a, and the data sets whose telephone number matches, or partially matches from the leading digits, are found as candidates. If there are several candidates, the voice uttered by the telephone operator is input through the recipient voice input means 4, such as a handset, and sent to the collating means 3.

[0026] The target speech is the part in which the caller's self-introduction is repeated back, such as "Yes, Mr./Ms. XX, is that right?", so recognition may be limited to the opening part of the conversation. Alternatively, the recipient may press an input switch while repeating the name, so that recognition is performed only while the switch is pressed. When a switch is used, the time interval to be recognized within a long conversation can be delimited precisely, which makes even more precise collation possible. As the result of recognition, the speaker ID with the highest similarity is output.
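The two ways of limiting the recognized interval, the opening part of the conversation or the period during which the switch is pressed, can be sketched as a filter over audio frames. The `Frame` layout and the function name are assumptions made for this illustration.

```python
from typing import List, NamedTuple, Optional


class Frame(NamedTuple):
    """One chunk of the recipient's audio (hypothetical layout)."""
    start_sec: float
    switch_pressed: bool
    samples: List[float]


def frames_to_recognize(frames: List[Frame],
                        head_seconds: Optional[float] = None) -> List[Frame]:
    """Select the frames that are passed on to recognition.

    If head_seconds is given, keep only the opening part of the conversation;
    otherwise keep only the frames recorded while the input switch was pressed.
    """
    if head_seconds is not None:
        return [f for f in frames if f.start_sec < head_seconds]
    return [f for f in frames if f.switch_pressed]


FRAMES = [
    Frame(0.0, False, [0.1]),
    Frame(1.0, True, [0.2]),
    Frame(2.0, True, [0.3]),
    Frame(3.0, False, [0.4]),
]
```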

[0027] (Embodiment 3) FIG. 5 is a block diagram showing a configuration example of a speaker ID detection apparatus to which the present invention is applied. In the figure, 5 is a data presenting means. This embodiment relates to a telephone set for home use or the like. The data holding means 2 stores, as a set, a speaker ID, a telephone number, voice data for identifying the speaker, and information for presenting the speaker ID, such as the caller's name. Here, the voice data for identifying the speaker is speaker-specific data used in well-known speaker verification techniques, for example a long-term spectrum.

[0028] First, the caller number detecting means 1 detects the caller number, and the data matching this number is retrieved from the data holding means 2. If exactly one data set matches, its speaker ID is output to the data presenting means 5. If several data sets match, the voice of the calling speaker is collated by the collating means 3, the data of the speaker with the highest similarity is selected, and that speaker ID is output.

[0029] The data presenting means 5 performs visual presentation, for example on a display. In this case, the speaker ID presentation information held in the data holding means 2 is text information such as the speaker's name or nickname. The data presenting means 5 may instead produce audio output, such as "You have a call from XX". In that case, the speaker ID presentation information in the data holding means 2 is held as text, and the audio is output by a text-to-speech synthesizer. Alternatively, a speech waveform may be held as the presentation information as is and played back by a playback device. FIG. 6 shows the data holding table 2a with the speaker ID presentation information added.

[0030] Another example of the embodiment shown in FIG. 5 is described next. The basic operation is the same as in Embodiment 3. The difference is that telephone numbers that partially match, from the leading digits, the telephone number detected by the caller number detecting means 1 are searched for. The number of digits for the partial match depends on the purpose. If the main purpose is to identify callers whose numbers are used within the same company, numbers matching roughly the first eight digits may be detected. If the main purpose is also to identify callers phoning from a public telephone while away, numbers matching roughly the first six digits, which correspond to the area code, may be detected. The subsequent processing is the same as in the embodiment shown in FIG. 5: if exactly one data set partially matches, its speaker ID is output directly to the data presenting means 5; if there are several, speaker verification is performed and the speaker ID with the highest similarity is output.
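The effect of the digit count on the candidate set can be seen in a small Python sketch. The purpose labels, the mapping to eight and six digits, and the sample numbers are illustrative assumptions based on the figures quoted in this paragraph.

```python
from typing import Dict, List

# Assumed mapping from purpose to the number of leading digits compared.
PREFIX_DIGITS: Dict[str, int] = {"same_company": 8, "same_area": 6}


def candidates_for(caller_number: str, phone_numbers: List[str], purpose: str) -> List[str]:
    """Return the registered numbers that share the purpose-specific prefix with the caller number."""
    digits = PREFIX_DIGITS[purpose]
    return [n for n in phone_numbers if n[:digits] == caller_number[:digits]]


NUMBERS = ["0123456700", "0123456799", "0123459999", "0129990000"]
```

A shorter prefix admits more candidates, so it covers more calling situations at the cost of a larger set for the subsequent voice collation.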

[0031]

[Effects of the invention] According to the inventions of claims 1, 5, and 9, the speaker ID of the caller can be identified accurately. That is, a matching caller number alone cannot distinguish between speakers who share a telephone number, such as family members or colleagues at a company, but the invention makes that distinction possible, so the caller is identified correctly. In addition, because collation is limited to the entries whose caller number matches, collation can be performed with a high recognition rate. Furthermore, because the caller is identified accurately, caller-specific information (such as past orders) can be accessed precisely.

[0032] According to the inventions of claims 2, 6, and 10, the speaker ID of the caller can be identified accurately. Where there are several telephone lines, as at a company or office, a speaker who called from different caller numbers could previously not be identified as the same speaker; by collating over the range of entries that partially match the caller number from its leading digits, that speaker's ID can be identified accurately.

[0033] According to the inventions of claims 3, 4, 7, 8, 11, and 12, the speaker of the voice to be collated, such as a telephone operator, can be specified in advance, so a speech recognition method with a high recognition rate, such as a speaker-dependent or speaker-verification type, can be used, and the speaker ID is obtained through precise collation.

[Brief description of the drawings]

[FIG. 1] A block diagram showing a configuration example of a speaker ID detection apparatus to which the present invention is applied.

[FIG. 2] A diagram showing an example of the data holding table of the present invention.

[FIG. 3] A flowchart illustrating an example of the speaker ID detection processing to which the present invention is applied.

[FIG. 4] A block diagram showing a configuration example of a speaker ID detection apparatus to which the present invention is applied.

[FIG. 5] A block diagram showing a configuration example of a speaker ID detection apparatus to which the present invention is applied.

[FIG. 6] A diagram showing the data holding table with speaker ID presentation information added.

[Explanation of symbols]

1: caller number detecting means; 2: data holding means; 2a: data holding table; 3: collating means; 4: recipient voice input means; 5: data presenting means.

Continued from the front page: (51) Int.Cl.⁷ / identification symbol / FI / theme code (reference): H04M 11/00 302; G10L 3/00 551A

Claims

[Claims]

1. A speaker ID detection method, wherein a caller number is detected when a telephone call is received; the detected caller number is compared with the telephone numbers in a data holding table in which a telephone number, a speaker ID, and voice data for identifying the speaker are held as a set; the data holding table entries having a telephone number that matches the detected caller number are taken as candidates; the voice data for identifying the speaker is collated with the telephone voice of the caller; and the speaker ID with the highest similarity is obtained.

2. A speaker ID detection method, wherein a caller number is detected when a telephone call is received; the detected caller number is compared with the telephone numbers in a data holding table in which a telephone number, a speaker ID, and voice data for identifying the speaker are held as a set; the data holding table entries having a telephone number that partially matches the detected caller number from the leading digits are taken as candidates; the voice data for identifying the speaker is collated with the telephone voice of the caller; and the speaker ID with the highest similarity is obtained.

3. A speaker ID detection method, wherein a caller number is detected when a telephone call is received; the detected caller number is compared with the telephone numbers in a data holding table in which a telephone number, a speaker ID, voice data for identifying the speaker, and ID data for identifying the speaker are held as a set; the data holding table entries having a telephone number that matches the detected caller number are taken as candidates; the ID data for identifying the speaker is collated with the recipient's response voice to the speaker; and the speaker ID with the highest similarity is obtained.

4. A speaker ID detection method, wherein a caller number is detected when a telephone call is received; the detected caller number is compared with the telephone numbers in a data holding table in which a telephone number, a speaker ID, voice data for identifying the speaker, and ID data for identifying the speaker are held as a set; the data holding table entries having a telephone number that partially matches the detected caller number from the leading digits are taken as candidates; the ID data for identifying the speaker is collated with the recipient's response voice to the speaker; and the speaker ID with the highest similarity is obtained.

5. A speaker ID detection apparatus comprising caller number detecting means for detecting a caller number when a telephone call is received, and data holding means for holding a data holding table in which a telephone number, a speaker ID, and voice data for identifying the speaker are held as a set, wherein the caller number detected by the caller number detecting means is compared with the telephone numbers in the data holding table; the data holding table entries having a telephone number that matches the detected caller number are taken as candidates; the voice data for identifying the speaker is collated with the telephone voice of the caller; and the speaker ID with the highest similarity is obtained.

6. A speaker ID detection apparatus comprising caller number detecting means for detecting a caller number when a telephone call is received, and data holding means for holding a data holding table in which a telephone number, a speaker ID, and voice data for identifying the speaker are held as a set, wherein the caller number detected by the caller number detecting means is compared with the telephone numbers in the data holding table; the data holding table entries having a telephone number that partially matches the detected caller number from the leading digits are taken as candidates; the voice data for identifying the speaker is collated with the data of the telephone voice of the caller; and the speaker ID with the highest similarity is obtained.

7. A speaker ID detection apparatus comprising caller number detecting means for detecting a caller number when a telephone call is received, and data holding means for holding a data holding table in which a telephone number, a speaker ID, voice data for identifying the speaker, and ID data for identifying the speaker are held as a set, wherein the caller number detected by the caller number detecting means is compared with the telephone numbers in the data holding table; the data holding table entries having a telephone number that matches the detected caller number are taken as candidates; the ID data for identifying the speaker is collated with the recipient's response voice to the speaker; and the speaker ID with the highest similarity is obtained.

8. A speaker ID detection apparatus comprising caller number detecting means for detecting a caller number when a telephone call is received, and data holding means for holding a data holding table in which a telephone number, a speaker ID, voice data for identifying the speaker, and ID data for identifying the speaker are held as a set, wherein the caller number detected by the caller number detecting means is compared with the telephone numbers in the data holding table; the data holding table entries having a telephone number that partially matches the detected caller number from the leading digits are taken as candidates; the ID data for identifying the speaker is collated with the recipient's response voice to the speaker; and the speaker ID with the highest similarity is obtained.

9. A computer-readable recording medium on which a program for causing a computer to execute the speaker ID detection method according to claim 1 is recorded.

10. A computer-readable recording medium on which a program for causing a computer to execute the speaker ID detection method according to claim 2 is recorded.

11. A computer-readable recording medium on which a program for causing a computer to execute the speaker ID detection method according to claim 3 is recorded.

12. A computer-readable recording medium storing a program for causing a computer to execute the speaker ID detection method according to claim 4.