JP4438014B1

JP4438014B1 - Harmful customer detection system, method thereof and harmful customer detection program

Info

Publication number: JP4438014B1
Application number: JP2008285862A
Authority: JP
Inventors: 邦俊杉; 一仁横内
Original assignee: 株式会社ネイクス
Priority date: 2008-11-06
Filing date: 2008-11-06
Publication date: 2010-03-24
Anticipated expiration: 2028-11-06
Also published as: JP2010113167A

Abstract

PROBLEM TO BE SOLVED: To detect in real time whether a speaker on a call matches any of the harmful customers registered in advance, and to warn the person answering the call.
SOLUTION: The system includes a group selection unit and a registered speaker selection unit. Referring to a voiceprint database, the group selection unit calculates relative distances only between the voiceprint information of the calling speaker and the voiceprint information of the reference person of each group registered in the voiceprint database, and selects the group to which the reference person with the smallest relative distance belongs. The registered speaker selection unit calculates the relative distance between the voiceprint information of the calling speaker and that of each registered speaker in the selected group other than the reference person, repeating the calculation for all such speakers, and selects the registered speaker whose voiceprint information has the smallest relative distance within the selected group, provided that this distance is within a second threshold smaller than the first threshold used for grouping.
[Selected figure] Figure 1

Description

The present invention relates to a harmful customer detection system, a method thereof, and a harmful customer detection program. More specifically, it relates to a technique for use in, for example, a Customer Relationship Management (CRM) system that records, stores, and manages calls made between a customer's telephone and a service representative's telephone. The technique detects a harmful customer who obstructs smooth telephone handling operations, for example by speaker identification technology, and automatically warns the representative.

Various techniques have been proposed for recording and managing, on the business operator's side, voice calls made between a customer and the business operator.

For example, Patent Document 1 discloses a centralized call recording system for converting into data, recording, and searching the contents of operators' calls at a call center, which is the department that answers telephone calls from customers.

In general, a private branch exchange (PBX), on which outgoing and incoming calls via the Public Switched Telephone Network (PSTN) converge, is installed on the premises of a call center or similar facility run by a business operator, and the PBX distributes voice calls to the multiple fixed telephones on the call center premises. Calls can therefore be recorded and stored by providing a call recording server that branches off from the PBX. On the operator side, a terminal device such as a PC may be provided alongside the extension telephone used for answering calls. This operator terminal device may have a function for searching customer information using the customer name given by the caller as a key, and a function for displaying that customer's past call history.

Meanwhile, Patent Document 2 discloses a distributed call recording system for making large numbers of outbound calls, that is, calls placed from a business operator to customers.
JP 2006-94260 A
JP 2005-210227 A

Call centers generally accept calls from a wide range of people: customers who have purchased practically any product manufactured or sold by the business operator and, in some cases, general consumers who are not purchasers. Call center operators therefore have no choice but to answer even complaint calls from harmful customers, so-called "claimers" (chronic complainers), who denounce problems with the business operator's products in grossly exaggerated terms or press excessively on shortcomings in customer service attitude or service.

However, calls from such harmful customers often tie up a line for a long time and are often repeated persistently, which obstructs operators' smooth call handling and markedly lowers its efficiency. Moreover, if an operator mishandles a complaint from a harmful customer, which is sometimes threatening, the customer may seize on the mistake to demand further unjustified monetary compensation or to take legal action. This exposes the business operator to litigation risk, and if information damaging to the business operator's reputation is spread on the Internet, its public standing may also suffer. In particular, when call center operators are not employees but external contractors at an outsourcing provider, it becomes even harder to maintain both the quality and the efficiency of operators' responses to harmful customers.

There is therefore a strong demand for notifying and warning the call center operator, at an early stage of the call, that the received call is from such a harmful customer.

For a call from an ordinary customer, the operator can, on answering, have the customer state one or more identifiers that identify the customer, such as the customer name, address, or product serial number, search the customer database using the stated identifier as a key, and display the retrieved customer information and past call history on the operator terminal. This makes it easy to grasp who the customer is and how past calls were handled. The customer name can also be identified by searching the customer database using the caller's telephone number obtained from the telecommunications carrier as a key.

However, the more malicious the complaint, the more such harmful customers tend to speak and act so as to avoid being identified as someone who has complained before. Even if the operator asks for a name when answering, the caller often does not give a real name, so searching the customer database cannot match the caller to a previously identified harmful customer. Furthermore, to avoid being identified, harmful customers often call from a telephone other than the home or mobile telephone registered in the customer database, or withhold their caller ID. The customer name therefore cannot be identified either by referring to the customer database or from the caller's telephone number obtained from the carrier.

The present invention has been made in view of the above problems. Its object is to provide a harmful customer detection system, a method thereof, and a harmful customer detection program that, in a CRM system that records, stores, and manages calls made between a customer's telephone and a service representative's telephone, can detect in real time, at an early stage of the call, a harmful customer who obstructs smooth telephone handling operations, and can issue a warning.

Another object of the present invention is to make it possible to identify a speaker as a harmful customer in real time during a call, without the calling harmful customer noticing.

A further object of the present invention is to make it possible to identify a speaker as a harmful customer in real time during a call even when the harmful customer does not give a real name or calls from a telephone other than one already registered at the call center.

In the context of call center operations, a harmful customer calling a call center typically has a history of past complaint calls to that call center, yet does not give a real customer name and is difficult to identify from a registered telephone number.

In this specification and the claims, a "harmful customer" is typically a customer who denounces problems with the business operator's products in grossly exaggerated terms or presses excessively on shortcomings in customer service attitude or service, and includes so-called "claimers". The term is not limited to these, however. It broadly means any user who may exhibit behavior patterns that could obstruct smooth telephone handling operations, and also includes persons who may become harmful customers, such as so-called "suspicious persons".

Voiceprint matching is a known technique that can be used to identify a speaker with these characteristics: it determines whose voice is speaking by matching voiceprints between recorded speech of speakers registered in advance and speech input at the time of authentication. However, voiceprint matching determines which of N pre-registered speakers the speaker to be identified is by performing N comparisons, so the computational load is unavoidably high. When the voiceprints of a substantial number of harmful customers are registered (100 or more, as one non-limiting example), it is extremely difficult to match the input speech against every registered voiceprint in real time without the harmful customer noticing.

In the present invention, to determine in real time during a call at a call center whether input speech matches registered speech, speech matching is performed in multiple stages (for example, two stages) so that it can withstand growth in the number of registered persons. Because it is sufficient to alert the representative answering the call when the calling speaker appears to be a harmful customer, the match determination need only be accurate enough for practical use. In addition, the spoken text of the input speech need not be the same as the spoken text of the pre-registered speech.
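As a rough, illustrative count of the comparisons involved (the symbols N, G, and m are introduced here for illustration and do not appear in the original text): a single-stage match against N registered voiceprints needs N comparisons, whereas a two-stage match over G groups needs one comparison per group reference person plus one per remaining member of the selected group of size m.

```latex
C_{\text{single}} = N, \qquad C_{\text{two-stage}} = G + (m - 1)
```

For example, assuming N = 100 registered speakers split evenly into G = 10 groups of m = 10, the two-stage match needs 10 + 9 = 19 comparisons instead of 100. Real groups will generally be uneven, so this is only an order-of-magnitude guide.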

According to one aspect of the present invention, there is provided a harmful customer detection system that detects in real time whether a speaker on a call matches any of the harmful customers registered in advance, the system comprising:
a voiceprint information extraction unit that reads recorded speech signals of previously identified registered speakers from a storage device and extracts voiceprint information characterizing each registered speaker;
a grouping unit that calculates the relative distances between the voiceprint information of all registered speakers and groups together registered speakers whose voiceprint information has a relative distance within a first threshold;
a reference person setting unit that sets, as the reference person of each group, the registered speaker whose voiceprint information has the smallest sum of relative distances to the voiceprint information of the other registered speakers in that group;
a voiceprint database that stores the extracted voiceprint information together with the identifier of the registered speaker, the identifier of the group to which the registered speaker belongs, and reference person information;
a voice acquisition unit that acquires the call speech of a calling speaker and extracts the voiceprint information of the calling speaker;
a group selection unit that, referring to the voiceprint database, calculates relative distances only between the voiceprint information of the calling speaker and the voiceprint information of the reference person of each group registered in the voiceprint database, and selects the group to which the reference person with the smallest relative distance belongs;
a registered speaker selection unit that calculates the relative distance between the voiceprint information of the calling speaker and that of a registered speaker in the selected group other than the reference person, repeats this calculation for all registered speakers other than the reference person, and selects the registered speaker whose voiceprint information has the smallest relative distance within the selected group and whose relative distance is within a second threshold smaller than the first threshold; and
a message output unit that outputs a warning message including the identifier of the selected registered speaker so that the called speaker talking with the calling speaker can see or hear it during the call.
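The grouping, reference person selection, and two-stage lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: voiceprints are assumed to be plain numeric feature vectors, Euclidean distance stands in for the unspecified "relative distance", the greedy grouping rule is one possible choice, and all function names are hypothetical. The sketch also treats the reference person as a candidate in the second stage by reusing its first-stage distance, which is an assumption.

```python
import math

def distance(a, b):
    # Assumption: Euclidean distance stands in for the "relative distance".
    return math.dist(a, b)

def group_speakers(prints, first_threshold):
    """Greedy grouping (assumed rule): a speaker joins the first group
    whose members are all within first_threshold, else starts a new group."""
    groups = []
    for sid, vec in prints.items():
        for members in groups:
            if all(distance(vec, prints[m]) <= first_threshold for m in members):
                members.append(sid)
                break
        else:
            groups.append([sid])
    return groups

def pick_reference(members, prints):
    """Reference person: member with the smallest summed distance to the others."""
    return min(
        members,
        key=lambda s: sum(distance(prints[s], prints[o]) for o in members if o != s),
    )

def identify(caller, prints, groups, second_threshold):
    """Two-stage lookup; returns a registered speaker ID, or None if no match."""
    if not groups:
        return None
    # In a real system the references would be stored, not recomputed per call.
    refs = [pick_reference(g, prints) for g in groups]
    # Stage 1: compare the caller only with each group's reference person.
    ref_dists = [distance(caller, prints[r]) for r in refs]
    gi = min(range(len(groups)), key=ref_dists.__getitem__)
    # Stage 2: compare with the remaining members of the selected group.
    best_id, best_d = refs[gi], ref_dists[gi]
    for sid in groups[gi]:
        if sid == refs[gi]:
            continue
        d = distance(caller, prints[sid])
        if d < best_d:
            best_id, best_d = sid, d
    # Accept the closest speaker only if it is within the second threshold.
    return best_id if best_d <= second_threshold else None

# Toy 2-D "voiceprints" (hypothetical data).
prints = {"A": (0, 0), "B": (1, 0), "C": (0, 1), "D": (10, 10), "E": (11, 10)}
groups = group_speakers(prints, first_threshold=2.0)
print(groups)
print(identify((0.9, 0.1), prints, groups, second_threshold=0.5))
```

Returning `None` when the closest speaker lies outside the second threshold means no warning is raised for callers who merely fall nearest to some group.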

The harmful customer detection system may further comprise a group reconstruction unit that, when the number of registered speakers grouped into one group by the grouping unit exceeds a predetermined number, decreases the first threshold and regroups the registered speakers whose voiceprint information lies within the decreased first threshold.
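A minimal sketch of such group reconstruction follows. It assumes Euclidean distance between numeric feature vectors, a greedy grouping rule, and a multiplicative shrink factor for the first threshold; the text specifies none of these, and the names `regroup`, `shrink`, and `max_members` are hypothetical. For simplicity the sketch regroups all speakers rather than only the oversized group.

```python
import math

def group_speakers(prints, threshold):
    """Greedy grouping (assumed rule): join the first group whose members
    are all within `threshold`, otherwise start a new group."""
    groups = []
    for sid, vec in prints.items():
        for members in groups:
            if all(math.dist(vec, prints[m]) <= threshold for m in members):
                members.append(sid)
                break
        else:
            groups.append([sid])
    return groups

def regroup(prints, first_threshold, max_members, shrink=0.5, min_threshold=1e-6):
    """Decrease the first threshold until no group exceeds max_members."""
    if not prints:
        return [], first_threshold
    threshold = first_threshold
    groups = group_speakers(prints, threshold)
    while max(len(g) for g in groups) > max_members and threshold > min_threshold:
        threshold *= shrink  # assumed reduction rule
        groups = group_speakers(prints, threshold)
    return groups, threshold

# Four hypothetical speakers on a line; one group of four is too large.
prints = {"A": (0, 0), "B": (1, 0), "C": (2, 0), "D": (3, 0)}
groups, threshold = regroup(prints, first_threshold=3.0, max_members=2)
print(groups, threshold)
```

Capping group size bounds the number of second-stage comparisons per call, at the cost of more first-stage comparisons as the number of groups grows.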

The voiceprint database may further store, in association with the extracted voiceprint information, harmful customer rank information indicating the degree to which the registered speaker is a harmful customer. The message output unit may output a first warning message when the harmful customer rank information of the selected registered speaker is at or below a predetermined value, and a second warning message when it is above the predetermined value.
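The rank-dependent choice of warning message can be sketched as below. The record layout, the integer rank scale, and the message wording are assumptions made for illustration; only the rule "first message at or below the predetermined value, second message above it" comes from the text.

```python
from dataclasses import dataclass

@dataclass
class RegisteredSpeaker:
    speaker_id: str  # identifier stored in the voiceprint database
    rank: int        # harmful customer rank (assumed: higher means more severe)

def build_warning(speaker: RegisteredSpeaker, rank_cutoff: int) -> str:
    """Pick the first or second warning message from the speaker's rank."""
    if speaker.rank <= rank_cutoff:
        # First warning message: rank at or below the predetermined value.
        return f"CAUTION: caller resembles registered speaker {speaker.speaker_id}"
    # Second warning message: rank above the predetermined value.
    return f"ALERT: caller resembles high-rank registered speaker {speaker.speaker_id}"

print(build_warning(RegisteredSpeaker("H-017", 1), rank_cutoff=2))
print(build_warning(RegisteredSpeaker("H-042", 3), rank_cutoff=2))
```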

The harmful customer detection system may further comprise a telephone number database that stores users' telephone line installation history information together with the telephone numbers of the lines, and a second message output unit. Referring to the telephone number database, the second message output unit performs a credit check of the calling speaker based on the installation history information for the calling telephone number obtained from the caller's call information, and outputs the result as a credit check message so that the called speaker can see or hear it before the call starts.
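A minimal sketch of a pre-call credit check keyed by the calling number follows. The text does not say how installation history is turned into a credit result, so the rule used here (flag withheld numbers, unknown numbers, and recently installed lines) is purely an assumption, as are the function name, the `min_days` parameter, and the message wording.

```python
from datetime import date

def credit_check(caller_number, installation_dates, today, min_days=180):
    """Return a short credit check message for the operator (hypothetical rule).

    installation_dates maps a telephone number to the date its line was installed.
    """
    if caller_number is None:
        # Caller ID withheld: nothing to look up.
        return "CREDIT CHECK: caller ID withheld"
    installed = installation_dates.get(caller_number)
    if installed is None:
        return "CREDIT CHECK: no installation history for this number"
    age_days = (today - installed).days
    if age_days < min_days:
        return f"CREDIT CHECK: line installed only {age_days} days ago"
    return "CREDIT CHECK: OK"

# Hypothetical telephone number database contents.
history = {"0312345678": date(2008, 1, 10), "0398765432": date(2008, 10, 27)}
today = date(2008, 11, 6)
for number in ("0312345678", "0398765432", None):
    print(credit_check(number, history, today))
```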

The harmful customer detection system may further comprise an emotion analysis unit that detects the occurrence of fluctuations in the input call speech and notifies the message output unit.

According to another aspect of the present invention, there is provided a harmful customer detection method executed by a harmful customer detection system that detects in real time whether a speaker on a call matches any of the harmful customers registered in advance and that comprises a voiceprint information extraction unit, a grouping unit, a reference person setting unit, a voiceprint database, a voice acquisition unit, a group selection unit, a registered speaker selection unit, and a message output unit, the method including:
a voiceprint information extraction step of reading recorded speech signals of previously identified registered speakers from a storage device and extracting voiceprint information characterizing each registered speaker;
a grouping step of calculating the relative distances between the voiceprint information of all registered speakers and grouping together registered speakers whose voiceprint information has a relative distance within a first threshold;
a reference person setting step of setting, as the reference person of each group, the registered speaker whose voiceprint information has the smallest sum of relative distances to the voiceprint information of the other registered speakers in that group;
a voiceprint information storing step of storing the extracted voiceprint information in the voiceprint database together with the identifier of the registered speaker, the identifier of the group to which the registered speaker belongs, and reference person information;
a voice acquisition step of acquiring the call speech of a calling speaker and extracting the voiceprint information of the calling speaker;
a group selection step of, referring to the voiceprint database, calculating relative distances only between the voiceprint information of the calling speaker and the voiceprint information of the reference person of each group registered in the voiceprint database, and selecting the group to which the reference person with the smallest relative distance belongs;
a registered speaker selection step of calculating the relative distance between the voiceprint information of the calling speaker and that of a registered speaker in the selected group other than the reference person, repeating this calculation for all registered speakers other than the reference person, and selecting the registered speaker whose voiceprint information has the smallest relative distance within the selected group and whose relative distance is within a second threshold smaller than the first threshold; and
a message output step of outputting a warning message including the identifier of the selected registered speaker so that the called speaker talking with the calling speaker can see or hear it during the call.

According to another aspect of the present invention, there is provided a harmful customer detection program for causing a computer to execute harmful customer detection processing that detects in real time whether a speaker on a call matches any of the harmful customers registered in advance, the program causing the computer to execute processing including:
voiceprint information extraction processing of reading recorded speech signals of previously identified registered speakers from a storage device and extracting voiceprint information characterizing each registered speaker;
grouping processing of calculating the relative distances between the voiceprint information of all registered speakers and grouping together registered speakers whose voiceprint information has a relative distance within a first threshold;
reference person setting processing of setting, as the reference person of each group, the registered speaker whose voiceprint information has the smallest sum of relative distances to the voiceprint information of the other registered speakers in that group;
voiceprint information storing processing of storing the extracted voiceprint information in a voiceprint database together with the identifier of the registered speaker, the identifier of the group to which the registered speaker belongs, and reference person information;
voice acquisition processing of acquiring the call speech of a calling speaker and extracting the voiceprint information of the calling speaker;
group selection processing of, referring to the voiceprint database, calculating relative distances only between the voiceprint information of the calling speaker and the voiceprint information of the reference person of each group registered in the voiceprint database, and selecting the group to which the reference person with the smallest relative distance belongs;
registered speaker selection processing of calculating the relative distance between the voiceprint information of the calling speaker and that of a registered speaker in the selected group other than the reference person, repeating this calculation for all registered speakers other than the reference person, and selecting the registered speaker whose voiceprint information has the smallest relative distance within the selected group and whose relative distance is within a second threshold smaller than the first threshold; and
message output processing of outputting a warning message including the identifier of the selected registered speaker so that the called speaker talking with the calling speaker can see or hear it during the call.

According to the present invention, the voiceprint analysis server first matches the input speech received by telephone only against the reference speaker of each group among the speech models of multiple harmful customers registered and grouped in advance. It then matches the input speech against the speech models belonging to the group of the reference speaker judged most similar, identifies the speech model whose similarity to the input speech is within a predetermined threshold, identifies the harmful customer corresponding to that speech model, and notifies the person answering the call during the call.

As a result, in a CRM system that records, stores, and manages calls made between a customer's telephone and a service representative's telephone, a harmful customer who obstructs smooth telephone handling operations can be detected in real time at an early stage of the call, and a warning can be issued.

In addition, a speaker can be identified as a harmful customer in real time during a call, without the calling harmful customer noticing.

Furthermore, a speaker can be identified as a harmful customer in real time during a call even when the harmful customer does not give a real name or calls from a telephone other than one already registered at the call center.

Therefore, with the harmful customer detection system, method, and program according to the present invention, call center operations can automatically identify harmful customers at an early stage of a call without additional equipment. This makes telephone handling more efficient and thereby helps the business operator improve its CRM.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings. In this specification and the drawings, components having substantially the same function and configuration are given the same reference numerals, and redundant description is omitted.

<Configuration of this embodiment>
FIG. 1 shows a non-limiting example of the network configuration of a harmful customer detection system according to an embodiment of the present invention. The harmful customer detection system comprises a PBX (private branch exchange) 1, a voice acquisition server 2, a voiceprint analysis server 3, a telephone number analysis server 4, an emotion analysis server 5, a control server 6, a customer telephone terminal 7, a PSTN (public telephone network) 8, an operator telephone terminal 9a, an operator PC terminal 9b, and a call recording server 10. Within the system, all or some of the PBX 1, voice acquisition server 2, voiceprint analysis server 3, telephone number analysis server 4, emotion analysis server 5, control server 6, operator telephone terminal 9a, operator PC terminal 9b, and call recording server 10 may be installed in the call center and interconnected by an IP (Internet Protocol) network such as an intranet 11d, for example a LAN/WAN. Alternatively, all or some of the voice acquisition server 2, voiceprint analysis server 3, telephone number analysis server 4, emotion analysis server 5, control server 6, and call recording server 10, and the databases with which these servers are equipped, such as the call recording database 101, voiceprint database 31, and telephone number database 41, may be installed outside the call center as appropriate via a remote IP connection such as the Internet.

The PBX 1 connects extension telephones within the call center to one another, and circuit-switches each operator telephone terminal 9a to the PSTN (public telephone network) 8 via on-premises lines 11a, 11b, 11c, and so on, thereby enabling calls between each operator telephone terminal 9a and the customer telephone terminal 7.

The voice acquisition server 2 is branch-connected to the PBX 1. It acquires the call audio between each operator telephone terminal 9a and the customer telephone terminal 7, and supplies the acquired audio to each server in association with the number (for example, the extension number) of the operator telephone terminal 9a.

Alternatively, the voice acquisition server 2 may be branch-connected to the line between the terminating unit (DSU) of the PSTN 8 and the PBX 1.

The voiceprint analysis server 3 compares the acquired voice supplied from the voice acquisition server 2 with the voice models (voiceprint patterns) of harmful customers registered in advance in the voiceprint database 31, and supplies the comparison result to the control server 6 together with the extension number of the operator telephone terminal 9a.

The telephone number analysis server 4 uses the telephone number of the originating customer telephone terminal 7, supplied from the PBX 1, as a key to search the telephone number database 41, in which credit information is registered in advance for each customer telephone number. It supplies the resulting credit check result to the control server 6 together with the extension number of the operator telephone terminal 9a.

The emotion analysis server 5 detects voice fluctuation, heightened emotion, restlessness, and the like in the acquired voice supplied from the voice acquisition server 2, and supplies the resulting emotion analysis result to the control server 6 together with the extension number of the operator telephone terminal 9a.

The control server 6 determines whether the speaker of the acquired voice is a harmful customer, based on all or some of the comparison result supplied from the voiceprint analysis server 3, the credit check result supplied from the telephone number analysis server 4, and the emotion analysis result supplied from the emotion analysis server 5, and displays the determination result as a warning on the operator PC terminal 9b in real time. Preferably, this result may be displayed together with rank information indicating how harmful the customer is. Also preferably, the operator PC terminal 9b has a browser function and can search for and display, as appropriate, the recorded voice data stored in the call recording database 101 and the user information or harmful customer information stored in the telephone number database 41 or the voiceprint database 31. Alternatively or additionally, when the caller on a call is found to be a harmful customer, the control server 6 may inject into the operator telephone terminal 9a a warning sound, such as a tone, that the calling customer cannot hear.

Under the control of the control server 6, the call recording server 10 accumulates and stores the acquired voice supplied from the voice acquisition server 2 after an incoming call in the call recording database 101, which is configured on a large-scale external storage device such as a NAS (Network Attached Storage).

Although the PBX 1 in FIG. 1 is connected to the customer telephone terminal 7 via a public switched telephone network such as the PSTN 8, it may instead, or in addition, be provided with an IP network connection function and be connected, via a voice packet communication network such as a VoIP (Voice over Internet Protocol) network, to a customer IP call terminal having an IP telephone function. In this case, the voice acquisition server 2 can acquire the voice call between the customer IP call terminal and the operator telephone terminal 9a. The customer telephone terminal 7 may be either a fixed-line telephone or a mobile telephone.

The voiceprint information extraction unit, the grouping unit, and the reference person setting unit in the claims correspond to the voiceprint analysis server 3 or the control server 6; the voice acquisition unit in the claims corresponds to the voice acquisition server 2 or the control server 6; the group selection unit and the registered speaker selection unit in the claims correspond to the voiceprint analysis server 3; and the message output unit in the claims corresponds to the control server 6 or the operator PC terminal 9b.

The network and hardware configuration shown in FIG. 1 is merely an example. The servers and databases may be integrated as necessary, and each component may be hosted externally, for example at an ASP (Application Service Provider).

<Overview of harmful customer detection processing in this embodiment>
FIG. 2 shows a non-limiting example of the processing sequence in the harmful customer detection system according to the first embodiment, executed under the control of the control server 6. The sequence runs from the arrival of a call from the customer telephone terminal 7 at an operator telephone terminal 9a of the call center until the operator PC terminal 9b or the operator telephone terminal 9a is notified that the customer on the call is a harmful customer and call distribution is performed.

In FIG. 2, a call first arrives at the PBX 1 from the customer telephone terminal 7 (step S1). On receiving the call, the PBX 1 acquires call information. The acquired call information includes, for example, call control information such as incoming call start information (including an incoming call start timestamp), outgoing call start information (including an outgoing call start timestamp), call start information (including a call start timestamp), and call end information (including a call end timestamp), together with call identification information such as the source telephone number, destination telephone number, source channel number, caller number, incoming channel number, and incoming telephone number (for example, the destination extension number). Preferably, the call information may be displayed in real time on the display devices of the control server 6 or the operator PC terminal 9b, in conjunction with a CTI program that runs on those machines and implements a CTI (Computer Telephony Integration) protocol. From all or part of the acquired call information, the PBX 1 associates at least the number of the originating customer telephone terminal 7 with the extension number of the operator telephone terminal 9a that went off-hook, and sends this source telephone number and operator extension number to the telephone number analysis server 4 (step S2). Preferably, the telephone number analysis server 4 further obtains from the call information a flag indicating whether the originating customer telephone terminal withheld its caller number.

FIG. 3 shows an example of the configuration of the telephone number database 41 referred to by the telephone number analysis server 4. In FIG. 3, the telephone number database 41 includes a user database 411 and a telephone installation database 412. The user database 411 is configured, for example, as a table and contains, as an example, for each user: a user ID that uniquely identifies the user (customer), the user name, a link to the telephone number in the telephone installation database 412, telephone number change history, and telephone charge delinquency history. The telephone installation database 412 is configured, for example, as a table and contains, as an example, for each telephone number: the telephone number and the telephone installation date. Alternatively, the user database 411 and the telephone installation database 412 may be combined into one, for example keyed by telephone number.

Returning to FIG. 2, the telephone number analysis server 4 searches the telephone number database 41 using the acquired source telephone number as a key, thereby performing telephone number analysis processing based on the source telephone number, that is, a simple credit check (step S3). It outputs the credit check result via the control server 6 to the operator PC terminal 9b corresponding to the acquired operator extension number (step S4).

As an example, the telephone number analysis server 4 refers to the flag indicating whether the caller number was notified and to the telephone installation date. If the caller number was notified and a predetermined period has elapsed since the originating telephone was installed (that is, the fixed-line or mobile contract has been in place for a long time), the credit check result is "no abnormality". If, on the other hand, the caller number was notified but the originating telephone was installed only recently (that is, the fixed-line or mobile contract period is short), or if the caller number was withheld, the credit check result is "warning".
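The rule above can be sketched as a small function. This is a minimal illustration only: the function name, the return strings, the one-year value chosen for the "predetermined period", and the handling of a number that is missing from the installation database are all assumptions that the source does not specify.

```python
from datetime import date, timedelta
from typing import Optional

# Hypothetical value: the source only speaks of "a predetermined period".
MIN_LINE_AGE = timedelta(days=365)


def credit_check(caller_id_notified: bool,
                 installed_on: Optional[date],
                 today: date) -> str:
    """Return "no_abnormality" or "warning" for one incoming call."""
    if not caller_id_notified:
        # A withheld caller number always yields a warning.
        return "warning"
    if installed_on is None:
        # Assumption: an unknown installation date is treated as a warning.
        return "warning"
    if today - installed_on >= MIN_LINE_AGE:
        # Number notified and the line has existed for the required period.
        return "no_abnormality"
    # Number notified, but the line was installed only recently.
    return "warning"
```

For instance, `credit_check(True, date(2000, 1, 1), date(2008, 11, 6))` yields `"no_abnormality"`, while the same call with a withheld number yields `"warning"`.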

Preferably, when the telephone number analysis server 4 determines "warning", it may instruct the control server 6 to treat the originating speaker as a possibly suspicious person, extract voiceprint information from the recorded voice, and register it in the voiceprint database 31. Also preferably, multiple warning levels may be set in advance; for example, a withheld caller number may be given a higher warning level than a notified number whose telephone was installed only recently. A warning that the caller may be a harmful customer can be displayed on the operator PC terminal 9b before the call between the customer and the operator begins, which alerts the operator before the conversation starts (step S5).

Next, a voice call is started between the customer telephone terminal 7 and the operator telephone terminal 9a (steps S6 and S7). The voice is acquired by the voice acquisition server 2 and supplied to the voiceprint analysis server 3.

FIG. 4 shows an example of the configuration of the voiceprint database 31 referred to by the voiceprint analysis server 3. In FIG. 4, the voiceprint database 31 includes a harmful customer database 311 and voiceprint group data 312. The harmful customer database 311 is configured, for example, as a table and contains, as an example, for each customer identified in advance as a harmful customer: a harmful customer ID that uniquely identifies the harmful customer, the harmful customer's personal information, the harmful customer rank, a link to the voiceprint group data, and the group ID of the voiceprint group to which the harmful customer belongs. The voiceprint group data 312 is, as an example, organized per voiceprint group by grouping voiceprint data of high mutual similarity in advance. It contains the voiceprint data of the harmful customers belonging to each group, and each item of voiceprint data is associated with a link to the personal information and with information indicating whether the harmful customer having that voiceprint data is the reference person of the group.

Returning to FIG. 2, the voiceprint analysis server 3 cuts out a predetermined duration (for example, three seconds or more) of the voice acquired from the call, and matches the voiceprint data (the voiceprint pattern described later) of this excerpt against the voiceprint data of harmful customers registered in advance in the voiceprint database 31. It thereby determines whether the speaker of the acquired voice is a registered harmful customer (step S8), and outputs the voiceprint analysis result via the control server 6 to the operator PC terminal 9b corresponding to the acquired operator extension number (step S9). Preferably, the output voiceprint analysis result includes the name of the detected harmful customer and the harmful customer rank indicating the degree of harm, and the system may be configured so that, when the operator selects a displayed harmful customer name, that customer's past call history and call handling history are then displayed on the operator PC terminal 9b. Alternatively, the names of multiple harmful customers whose similarity to the input voice is within a certain threshold may be displayed as a list of candidate harmful customers. The details of this voiceprint analysis processing are described later.

Next, the emotion analysis server 5 detects voice fluctuation, heightened emotion, restlessness, and the like in the acquired voice supplied from the voice acquisition server 2, and outputs the resulting emotion analysis result via the control server 6 to the operator PC terminal 9b corresponding to the acquired operator extension number (steps S10 and S11). Displaying the emotion analysis result for the input voice on the operator PC terminal 9b makes it possible to identify a potential harmful customer at an early stage of the call, even when the customer has no history of past complaints.

The telephone number analysis result from the telephone number analysis server 4, the voiceprint analysis result from the voiceprint analysis server 3, and the emotion analysis result from the emotion analysis server 5 may each be output to the operator PC terminal 9b via the control server 6 as soon as it is obtained. Alternatively or additionally, the control server 6 may combine these analysis results into an overall determination. For example, when the speaker currently on the call is determined to be a harmful customer, a message alerting the operator to that fact may be displayed on the operator PC terminal 9b, for example as a pop-up, together with the most appropriate way to handle the call. Preferably, together with this alert message, or in response to a display request made by the operator after the message appears, customer information stored in the telephone number database 41 and the voiceprint database 31 may be looked up and displayed on the operator PC terminal 9b, for example the customer name, address, registered telephone number, current source telephone number, voiceprint check result, customer attributes, and product purchase history.

Preferably, for example, when no similar registered harmful customer is found, the operator is instructed to handle the call as usual, or no message is displayed. When a similar registered harmful customer is found, handling depends on the harmful customer rank obtained. If the rank is low, the operator on the call is instructed to follow the call handling manual for harmful customers. If the rank is high, the call is deemed to require particularly careful handling and may be transferred to the operator's supervisor, an experienced operator, or the like; alternatively, a warning message may be displayed directly on the supervisor's PC terminal or the like to prompt the supervisor to break into the call (step S12).
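The branching in step S12 can be sketched as follows. This is a hedged illustration: the source distinguishes only "low" and "high" ranks, so the integer rank scale, the cut-off value, and the action names below are hypothetical.

```python
from typing import Optional

# Hypothetical cut-off between "low" and "high" harmful customer ranks.
HIGH_RANK_THRESHOLD = 3


def choose_handling(matched_rank: Optional[int]) -> str:
    """Pick a handling instruction from the rank of the matched harmful customer.

    matched_rank is None when no similar registered harmful customer was found.
    """
    if matched_rank is None:
        # No match: handle the call as usual (or show no message at all).
        return "normal_handling"
    if matched_rank < HIGH_RANK_THRESHOLD:
        # Low rank: the operator follows the manual for harmful customers.
        return "follow_harmful_customer_manual"
    # High rank: transfer to, or alert, a supervisor or experienced operator.
    return "escalate_to_supervisor"
```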

As a variation, when a call arrives at the operator telephone terminal 9a, the system may be configured so that an automated response is first played to the originating telephone terminal, for example using a known IVR (Interactive Voice Response) function, and the operator begins handling the call only afterwards. For example, on an incoming call the automated response prompts the originating customer to state the nature of the inquiry, the customer name, and so on. If the operator begins handling the call after the voiceprint analysis result and emotion analysis result for the voice given in reply have been obtained and displayed on the operator PC terminal 9b, the voiceprint check can be carried out in more detail without seeming unnatural to the originating customer. In addition, performing the voiceprint check between predetermined utterance content, such as the customer name or the inquiry, and a registered voiceprint pattern (voice model) built as a template from the same kind of utterance content has the advantage of improving the accuracy of the voiceprint analysis.

As another variation, this embodiment may be configured so that the voiceprint analysis server 3 performs voiceprint analysis of the originating speaker only when the telephone number analysis server 4, as a result of the telephone number analysis based on the source telephone number, has determined "warning" for that speaker, and performs no voiceprint analysis when the determination is "no abnormality". This configuration reduces the system load of voiceprint analysis for calls with customers judged "no abnormality" and can also speed up the operator's call handling.

<Harmful customer voiceprint data registration processing in this embodiment>
FIG. 5 shows an example of the procedure for registering the voice model of a harmful customer according to the first embodiment in the voiceprint database 31 in advance.

In FIG. 5, the call voice data of a specific speaker who is to be registered as a harmful customer is read from the call voice data recorded in the past in the call recording database 101 (step S31), and voiceprint information (voiceprint pattern data) representing its characteristics is extracted from that data (step S32). The speaker to be registered as a harmful customer may be designated by the operator who handled the call, or may be selected automatically based on keywords in the call history, personal information, and the like. Alternatively, during a call between the operator and the speaker to be registered, a new recording, for example of the speaker's personal information, may be made after the speaker's identity has been confirmed.

In this specification, "personal information" means a customer's personal information and includes, for example, the customer's name, address, registered telephone number, date of birth, customer attributes, and product purchase history. It can be displayed on the operator PC terminal 9b as appropriate, either together with a warning or in response to an instruction entered by the operator. The customer attributes allow, for example, "harmful customer (plus harmful customer rank)" and "important customer (VIP)" to be distinguished. The voice model needed for matching against input speech can be generated from, for example, about 10 seconds of speech, but the required duration may vary with the processing power of the processor and with the capacity and data structure of the external storage device.

The following describes the speaker recognition technology, and in particular the speaker identification technology, that can be used by the voice model registration processing and the voiceprint analysis processing executed by the voiceprint analysis server 3 and the control server 6 in this embodiment. Unlike speech recognition, which recognizes what a speaker is saying, speaker recognition identifies whose voice is speaking. Speaker recognition extracts, as voiceprint information, the acoustic patterns of human speech that are shaped by factors such as sex, anatomical features such as the size and shape of the throat and mouth, and the linguistic environment, including speaking speed and style; it models these patterns and uses them to recognize an individual's voice. Within speaker recognition, this embodiment matches the input speaker's voice against each of a large number of stored voice models to determine whose voice an unknown voice is, which is called speaker identification. In addition, speaker identification in this embodiment must record the harmful customer's speech and extract reference voiceprint information without the customer noticing, so it is difficult to restrict in advance what the speaker says. Speaker identification is therefore performed on the basis of the speaker's general voice characteristics, and it assumes that the text spoken in the recording from which the voiceprint information was extracted may or may not match the text spoken in the input speech.

The processing that extracts voiceprint information from recorded speech is described in detail with reference to FIGS. 6 to 12. The recorded speech is labeled phoneme by phoneme. That is, each phoneme is given one of the labels "a", "i", "u", "e", "o" if it is a vowel, or one of "k", "s", "t", "n", "h", "m", "j", "r", "w", "g", "z", "d", "b", "p" if it is a consonant. FIG. 6 is a graph for the utterance "denshi" (the Japanese word for "electron"), with the magnitude of air vibration on the vertical axis and time on the horizontal axis; a different waveform pattern appears for each phoneme.

Next, the labeled speech is normalized (made uniform) with respect to pitch, loudness, and speed. Because voice pitch differs from person to person, the pitch is normalized, as shown in FIG. 7, by decimating the original signal and then compressing it along the time axis, which changes the pitch. Because loudness and speed also differ from utterance to utterance, the speech waveform is stretched or compressed, as shown in FIG. 8, to match the speed of a reference speech, which normalizes loudness and speed. These normalization steps are performed per phoneme.

Next, voiceprint information is extracted by generating a sound spectrogram. The spectrum of the speech frequencies characterizes the speaker's voiceprint. This frequency spectrum can be obtained by applying a Fourier transform to the time signal, for example by applying the Fast Fourier Transform (FFT), which is well suited to processing on a processor, to the time waveform of the speech. FIG. 9 shows the speech spectrum, that is, the speech spectrogram, obtained by applying FFT processing to a speech signal: a graph of which frequencies a sound contains and how strongly, with frequency (Hz) on the horizontal axis and sound level (dB) on the vertical axis.
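As a rough illustration of the frequency-versus-level graph described for FIG. 9, the sketch below computes a magnitude spectrum in dB from a short block of samples. It uses a direct discrete Fourier transform from the standard library for clarity; an FFT produces the same values faster. The function name, the dB floor, and the synthetic 1 kHz test tone are assumptions made for this example and are not taken from the source.

```python
import cmath
import math


def magnitude_spectrum_db(samples, sample_rate):
    """Return (frequency_hz, level_db) pairs for the non-negative frequency bins."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        # Direct DFT sum for bin k; an FFT computes the same value in O(n log n).
        x_k = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(samples))
        magnitude = abs(x_k) / n
        # A small floor avoids log10(0) for empty bins.
        level_db = 20 * math.log10(max(magnitude, 1e-12))
        spectrum.append((k * sample_rate / n, level_db))
    return spectrum


# Synthetic test tone (not real speech): a 1 kHz sine sampled at 8 kHz.
rate = 8000
tone = [math.sin(2 * math.pi * 1000 * i / rate) for i in range(64)]
peak_hz, peak_db = max(magnitude_spectrum_db(tone, rate), key=lambda p: p[1])
```

With these inputs the strongest bin lies at 1000 Hz. Real speech would show several such peaks along the spectral envelope.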

FIG. 10 shows the envelope of this speech spectrum, on which multiple peaks appear. Each of these peaks in the speech spectrum is called a formant, and the way the formants change over time reflects the voiceprint characteristics that arise from individual differences.

When people speak continuously, individual differences appear in their speech. For example, a speaker with clear articulation produces sharply delimited sounds, whereas a speaker with poor articulation produces blurred boundaries between sounds. These differences can be seen in how the speech spectrum changes over time. FIG. 11 shows this change in three dimensions for a speech signal of the utterance "denshi", with time on the X axis, frequency on the Y axis, and sound level on the Z axis; it is obtained by computing and plotting the spectrum at each instant while shifting the time little by little. FIG. 12 converts the three-dimensional graph of FIG. 11 into a two-dimensional display by representing the sound level on the Z axis as color intensity. FIG. 12 shows that the pattern differs for each phoneme. FIG. 13 contrasts the speech spectrogram obtained as in FIG. 12, that is, the voiceprint pattern, for speaker A and for speaker B to show their individual differences. The voiceprint matching processing executed by the voiceprint analysis server 3 in this embodiment calculates which of the pre-registered voiceprint patterns the voiceprint pattern of the input speech most closely resembles.

Returning to FIG. 5, the spectrum of each phoneme obtained in this way is registered in the voiceprint database 31 as voiceprint information, together with the personal information of the registrant (step S33).

<Harmful customer voice model grouping and update processing in this embodiment>
In FIG. 5, for each speaker registered in the voiceprint database 31 (each registered harmful customer), a similarity index with respect to every other registered harmful customer is then calculated (step S34).

FIG. 14 shows a schematic example of similarity indices calculated from the time-axis spectrum signals of each phoneme of the registrants stored in the voiceprint database 31. Assuming, for example, that registrants A to K are registered in the voiceprint database, the matrix lists the similarity index for each pair of registrants, scaled to a range of 0 to 100 points. FIG. 15 shows an example in which the relative distance between the voiceprint patterns of the registrants is obtained by calculating (100 − similarity index) for each similarity index of FIG. 14. FIGS. 16 and 17 show the result of grouping registered speakers A to K based on the relative distances of FIG. 15, using a relative distance of 40 points as the grouping threshold, for example. The relative distances among registered speakers A, B, C, D, and E satisfy −40 ≤ A−B ≤ 40, −40 ≤ A−C ≤ 40, ..., −40 ≤ D−E ≤ 40, so the absolute values of their mutual relative distances are all within the 40-point grouping threshold, and these speakers form group 1. Similarly, the relative distances among registered speakers F, G, H, I, and J satisfy −40 ≤ F−G ≤ 40, −40 ≤ F−H ≤ 40, ..., −40 ≤ I−J ≤ 40, so the absolute values of their mutual relative distances are all within the 40-point grouping threshold, and these speakers form group 2. For registered speaker K, the relative distance to every other registered speaker exceeds the grouping threshold (55 to 75), so K alone forms group 3.

Returning to FIG. 5, registrants whose voiceprint patterns (voiceprint information) have relative distances from one another with absolute values within the grouping threshold are grouped together (step S35). In this specification, unless otherwise noted, "relative distance" means the absolute value of the relative distance.
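Steps S34 and S35 can be sketched as follows. The specification states only the grouping condition (all mutual relative distances within the threshold), not the procedure for forming groups, so the greedy first-fit strategy here is an assumption, as are the names `relative_distance` and `group_speakers` and the similarity values, which are not those of FIG. 14.

```python
def relative_distance(similarity):
    """Convert a 0-100 similarity index into a relative distance (100 - similarity)."""
    return 100 - similarity

def group_speakers(speakers, similarity, threshold=40):
    """Place each speaker in the first group whose members are ALL within the threshold."""
    groups = []
    for s in speakers:
        for g in groups:
            if all(relative_distance(similarity[frozenset((s, m))]) <= threshold
                   for m in g):
                g.append(s)
                break
        else:
            # No existing group accepts this speaker, so start a new one.
            groups.append([s])
    return groups

# Hypothetical similarity indices for four registered speakers.
similarity = {
    frozenset(("A", "B")): 80, frozenset(("A", "C")): 70, frozenset(("B", "C")): 75,
    frozenset(("A", "K")): 30, frozenset(("B", "K")): 45, frozenset(("C", "K")): 25,
}
groups = group_speakers(["A", "B", "C", "K"], similarity)
```

With these values A, B, and C lie within 40 points of one another and form one group, while K, at 55 to 75 points from everyone else, forms a group of its own.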

Next, within each group, the registered speaker whose sum of relative distances to the other registered speakers in the group is smallest is set as the reference person of that group (step S36). In the example of FIGS. 14 to 17, the reference person is B for group 1, H for group 2, and K for group 3. The group information obtained in this way and the information on each group's reference person are registered in the voiceprint database 31 (step S37).
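The selection rule of step S36 picks the member with the smallest summed distance to the rest of its group. A minimal sketch, assuming a hypothetical helper `reference_person` and illustrative distances that are not taken from FIG. 15:

```python
def reference_person(group, distance):
    """Return the member whose summed relative distance to the other members is smallest."""
    return min(
        group,
        key=lambda s: sum(distance[frozenset((s, m))] for m in group if m != s),
    )

# Hypothetical relative distances within one group.
distance = {
    frozenset(("A", "B")): 20,
    frozenset(("A", "C")): 30,
    frozenset(("B", "C")): 25,
}
ref = reference_person(["A", "B", "C"], distance)   # sums: A=50, B=45, C=55
solo = reference_person(["K"], distance)            # a single member is its own reference
```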

If the number of registrants belonging to one group exceeds a predetermined in-group speaker count threshold, for example 10, the grouping threshold for the relative distance (absolute value) of the voiceprint patterns is decreased so that the number of registrants in a group stays within 10 (step S38). Limiting the number of registrants per group in this way keeps the time needed to collate the input voice against every registrant in one group within a fixed bound, and thus preserves the real-time nature of input voiceprint collation.
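The threshold reduction of step S38 can be sketched as a loop that tightens the grouping threshold and regroups until no group exceeds the limit. The step size of 5 points, the greedy grouping procedure, and the names `group_speakers` and `regroup_with_limit` are assumptions for illustration; the specification does not state by how much the threshold is reduced.

```python
def group_speakers(speakers, distance, threshold):
    """Greedy grouping: join the first group whose members are all within the threshold."""
    groups = []
    for s in speakers:
        for g in groups:
            if all(distance[frozenset((s, m))] <= threshold for m in g):
                g.append(s)
                break
        else:
            groups.append([s])
    return groups

def regroup_with_limit(speakers, distance, threshold, max_members, step=5):
    """Lower the grouping threshold until every group has at most max_members speakers."""
    groups = group_speakers(speakers, distance, threshold)
    while threshold > 0 and max(len(g) for g in groups) > max_members:
        threshold -= step
        groups = group_speakers(speakers, distance, threshold)
    return threshold, groups

# Hypothetical distances; a limit of 2 members is used so the example stays small.
distance = {
    frozenset(("A", "B")): 20,
    frozenset(("A", "C")): 30,
    frozenset(("B", "C")): 25,
}
threshold, groups = regroup_with_limit(["A", "B", "C"], distance,
                                       threshold=40, max_members=2)
```

In this example the threshold drops from 40 to 25 before speaker C splits off into a separate group.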

In the above example the grouping threshold is 40 points and the in-group speaker count threshold is 10. These thresholds are preferably determined by whether the total time required for speaker identification according to this embodiment exceeds the response interval that is acceptable during a call. For example, suppose the acceptable response interval during a call is 5 seconds or less, more preferably 2 seconds or less, and that comparing the voiceprint information of one registered speaker with the input voiceprint information takes 0.2 seconds. Then a total of 10 comparisons fits within the practical time limit, so the in-group speaker count threshold is chosen so that (number of groups + (in-group speaker count threshold − 1)) equals 10, and the grouping threshold for the relative distance is determined from it.
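The budget arithmetic above can be checked with a short calculation. The function names are hypothetical, and times are expressed in integer milliseconds to avoid floating-point rounding; the count of three groups follows the example of FIGS. 16 and 17.

```python
def max_comparisons(budget_ms, per_comparison_ms):
    """Number of one-to-one voiceprint comparisons that fit in the response interval."""
    return budget_ms // per_comparison_ms

def in_group_limit(budget_ms, per_comparison_ms, n_groups):
    """In-group speaker count threshold such that
    n_groups + (limit - 1) equals the number of comparisons that fit."""
    return max_comparisons(budget_ms, per_comparison_ms) - n_groups + 1

total = max_comparisons(2000, 200)             # 2 s budget, 0.2 s per comparison
limit = in_group_limit(2000, 200, n_groups=3)  # three groups, as in FIGS. 16 and 17
```

With three groups, the three reference-person comparisons leave room for seven further comparisons inside the selected group, which corresponds to a group of eight speakers including its reference person.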

<Voiceprint collation processing in this embodiment>
FIG. 18 shows an example of the processing procedure for voiceprint collation of input voice according to this embodiment.

In FIG. 18, the voiceprint information of the reference person of each group (registrants B, H, and K in the example of FIGS. 16 and 17) is collated with the voiceprint information of the input voice of the source speaker to whom the operator is currently responding (step S61). For this collation, the input voice is preferably recorded for about 3 seconds, for example. As already described for registration in the voiceprint database 31, the voice signal is labeled phoneme by phoneme and normalized for pitch, loudness, and speed, and then the speech spectrogram (voiceprint pattern) is extracted.

Based on the collation with the reference persons in step S61, the group is selected whose reference person has the smallest relative distance to the voiceprint information of the input voice (that is, the largest similarity index) (step S62).

Within the selected group, the voiceprint information of every registered speaker other than the reference person, who has already been collated, is collated with the voiceprint information of the input voice (step S63).

From the result of this collation, taking all registrants in the selected group (the reference person plus the registrants collated in step S63) as the population, the registrant is identified whose voiceprint information has the smallest relative distance to the voiceprint information of the input voice and also lies within the match determination threshold (step S64). The match determination threshold is preferably smaller than the grouping threshold described above (for example, 40 points) and may be, for example, 20 points.
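Steps S61 to S64 amount to a two-stage search: first against the reference persons only, then within the single selected group. A minimal sketch, assuming a hypothetical callable `distance_to` that returns the relative distance between the caller's voiceprint and one registered speaker; the group contents and distances below are illustrative only.

```python
def identify_caller(distance_to, groups, references, match_threshold=20):
    """Return the matching registered speaker, or None if nobody is close enough."""
    # S61: compare the caller only with the reference person of each group.
    ref_dist = {gid: distance_to(ref) for gid, ref in references.items()}
    # S62: select the group whose reference person is closest.
    best_group = min(ref_dist, key=ref_dist.get)
    # S63: compare with the remaining members of the selected group.
    candidates = {references[best_group]: ref_dist[best_group]}
    for s in groups[best_group]:
        if s != references[best_group]:
            candidates[s] = distance_to(s)
    # S64: take the closest member, accepted only within the match threshold.
    best = min(candidates, key=candidates.get)
    return best if candidates[best] <= match_threshold else None

# Hypothetical groups, reference persons, and caller distances.
groups = {1: ["A", "B", "C"], 2: ["F", "G", "H"], 3: ["K"]}
references = {1: "B", 2: "H", 3: "K"}
caller = {"A": 12, "B": 25, "C": 30, "F": 60, "G": 65, "H": 58, "K": 70}

calls = []  # records which speakers were actually compared
def distance_to(speaker):
    calls.append(speaker)
    return caller[speaker]

match = identify_caller(distance_to, groups, references)
no_match = identify_caller(lambda s: 50, groups, references)
```

In this example only five of the seven registered speakers are compared with the caller, which is the saving that grouping is meant to provide.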

The voiceprint analysis server 3 then sends a harmful customer identifier, such as the name of the harmful customer identified in step S64, to the control server 6 together with the registered personal information. If no registrant is identified in step S64, the voiceprint analysis server 3 sends data indicating that there is no match to the control server 6. The control server 6 displays, for example, the received harmful customer name and harmful customer rank on the operator PC terminal 9b. Preferably, in response to an instruction from the operator, it also looks up the user database and the call recording database using the received harmful customer identifier as a key, and displays past handling history, call history, and the like on the operator PC terminal 9b.

<Hardware configuration of the harmful customer detection system according to this embodiment>
FIG. 19 is a block diagram showing an example of the hardware configuration of each server device according to this embodiment. In each server device, which is the computer device 110 shown in FIG. 19, the CPU 111 controls the entire system according to programs stored in the ROM 114 and/or the hard disk drive 116, using the RAM 115 as working memory for primary storage. In accordance with user instructions entered via the mouse 112a or the keyboard 112, the CPU 111 also executes the harmful customer detection process according to the first embodiment based on a program stored in the hard disk drive 116. A display such as a CRT or LCD is connected to the display interface 113 and shows the input standby screen, processing progress, processing results, search results, and the like for the harmful customer detection process executed by the CPU 111. The removable media drive 117 is used mainly to write files from removable media to the hard disk drive 116 and to write files read from the hard disk drive 116 to removable media. Usable removable media include floppy disks (FD), CD-ROM, CD-R, CD-R/W, DVD-ROM, DVD-R, DVD-R/W, DVD-RAM, MO, memory cards, CF cards, SmartMedia, SD cards, and Memory Sticks.

A printer such as a laser beam printer or an inkjet printer is connected to the printer interface 118. The network interface 119 connects the computer device to a network.

The input means for each server device and the operator PC terminal 9b according to the first embodiment are not limited to the mouse 112a and the keyboard 112; any pointing device, such as a trackball, trackpad, or tablet, may be used as appropriate. When a portable information terminal is used as an input/output device connected to the server device and the operator PC terminal 9b according to this embodiment, the input unit may consist of buttons, a mode dial, or the like.

The hardware configuration of each server shown in FIG. 19 is merely an example, and any other hardware configuration may of course be used.

In particular, all or part of the harmful customer detection process according to this embodiment may be implemented by the computer terminal device 110 or by a portable information terminal device such as a PDA. All or part of the call recording process may also be implemented by a network system consisting of the Internet or any well-known local area network (LAN) or wide area network (WAN), in which computer terminal devices and server devices are interconnected by wireless links such as Bluetooth (registered trademark) or by wired communication lines such as the Internet (TCP/IP), the public switched telephone network (PSTN), or an integrated services digital network (ISDN).

As described above, according to this embodiment, in a CRM system that records, stores, and manages calls between a customer's telephone and the telephone of the person handling the call, a harmful customer who would disrupt smooth telephone handling can be detected in real time at an early stage of the call, and the operator can be warned.

The scope of the present invention is not limited to the exemplary embodiments illustrated and described; it includes all embodiments that provide effects equivalent to those the invention aims at, and various improvements and modifications are possible without departing from its gist. For example, the telephone number analysis process, the voiceprint analysis process, and the emotion analysis process disclosed in this embodiment may each be implemented alone in the harmful customer detection system according to this embodiment, or in any combination.

The present invention can also be applied to let an operator identify an important customer (VIP) at an early stage of a call. In this case, an attribute value indicating that the customer is an "important customer", together with the relevant details, is provided as a "customer attribute", and a message indicating that the caller is an important customer is displayed, for example on the operator PC terminal 9b, as the result of source speaker identification. For example, if the source speaker merely says "This is Suzuki," the speaker identification result lets the operator respond immediately with something like "You are Mr. Ichiro Suzuki of ○○ Shoji, aren't you?" Furthermore, the scope of the present invention is not limited to the combination of features of the invention defined by claim 1, and can be defined by any desired combination of particular features among all the disclosed features.

FIG. 1 is a block diagram showing an example of the network configuration of a harmful customer detection system according to an embodiment of the present invention.
FIG. 2 is a diagram showing the control flow, data flow, and time sequence of the harmful customer detection process executed by the harmful customer detection system according to an embodiment of the present invention.
FIG. 3 is a schematic diagram showing an example of the data structure in the telephone number database 41 of FIG. 1.
FIG. 4 is a schematic diagram showing an example of the data structure in the voiceprint database 31 of FIG. 1.
FIG. 5 is a flowchart showing an example of the procedure of the harmful customer voiceprint information registration process executed by the voiceprint analysis server 3 and the control server 6 according to an embodiment of the present invention.
FIG. 6 is a schematic diagram explaining phoneme-by-phoneme labeling of recorded voice.
FIG. 7 is a schematic diagram explaining pitch unification of recorded voice.
FIG. 8 is a schematic diagram explaining loudness and speed unification of recorded voice.
FIG. 9 is a schematic diagram explaining how a fast Fourier transform is applied to the time signal of recorded voice to obtain a speech spectrogram.
FIG. 10 is a schematic diagram explaining how formants are detected from the envelope of a speech spectrogram.
FIG. 11 is a schematic diagram explaining the change of a speech spectrogram along the time axis.
FIG. 12 is a diagram in which the three-dimensional speech spectrogram graph of FIG. 11 is converted into a two-dimensional graph.
FIG. 13 is a diagram explaining, by way of example, how the speech spectrogram of FIG. 12 differs between individuals.
FIG. 14 is a table explaining, by way of example, the similarity indices calculated between the speakers registered in the voiceprint database.
FIG. 15 is a table explaining, by way of example, the relative distances between speakers calculated from the similarity indices of FIG. 14.
FIG. 16 is a graph schematically showing the relative distances between the speakers of FIG. 15.
FIG. 17 is a table explaining, by way of example, the result of grouping speaker voiceprint patterns whose relative distances are within a predetermined grouping threshold, based on the graph of FIG. 16.
FIG. 18 is a flowchart showing an example of the procedure of the voiceprint collation process executed by the voiceprint analysis server 3 according to an embodiment of the present invention.
FIG. 19 is a diagram showing an example of the hardware configuration of each server device according to this embodiment.

Explanation of reference signs

PBX 1
Voice acquisition server 2
Voiceprint analysis server 3
Telephone number analysis server 4
Emotion analysis server 5
Control server 6
Customer telephone terminal 7
PSTN 8
Operator telephone terminal 9a
Operator PC terminal 9b
Call recording server 10
In-house lines 11a, 11b, 11c
Voiceprint database 31
Telephone number database 41
Call recording database 10

Claims

A harmful customer detection system that detects in real time whether or not a speaker in a call corresponds to each of all the harmful customers registered in advance, the system comprising:
a voiceprint information extraction unit that reads, from a storage device, the recorded voice signals of registered speakers identified in advance and extracts voiceprint information characterizing each registered speaker;
a grouping unit that calculates the relative distances of voiceprint information between all registered speakers and groups together registered speakers whose voiceprint information has relative distances from one another that are all within a first threshold;
a reference person setting unit that sets, as the reference person of each group, the registered speaker whose voiceprint information has the smallest sum of relative distances to the voiceprint information of the other registered speakers belonging to the group;
a voiceprint database that stores the extracted voiceprint information together with the identifier of the registered speaker, the identifier of the group to which the registered speaker belongs, and the information on the reference person;
a voice acquisition unit that acquires the call voice of a source speaker and extracts voiceprint information of the source speaker;
a group selection unit that, with reference to the voiceprint database, calculates the relative distance only between the voiceprint information of the source speaker and the voiceprint information of the reference person of each group registered in the voiceprint database, and selects the group to which the reference person whose voiceprint information has the smallest relative distance belongs;
a registered speaker selection unit that calculates the relative distance between the voiceprint information of the source speaker and the voiceprint information of a registered speaker, other than the reference person, belonging to the selected group, repeats this calculation for all registered speakers other than the reference person, and selects the registered speaker whose voiceprint information has the smallest relative distance in the selected group and whose relative distance is within a second threshold smaller than the first threshold;
a message output unit that outputs a warning message including the identifier of the selected registered speaker so that the destination speaker talking with the source speaker can recognize it visually or by voice during the call; and
a group reconstruction unit that, when the number of registered speakers grouped into one group by the grouping unit exceeds a predetermined number of registered speakers, decreases the first threshold so that the selection of a registered speaker within the selected group is completed within the allowable call response interval, and regroups the registered speakers whose voiceprint information is within the decreased first threshold.

The harmful customer detection system according to claim 1, wherein
the voiceprint database further stores, in association with the extracted voiceprint information, harmful customer rank information indicating the degree of harmfulness of the registered speaker, and
the message output unit generates a first warning message when the harmful customer rank information of the selected registered speaker is equal to or less than a predetermined value, and a second warning message when the harmful customer rank information is greater than the predetermined value.

The harmful customer detection system according to claim 1 or 2, further comprising:
a telephone number database that stores telephone installation history information of a user together with the telephone number of the telephone; and
a second message output unit that, with reference to the telephone number database, executes a credit check of the source speaker based on the telephone installation history information for the source telephone number obtained from the caller's call information, and outputs the result of the credit check as a credit check message so that the destination speaker can recognize it visually or by voice before the call starts.

The harmful customer detection system according to any one of claims 1 to 3, further comprising
an emotion analysis unit that detects the occurrence of fluctuation in the input call voice and notifies the message output unit of the detection.

A harmful customer detection method for detecting in real time whether or not a speaker in a call corresponds to each of all the harmful customers registered in advance, the method being executed by a harmful customer detection system comprising a voiceprint information extraction unit, a grouping unit, a reference person setting unit, a voiceprint information storage unit, a voice acquisition unit, a group selection unit, a registered speaker selection unit, a message output unit, and a group reconstruction unit, the method comprising the steps of:
reading, by the voiceprint information extraction unit, the recorded voice signals of registered speakers identified in advance from a storage device and extracting voiceprint information characterizing each registered speaker;
calculating, by the grouping unit, the relative distances of voiceprint information between all registered speakers and grouping together registered speakers whose voiceprint information has relative distances from one another that are all within a first threshold;
setting, by the reference person setting unit, as the reference person of each group, the registered speaker whose voiceprint information has the smallest sum of relative distances to the voiceprint information of the other registered speakers belonging to the group;
storing, by the voiceprint information storage unit, the extracted voiceprint information in a voiceprint database together with the identifier of the registered speaker, the identifier of the group to which the registered speaker belongs, and the information on the reference person;
acquiring, by the voice acquisition unit, the call voice of a source speaker and extracting voiceprint information of the source speaker;
calculating, by the group selection unit and with reference to the voiceprint database, the relative distance only between the voiceprint information of the source speaker and the voiceprint information of the reference person of each group registered in the voiceprint database, and selecting the group to which the reference person whose voiceprint information has the smallest relative distance belongs;
calculating, by the registered speaker selection unit, the relative distance between the voiceprint information of the source speaker and the voiceprint information of a registered speaker, other than the reference person, belonging to the selected group, repeating this calculation for all registered speakers other than the reference person, and selecting the registered speaker whose voiceprint information has the smallest relative distance in the selected group and whose relative distance is within a second threshold smaller than the first threshold;
outputting, by the message output unit, a warning message including the identifier of the selected registered speaker so that the destination speaker talking with the source speaker can recognize it visually or by voice during the call; and
decreasing, by the group reconstruction unit, the first threshold when the number of registered speakers grouped into one group by the grouping unit exceeds a predetermined number of registered speakers, so that the selection of a registered speaker within the selected group is completed within the allowable call response interval, and regrouping the registered speakers whose voiceprint information is within the decreased first threshold.

The harmful customer detection method according to claim 5, wherein
the voiceprint database further stores, in association with the extracted voiceprint information, harmful customer rank information indicating the degree of harmfulness of the registered speaker, and
the step executed by the message output unit outputs a first warning message when the harmful customer rank information of the selected registered speaker is equal to or less than a predetermined value, and a second warning message when the harmful customer rank information is greater than the predetermined value.

The harmful customer detection method according to claim 5 or 6, further comprising the steps of:
storing, by a telephone number storage unit, telephone installation history information of a user in a telephone number database together with the telephone number of the telephone; and
executing, by a second message output unit and with reference to the telephone number database, a credit check of the source speaker based on the telephone installation history information for the source telephone number obtained from the caller's call information, and outputting the result of the credit check as a credit check message so that the destination speaker can recognize it visually or by voice before the call starts.

The harmful customer detection method according to any one of claims 5 to 7, further comprising the step of
detecting, by an emotion analysis unit, the occurrence of fluctuation in the input call voice and notifying the message output unit of the detection.

A harmful customer detection program for causing a computer to execute a harmful customer detection process for detecting in real time whether or not a speaker in a call corresponds to each of all previously registered harmful customers, the program comprising: , To the computer,
Voiceprint information extraction processing for reading out from the storage device a recorded voice signal of a registered speaker identified in advance and extracting voiceprint information characterizing each registered speaker;
A grouping process in which relative distances of voiceprint information between all registered speakers are calculated, and registered speakers having voiceprint information whose relative distances between the registered speakers are all within the first threshold are grouped. When,
A reference person setting process of setting a registered speaker having voiceprint information having a minimum sum of relative distances with voiceprint information of other registered speakers belonging to each group as a reference person in the group;
Voiceprint information storage processing for storing the extracted voiceprint information in the voiceprint database together with the identifier of the registered speaker, the group identifier to which the registered speaker belongs, and the information of the reference person;
Voice acquisition processing of acquiring a call voice of a caller, and extracting voiceprint information of the caller;
Group selection processing for referring to the voiceprint database, calculating the relative distance only between the voiceprint information of the caller and the voiceprint information of the reference person of each group registered in the voiceprint database, and selecting the group to which the reference person having the voiceprint information with the smallest relative distance belongs;
Registered speaker selection processing for calculating the relative distance between the voiceprint information of the caller and the voiceprint information of a registered speaker, other than the reference person, belonging to the selected group, repeating this calculation for all registered speakers other than the reference person, and selecting the registered speaker whose voiceprint information has the smallest relative distance within the selected group, provided that this relative distance is within a second threshold smaller than the first threshold;
Message output processing for outputting a warning message including the identifier of the selected registered speaker in a form that the called party talking with the caller can recognize visually or audibly during the call;
Group reconstruction processing for, when the number of registered speakers grouped into one group in the grouping processing exceeds a predetermined number of registered speakers, reducing the first threshold so that the registered speaker selection processing within the selected group finishes within the allowable call response interval time, and regrouping the registered speakers having voiceprint information within the reduced first threshold; the harmful customer detection program being characterized by causing the computer to execute the above processing.
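As an illustration only, the grouping, reference person setting, group reconstruction, and two-stage selection recited above can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the claimed program: voiceprint information is assumed to be a numeric feature vector, Euclidean distance stands in for the "relative distance", the grouping is a simple greedy pass, and all function names, the `shrink` factor, and the sample data are hypothetical.

```python
import math

def distance(a, b):
    """Euclidean distance, an assumed stand-in for the 'relative distance'."""
    return math.dist(a, b)

def group_speakers(prints, t1):
    """Greedy grouping: a speaker joins a group only if within t1 of every member."""
    groups = []
    for sid in prints:
        for g in groups:
            if all(distance(prints[sid], prints[m]) <= t1 for m in g):
                g.append(sid)
                break
        else:
            groups.append([sid])
    return groups

def reference_person(group, prints):
    """Member with the minimum sum of distances to the other members of the group."""
    return min(group, key=lambda s: sum(distance(prints[s], prints[m]) for m in group))

def regroup_if_oversized(prints, t1, max_size, shrink=0.8, max_rounds=50):
    """Reduce t1 (hypothetical shrink factor) until no group exceeds max_size."""
    groups = group_speakers(prints, t1)
    for _ in range(max_rounds):
        if not groups or max(len(g) for g in groups) <= max_size:
            break
        t1 *= shrink
        groups = group_speakers(prints, t1)
    return groups, t1

def detect(caller, prints, groups, t2):
    """Stage 1: nearest reference person. Stage 2: nearest member of that group."""
    refs = [reference_person(g, prints) for g in groups]
    idx = min(range(len(groups)), key=lambda i: distance(caller, prints[refs[i]]))
    # The reference person stays in the comparison here for simplicity; its
    # distance is already known from stage 1 and could be reused instead.
    best = min(groups[idx], key=lambda s: distance(caller, prints[s]))
    return best if distance(caller, prints[best]) <= t2 else None

# Hypothetical two-dimensional "voiceprints" for five registered speakers.
prints = {"A": (0, 0), "B": (1, 0), "C": (0, 1), "D": (10, 10), "E": (11, 10)}
groups = group_speakers(prints, t1=2.0)
match = detect((0.9, 0.1), prints, groups, t2=0.5)
```

With N registered speakers split into G groups, a lookup of this kind compares the caller against G reference persons plus the members of one group rather than against all N speakers, which is the reason for bounding the group size.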

The harmful customer detection program according to claim 9, wherein the voiceprint database further stores, in association with the extracted voiceprint information, harmful customer rank information indicating the degree to which the registered speaker is a harmful customer, and
the message output processing generates a first warning message when the harmful customer rank information of the selected registered speaker is less than a predetermined value, and a second warning message when the harmful customer rank information is larger than the predetermined value.
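The rank-dependent choice of warning message might look like the following minimal sketch. The function name, the integer rank scale, the default value of `rank_threshold`, and the message wording are all assumptions made for illustration. The claim does not say what happens when the rank equals the predetermined value; this sketch arbitrarily treats that case like the lower rank.

```python
def warning_message(speaker_id: str, rank: int, rank_threshold: int = 3) -> str:
    """Return the second (stronger) message above the threshold, else the first."""
    if rank > rank_threshold:
        # Second warning message: rank is larger than the predetermined value.
        return f"[HIGH ALERT] Caller matches registered speaker {speaker_id} (rank {rank})"
    # First warning message: rank is less than the predetermined value
    # (equality is handled here only by assumption).
    return f"[CAUTION] Caller matches registered speaker {speaker_id} (rank {rank})"

low = warning_message("S-0012", rank=1)
high = warning_message("S-0012", rank=5)
```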

The harmful customer detection program according to claim 9 or 10, further causing the computer to execute:
Telephone number storage processing for storing telephone installation history information of a user in a telephone number database together with the telephone number of the telephone; and
Second message output processing for referring to the telephone number database, executing a credit check of the caller based on the telephone installation history information for the caller's telephone number obtained from the caller call information, and outputting the result of the credit check as a credit check message in a form that the called party can recognize visually or audibly before the call starts.

The harmful customer detection program according to any one of claims 9 to 11, further causing the computer to execute emotion analysis processing for detecting the occurrence of fluctuation in the input call voice and notifying a message output unit of the occurrence.