JP5510838B2

JP5510838B2 - Voice control system, voice control apparatus, voice control method, and voice control program

Info

Publication number: JP5510838B2
Application number: JP2011180217A
Authority: JP
Inventors: 徹廣野
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2011-08-22
Filing date: 2011-08-22
Publication date: 2014-06-04
Anticipated expiration: 2031-08-22
Also published as: JP2013046088A

Description

The present invention relates to a voice control technique.

In the above technical field, as disclosed in Patent Document 1, a technique is known that calculates the volume and the amount of pitch fluctuation of the emotional portion of speech and converts them so that they fall within a predetermined range.

Further, as disclosed in Patent Document 2, a technique is known in which, when a dangerous word or phrase is detected in utterance voice data, conversion processing such as lowering the volume of the dangerous word or phrase is performed in accordance with the response mode. A technique is also known in which, when the number of times the fluctuation of the voice pitch of the utterance voice data reaches a predetermined amount is equal to or greater than a predetermined number, the user is regarded as speaking emotionally and call reception is switched to another operator.

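Patent Document 1: Japanese Unexamined Patent Application Publication No. 2004-252085
Patent Document 2: Japanese Unexamined Patent Application Publication No. 2009-159558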

However, the technique described in Patent Document 1 merely converts the volume and the amount of pitch fluctuation of the emotional portion of the voice; it cannot replace the voice with another voice or mask it.

Further, in the technique described in Patent Document 2, even when conversion processing such as detecting a dangerous word or phrase and lowering its volume is performed, the processing follows preset contents and is therefore not necessarily effective in reducing operator stress. In addition, even when the user is determined to be speaking emotionally from the detected fluctuation in voice pitch, the operator is left to decide whether to continue the call, so transmission of the emotional utterance to the operator cannot be avoided.

An object of the present invention is to provide a technique that solves the above problems.

In order to achieve the above object, a system according to the present invention comprises:
telephone exchange means for establishing a call between a user terminal and an operator terminal via a public telephone network;
call monitoring means for monitoring the call established by the telephone exchange means and acquiring the content of the call between the user terminal and the operator terminal as voice information;
emotion determination means for determining whether the voice information includes an emotional voice representing a predetermined specific emotion;
emotion control instruction means for instructing control such that the determined emotional voice is not transmitted directly to the operator terminal;
vocabulary determination means for determining whether the voice information includes a specific vocabulary, being a negative vocabulary;
vocabulary control instruction means for instructing control such that the specific vocabulary determined to be included in the voice information is not transmitted directly to the operator terminal;
voice generation means for generating a controlled voice in which the emotional voice and the specific vocabulary have been controlled; and
specific vocabulary storage means for storing the specific vocabulary,
wherein the specific vocabulary stored in the specific vocabulary storage means is updated by extraction from the voice information transmitted to either the emotion determination means or the vocabulary determination means.

In order to achieve the above object, a method according to the present invention comprises:
a telephone exchange step of establishing a call between a user terminal and an operator terminal via a public telephone network;
a call monitoring step of monitoring the call established in the telephone exchange step and acquiring the content of the call between the user terminal and the operator terminal as voice information;
an emotion determination step of determining whether the voice information includes an emotional voice representing a predetermined specific emotion;
an emotion control instruction step of instructing control such that the determined emotional voice is not transmitted directly to the operator terminal;
a vocabulary determination step of determining whether the voice information includes a specific vocabulary, being a negative vocabulary, stored in specific vocabulary storage means;
a vocabulary control instruction step of instructing control such that the determined specific vocabulary is not transmitted directly to the operator terminal;
a voice generation step of generating a controlled voice in which the emotional voice and the specific vocabulary have been controlled; and
an update step of updating the specific vocabulary stored in the specific vocabulary storage means by extraction from the voice information transmitted in either the emotion determination step or the vocabulary determination step.

In order to achieve the above object, a program according to the present invention causes a computer to execute:
a telephone exchange step of establishing a call between a user terminal and an operator terminal via a public telephone network;
a call monitoring step of monitoring the call established in the telephone exchange step and acquiring the content of the call between the user terminal and the operator terminal as voice information;
an emotion determination step of determining whether the voice of the user terminal recognized in the voice information acquired in the call monitoring step includes an emotional voice representing a predetermined specific emotion;
an emotion control instruction step of instructing control such that the determined emotional voice is not transmitted directly to the operator terminal;
a vocabulary determination step of determining whether the voice information includes a specific vocabulary, being a negative vocabulary, stored in specific vocabulary storage means;
a vocabulary control instruction step of instructing control such that the determined specific vocabulary is not transmitted directly to the operator terminal;
a voice generation step of generating a controlled voice in which the emotional voice and the specific vocabulary have been controlled; and
an update step of updating the specific vocabulary stored in the specific vocabulary storage means by extraction from the voice information transmitted in either the emotion determination step or the vocabulary determination step.

According to the present invention, the user's call is monitored and emotional voice and dangerous words or phrases are uniformly converted, for example by masking, before being transmitted to the operator, so the call content can be controlled without causing the operator stress.

A block diagram showing the configuration of the voice control system according to the first embodiment of the present invention.
A block diagram showing the configuration of the voice control system according to the second embodiment of the present invention.
A sequence diagram showing the operation procedure of the voice control system according to the second embodiment of the present invention.
A flowchart showing the processing procedure of the voice control system according to the second embodiment of the present invention.
A diagram showing the hardware configuration of the disturbing voice detection unit according to the second embodiment of the present invention.
A block diagram showing the configuration of the voice control system according to the third embodiment of the present invention.
A block diagram showing the configuration of the voice control system according to the fourth embodiment of the present invention.
A block diagram showing the configuration of the voice control system according to the fifth embodiment of the present invention.

Hereinafter, embodiments for carrying out the present invention are described in detail by way of example with reference to the drawings. However, the configurations, numerical values, processing flows, functional elements, and the like described in the following embodiments are merely examples and may be freely modified or changed; they are not intended to limit the technical scope of the present invention to the following description.

[First Embodiment]
A voice control system 100 as a first embodiment of the present invention will be described with reference to FIG. 1. The voice control system 100 controls a call between a user terminal 111 and an operator terminal 112.

As shown in FIG. 1, the voice control system 100 includes a telephone exchange 101, a call monitor unit 102, a disturbing voice detection unit 103, and a voice generation unit 104. The voice control system 100 also includes the user terminal 111, the operator terminal 112, and a public telephone network 120.

The telephone exchange 101 establishes a call between the user terminal 111 and the operator terminal 112. The call monitor unit 102 monitors the call established by the telephone exchange 101 and acquires the content of the call between the user terminal 111 and the operator terminal 112 as voice information.

The disturbing voice detection unit 103 detects, from the voice of the user terminal 111 recognized in the voice information acquired by the call monitor unit 102, a disturbing voice that is either an emotional voice representing a predetermined specific emotion or a voice of a specific vocabulary including negative vocabulary.

The voice generation unit 104 generates a controlled voice in which the disturbing voice detected by the disturbing voice detection unit 103 has been controlled.

With the above configuration and operation, the voice control system of this embodiment monitors the user's call and uniformly converts emotional voice and dangerous words or phrases, for example by masking, before they are transmitted to the operator, so the call content can be controlled without causing the operator stress.

[Second Embodiment]
Next, a voice control system 200 according to a second embodiment of the present invention will be described with reference to FIG. 2. FIG. 2 is a diagram showing a schematic configuration of the voice control system 200 according to this embodiment.

As shown in FIG. 2, the voice control system 200 includes a telephone exchange 201, a call monitor unit 202, a disturbing voice detection unit 203, a user terminal 211, an operator terminal 212, and a public telephone network 220. The telephone exchange 201 has a voice generation unit 204 and a buffering unit 210. The disturbing voice detection unit 203 has an emotion determination unit 205, an emotion control instruction unit 206, a vocabulary determination unit 207, a vocabulary control instruction unit 208, and a specific vocabulary storage unit 209. In this embodiment the specific vocabulary storage unit 209 is installed outside the disturbing voice detection unit 203, but it may instead be included inside the disturbing voice detection unit 203.

The user terminal 211 is connected to the telephone exchange 201 via the public telephone network 220. The call monitor unit 202 acquires, from the telephone exchange 201, the voice transmitted from the user terminal 211 and stores it in a storage unit (not shown). The call monitor unit 202 passes the acquired voice to the emotion determination unit 205 and the vocabulary determination unit 207 of the disturbing voice detection unit 203.

The emotion determination unit 205 determines whether the voice recognized from the user terminal voice (call content) passed from the call monitor unit 202 includes an emotional voice (disturbing voice) representing a specific emotion. The specific emotion is a state of heightened feeling, mainly one accompanied by the speaker's anger. An emotional voice is a voice uttered mainly while the speaker is angry, that is, a voice at or above a certain volume, such as shouting or a volume louder than that of normal conversation.

When the emotion determination unit 205 determines that the voice recognized from the voice (call content) of the user terminal 211 includes an emotional voice (disturbing voice), it notifies the emotion control instruction unit 206 of the determination result. The emotion control instruction unit 206 issues an instruction to control the emotional voice notified by the emotion determination unit 205 so that it is not transmitted directly to the operator terminal 212. The emotional voice is controlled by, for example, lowering its volume to or below a predetermined volume, canceling it with another sound (masking), or replacing it with another voice. Lowering the volume includes making the voice silent.
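The three control methods named above (lowering the volume, masking, and substitution) can be sketched on a segment of audio samples. This is only an illustrative sketch: the function names, the float sample representation, the 440 Hz masking tone, and the default levels are assumptions, not details given in the patent. In particular, masking is approximated here by overwriting the segment with a tone, which is one possible reading of "canceling with another sound".

```python
import math
from typing import List, Sequence


def attenuate(samples: Sequence[float], max_peak: float) -> List[float]:
    """Scale a segment so its peak amplitude does not exceed max_peak.

    A max_peak of 0.0 silences the segment, which the text treats as a
    special case of lowering the volume.
    """
    if max_peak < 0:
        raise ValueError("max_peak must be non-negative")
    peak = max((abs(s) for s in samples), default=0.0)
    if peak <= max_peak:
        return list(samples)
    gain = max_peak / peak
    return [s * gain for s in samples]


def mask(samples: Sequence[float], tone_hz: float = 440.0,
         sample_rate: int = 8000, level: float = 0.2) -> List[float]:
    """Overwrite the segment with a tone of equal length (assumed masking)."""
    return [level * math.sin(2 * math.pi * tone_hz * n / sample_rate)
            for n in range(len(samples))]


def substitute(replacement: Sequence[float]) -> List[float]:
    """Replace the segment with pre-stored substitute audio."""
    return list(replacement)


def control_emotional_voice(samples: Sequence[float], method: str, *,
                            max_peak: float = 0.1,
                            replacement: Sequence[float] = ()) -> List[float]:
    """Apply the control method chosen by the (hypothetical) instruction."""
    if method == "attenuate":
        return attenuate(samples, max_peak)
    if method == "mask":
        return mask(samples)
    if method == "substitute":
        return substitute(replacement)
    raise ValueError(f"unknown method: {method}")
```

Which method is "optimal" for a given segment is left to the emotion control instruction unit in the text; the dispatcher above simply applies whatever method it is told to use.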

The vocabulary determination unit 207 determines whether the voice (call content) of the user terminal 211 passed from the call monitor unit 202 includes a specific vocabulary (disturbing voice). The specific vocabulary is vocabulary that the operator finds unpleasant to hear, that is, vocabulary including words with negative meanings, such as words that slander the other party or threatening words.

The specific vocabulary is stored in advance in the specific vocabulary storage unit 209. The specific vocabulary is also added to and updated as appropriate by extracting specific vocabulary contained in the voice of the user terminal 211, according to the type of business of the call center in which this embodiment is used and the products or services the call center handles. The specific vocabulary storage unit 209 may store not only disturbing voice as specific vocabulary but also technical terms frequently used in the specific field the call center handles and vocabulary commonly used in call center operations (general vocabulary). When the vocabulary determination unit 207 determines that the voice of the user terminal 211 contains vocabulary other than the technical terms and general vocabulary stored in the specific vocabulary storage unit 209, it may pass that vocabulary to the vocabulary control instruction unit 208 as corresponding to the specific vocabulary.
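A minimal sketch of such a vocabulary store follows, assuming the voice has already been recognized into word tokens. The class name, the method names, and the `treat_unknown_as_specific` flag are hypothetical; the patent does not specify a data structure. The flag models the optional behavior in which any word outside the stored technical and general vocabulary is handled as specific vocabulary.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class SpecificVocabularyStore:
    # Negative words that must not reach the operator directly.
    specific: Set[str] = field(default_factory=set)
    # Technical terms and general call-center vocabulary (optional).
    allowed: Set[str] = field(default_factory=set)

    def update(self, words: Iterable[str]) -> None:
        """Add words extracted from call audio to the specific vocabulary."""
        self.specific.update(words)

    def find_specific(self, tokens: Iterable[str],
                      treat_unknown_as_specific: bool = False) -> List[str]:
        """Return the tokens that should be handled as specific vocabulary."""
        flagged = []
        for token in tokens:
            if token in self.specific:
                flagged.append(token)
            elif treat_unknown_as_specific and token not in self.allowed:
                flagged.append(token)
        return flagged
```

For example, a store created with `specific={"idiot"}` and `allowed={"refund", "order", "please"}` flags only `"idiot"` by default, and additionally flags any word not in `allowed` when the flag is set.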

When the vocabulary determination unit 207 determines that the voice (call content) from the user terminal 211 includes a specific vocabulary stored in the specific vocabulary storage unit 209, it notifies the vocabulary control instruction unit 208 of the determination result. The vocabulary control instruction unit 208 issues an instruction to control the specific vocabulary notified by the vocabulary determination unit 207 so that it is not transmitted directly to the operator terminal 212. The specific vocabulary is controlled by, for example, erasing the voice containing the specific vocabulary (muting), canceling the voice of the specific vocabulary with another sound (masking), or replacing the specific vocabulary with other vocabulary.
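The mute, mask, and replace options can be illustrated at the word level. The sketch below works on recognized word tokens as a stand-in for the corresponding audio segments; the function name, the `"[beep]"` placeholder, and the replacement table are assumptions made for illustration only.

```python
from typing import Dict, Iterable, List, Optional, Set


def control_specific_vocabulary(tokens: Iterable[str], specific: Set[str],
                                method: str = "mute",
                                replacements: Optional[Dict[str, str]] = None,
                                mask_token: str = "[beep]") -> List[str]:
    """Return the token stream with specific vocabulary controlled.

    mute    -> the word is dropped
    mask    -> the word is overwritten by a masking placeholder
    replace -> the word is swapped for a substitute, if one is registered,
               otherwise it is masked
    """
    if method not in ("mute", "mask", "replace"):
        raise ValueError(f"unknown method: {method}")
    table = replacements or {}
    out: List[str] = []
    for token in tokens:
        if token not in specific:
            out.append(token)          # ordinary word passes through
        elif method == "mask":
            out.append(mask_token)
        elif method == "replace":
            out.append(table.get(token, mask_token))
        # method == "mute": append nothing
    return out
```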

The voice generation unit 204 included in the telephone exchange 201 generates a controlled voice in which the emotional voice and/or the specific vocabulary (disturbing voice) from the user terminal 211 has been controlled, based on instructions from the emotion control instruction unit 206 and/or the vocabulary control instruction unit 208.

First, generation of the controlled voice in response to an emotional voice control instruction from the emotion control instruction unit 206 will be described. The voice generation unit 204 lowers the volume of the emotional voice to or below a predetermined volume, or silences it. The operator at the operator terminal 212 can then hear the voice at a volume that is not unpleasant. The voice generation unit 204 may also cancel (mask) the emotional voice with another sound. The emotional voice from the user terminal 211 is then either canceled by the other sound at the operator terminal 212 or delivered as a voice that is hard to make out, so the operator does not hear the emotional voice and is not made uncomfortable. The voice generation unit 204 may further replace the emotional voice with another voice. The emotional voice from the user terminal 211 is then not transmitted directly to the operator terminal 212, and the operator's discomfort can be reduced.

Next, generation of the controlled voice in response to a specific vocabulary control instruction from the vocabulary control instruction unit 208 will be described. The voice generation unit 204 erases the voice of the specific vocabulary from the user terminal 211. The voice generation unit 204 may also cancel (mask) the voice of the specific vocabulary with another sound. This avoids transmitting to the operator terminal 212 a voice that the operator would find unpleasant. The voice generation unit 204 may further replace the voice of the specific vocabulary with other vocabulary. This also prevents the operator from feeling uncomfortable.

For example, when the voice from the user terminal 211 includes speech in an imperative tone, such as 「〜をしろ。」 or 「〜をやれ。」 (both roughly "Do ..."), the voice generation unit 204 generates speech converted into a polite tone, such as 「〜をして下さい。」 ("Please do ...").
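The imperative-to-polite conversion can be sketched as a suffix rewrite on recognized text. This is a deliberately simplified illustration: the mapping table contains only the two endings quoted above, the function name is hypothetical, and an actual system would operate on audio and would need real speech recognition and synthesis.

```python
# Imperative sentence endings mapped to the polite ending quoted in the text.
# Mapping both endings to the same polite form is an assumption.
POLITE_ENDINGS = {
    "をしろ。": "をして下さい。",
    "をやれ。": "をして下さい。",
}


def to_polite(sentence: str) -> str:
    """Rewrite a sentence ending in a known imperative into polite form."""
    for rude, polite in POLITE_ENDINGS.items():
        if sentence.endswith(rude):
            return sentence[: -len(rude)] + polite
    return sentence  # no known imperative ending: leave unchanged
```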

The voice generation unit 204 has a storage unit (not shown) that stores substitute voices and cancellation sounds. The substitute voices and cancellation sounds stored in the storage unit are updated as appropriate.

When the voice generation unit 204 generates voice based on instructions from the emotion control instruction unit 206 and/or the vocabulary control instruction unit 208, the buffering unit 210 temporarily stores the voice information so that the conversation between the user terminal 211 and the operator terminal 212 is not hindered. The buffering unit 210 performs this buffering within a time sufficient for the call content to be analyzed.
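One way to picture this buffering is as a fixed-length delay line: frames from the user are held for a few frame periods so that detection and voice generation can finish before the frame is released to the operator. The class below is a sketch under that assumption; the class name, the frame-count delay, and the `flush` method are not from the patent.

```python
from collections import deque
from typing import Deque, List, Optional, TypeVar

Frame = TypeVar("Frame")


class DelayBuffer:
    """Hold the most recent frames for a fixed number of frame periods."""

    def __init__(self, delay_frames: int) -> None:
        if delay_frames < 0:
            raise ValueError("delay_frames must be non-negative")
        self._delay = delay_frames
        self._frames: Deque = deque()

    def push(self, frame):
        """Store a frame; return the oldest frame once the delay is filled."""
        self._frames.append(frame)
        if len(self._frames) > self._delay:
            return self._frames.popleft()
        return None  # still filling the delay window

    def flush(self) -> List:
        """Release all pending frames, for example at the end of the call."""
        pending = list(self._frames)
        self._frames.clear()
        return pending
```

The delay length is the trade-off the paragraph describes: long enough for analysis, short enough that the conversation still feels natural.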

The telephone exchange 201 transmits the voice (call content) generated by the voice generation unit 204 to the operator terminal 212 as the voice from the user terminal 211. Through this series of operations, the call between the user terminal 211 and the operator terminal 212 is carried out.

FIG. 3 is a sequence diagram showing the operation procedure of the telephone exchange 201, the call monitor unit 202, and the disturbing voice detection unit 203 (the emotion determination unit 205, the emotion control instruction unit 206, the vocabulary determination unit 207, and the vocabulary control instruction unit 208) in this embodiment.

In step S301, the telephone exchange 201 passes the voice information from the user terminal 211, received via the public telephone network 220, to the call monitor unit 202. In step S303, the call monitor unit 202 passes the voice information received from the telephone exchange 201 to the emotion determination unit 205 and the vocabulary determination unit 207 of the disturbing voice detection unit 203. In step S305, when the emotion determination unit 205 determines that the voice from the user terminal 211 includes an emotional voice, it notifies the emotion control instruction unit 206 of the determination result. In step S307, the emotion control instruction unit 206 determines the optimal voice control instruction for the emotional voice. In step S309, the emotion control instruction unit 206 sends the determined emotional voice control instruction to the voice generation unit 204.

In step S311, when the vocabulary determination unit 207 determines that the voice from the user terminal 211 includes vocabulary corresponding to a specific vocabulary stored in the specific vocabulary storage unit 209, it notifies the vocabulary control instruction unit 208 of the determination result. In step S313, the vocabulary control instruction unit 208 determines the optimal voice control instruction for the specific vocabulary. In step S315, the vocabulary control instruction unit 208 sends the determined specific vocabulary control instruction to the voice generation unit 204.

In step S317, when the voice generation unit 204 of the telephone exchange 201 receives a control instruction from the emotion control instruction unit 206 and/or the vocabulary control instruction unit 208, it generates a controlled voice in accordance with the control instruction. In step S319, the buffering unit 210 of the telephone exchange 201 temporarily stores the voice information until the voice generated by the voice generation unit 204 is transmitted to the operator terminal 212, so that the conversation between the user terminal 211 and the operator terminal 212 is not hindered. In step S321, the telephone exchange 201 transmits the voice generated by the voice generation unit 204 to the operator terminal 212.

FIG. 4 is a flowchart showing the processing procedure of the voice control system 200 in this embodiment.

In step S401, the telephone exchange 201 receives voice information from the user terminal 211. In step S403, the call monitor unit 202 acquires the voice information from the user terminal 211 received by the telephone exchange 201. In step S405, the call monitor unit 202 passes the voice acquired from the telephone exchange 201 to the emotion determination unit 205 and the vocabulary determination unit 207 of the disturbing voice detection unit 203.

In step S407, the emotion determination unit 205 determines whether the voice includes an emotional voice. If it determines that no emotional voice is included, the process returns to step S403. If it determines that an emotional voice is included, the process proceeds to step S411. In step S409, the vocabulary determination unit 207 determines whether the voice information includes a specific vocabulary; if it determines that a specific vocabulary is included, the process proceeds to step S411.

In step S411, the emotion control instruction unit 206 selects the optimal means for controlling the emotional voice determined by the emotion determination unit 205 and sends a control instruction to the voice generation unit 204. Likewise, the vocabulary control instruction unit 208 selects the optimal means for controlling the specific vocabulary determined by the vocabulary determination unit 207 and sends a control instruction to the voice generation unit 204. In step S413, the voice generation unit 204 controls the voice information and generates voice based on the received emotional voice control instruction and/or specific vocabulary control instruction. In step S415, the telephone exchange 201 transmits the voice generated by the voice generation unit 204 to the operator terminal 212.
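The flow of steps S407 to S415 can be summarized in a short sketch. The `Utterance` type, the peak-volume test, the word-level vocabulary check, and the `simple_generate` function are all illustrative assumptions; forwarding an unflagged utterance unchanged is also an assumption, since the flowchart description only says that processing returns to acquire further voice information.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class Utterance:
    words: List[str]      # recognized words (stand-in for the audio)
    peak_volume: float    # normalized 0.0 to 1.0


def handle_utterance(utt: Utterance, volume_limit: float, specific: Set[str],
                     generate: Callable[[Utterance, bool, List[str]], Utterance]
                     ) -> Utterance:
    """Run the two determinations and, if needed, generate controlled voice."""
    emotional = utt.peak_volume >= volume_limit            # S407
    flagged = [w for w in utt.words if w in specific]      # S409
    if not emotional and not flagged:
        return utt                                         # nothing to control
    return generate(utt, emotional, flagged)               # S411 to S413


def simple_generate(utt: Utterance, emotional: bool,
                    flagged: List[str]) -> Utterance:
    """Hypothetical generator: mute flagged words, cap emotional volume."""
    words = [w for w in utt.words if w not in flagged]
    volume = min(utt.peak_volume, 0.3) if emotional else utt.peak_volume
    return Utterance(words, volume)
```

The returned utterance is what would be sent to the operator terminal in step S415.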

The internal configuration of the disturbing voice detection unit 203 will be described with reference to FIG. 5. The disturbing voice detection unit 203 includes a CPU 510, a ROM 520, a communication control unit 530, a RAM 540, and a storage 550. The CPU 510 is a central processing unit and controls the entire disturbing voice detection unit 203 by executing various programs. The ROM 520 is a read-only memory and stores the boot program that the CPU 510 executes first, as well as various parameters. The RAM 540 is a random access memory. The RAM 540 holds transmitted voice information 541, an emotional voice 542, a specific vocabulary 543, an emotion control instruction 544, and a vocabulary control instruction 545.

The storage 550, meanwhile, holds an emotional voice determination program 551, a specific vocabulary determination program 552, an emotion control instruction program 553, and a vocabulary control instruction program 554. The communication control unit 530 controls communication with other terminals via a network.

The transmitted voice information 541 is the voice from the user terminal 211 that the call monitoring unit 202 has passed to the disturbing voice detection unit 203. The emotional voice 542 is the emotional voice contained in the voice from the user terminal 211 passed to the emotion determination unit 205. The specific vocabulary 543 is the specific vocabulary contained in the voice from the user terminal 211 passed to the vocabulary determination unit 207.

The emotion control instruction 544 is the control instruction that the emotion control instruction unit 206 selects as most suitable for controlling a voice that the emotion determination unit 205 has judged to be emotional. The vocabulary control instruction 545 is the control instruction that the vocabulary control instruction unit 208 selects as most suitable for controlling a voice that the vocabulary determination unit 207 has judged to be specific vocabulary.

The emotional voice determination program 551 determines whether the voice from the user terminal 211 contains emotional voice. A voice is judged to be emotional when its volume reaches or exceeds a predetermined volume, or when the speech rate of the voice from the user terminal 211 reaches or exceeds a predetermined pitch.
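The two-threshold test described for the emotional voice determination program 551 might look like the sketch below. Everything specific here is an assumption made for illustration: volume is measured as RMS amplitude of samples in [-1.0, 1.0], the speech rate is taken as syllables per second supplied by some upstream recognizer, and the threshold values and the names `rms_level` and `is_emotional` do not come from the patent.

```python
import math
from typing import Sequence


def rms_level(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude of a frame of samples in [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_emotional(
    samples: Sequence[float],
    syllables: int,
    duration_s: float,
    volume_threshold: float = 0.5,  # assumed value, not from the patent
    rate_threshold: float = 8.0,    # assumed syllables per second
) -> bool:
    """Flag a frame as emotional if it is loud enough OR spoken fast enough."""
    loud = rms_level(samples) >= volume_threshold
    fast = duration_s > 0 and (syllables / duration_s) >= rate_threshold
    return loud or fast
```

Either condition alone is sufficient, mirroring the "or" in the description; a real system would need thresholds tuned to the telephone channel.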

The specific vocabulary determination program 552 determines whether the voice information from the user terminal 211 contains specific vocabulary. A word is judged to be specific vocabulary when it matches the specific vocabulary stored in advance in the specific vocabulary storage unit 209.
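The lookup against the stored vocabulary can be sketched as a set membership test. This sketch assumes a speech recognizer has already turned the caller's audio into a word sequence, which the text above does not detail; the function name `find_specific_vocabulary` and the case-insensitive matching are hypothetical choices.

```python
from typing import Iterable, List, Set


def find_specific_vocabulary(transcript_words: Iterable[str], stored: Set[str]) -> List[int]:
    """Return the positions of words that match the stored specific vocabulary.

    `stored` stands in for the contents of the specific vocabulary storage unit 209.
    Matching is case-insensitive (an assumption made for this sketch).
    """
    normalized = {w.lower() for w in stored}
    return [i for i, w in enumerate(transcript_words) if w.lower() in normalized]
```

Returning positions rather than a single yes/no answer lets a later stage control only the matching words instead of the whole utterance.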

The emotion control instruction program 553 selects, as the most suitable control instruction for a voice that the emotional voice determination program 551 has judged to be emotional, either erasing the volume (muting) or substituting another voice.
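A possible shape for this selection is sketched below. The description does not state how the "most suitable" instruction is chosen, so the rule used here (substitute when replacement audio is available, otherwise mute) is purely an assumption, as are the names `EmotionControl`, `select_emotion_control`, and `apply_emotion_control`.

```python
from enum import Enum
from typing import List, Optional


class EmotionControl(Enum):
    MUTE = "mute"              # erase the volume
    SUBSTITUTE = "substitute"  # replace with another voice


def select_emotion_control(substitute_available: bool) -> EmotionControl:
    """Assumed policy: prefer substitution when replacement audio exists."""
    return EmotionControl.SUBSTITUTE if substitute_available else EmotionControl.MUTE


def apply_emotion_control(
    samples: List[float],
    instruction: EmotionControl,
    substitute: Optional[List[float]] = None,
) -> List[float]:
    """Return controlled audio with the same length as the input."""
    if instruction is EmotionControl.SUBSTITUTE and substitute:
        # Loop (or trim) the substitute audio to cover the original span.
        return [substitute[i % len(substitute)] for i in range(len(samples))]
    # Muting, or substitution requested without usable replacement audio.
    return [0.0] * len(samples)
```

Keeping the output the same length as the input means the controlled segment can be spliced back into the call without shifting the timing of the conversation.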

The vocabulary control instruction program 554 selects, as the most suitable control instruction for a voice that the specific vocabulary determination program 552 has judged to be specific vocabulary, either erasing the volume or masking the specific vocabulary with another sound.
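Erasing or masking a detected word can be sketched as an operation on a range of samples. This is a minimal illustration under assumptions: the word's sample range is presumed to be known from the recognizer, the masking sound is a plain sine tone, and the function name `mask_span`, the 8000 Hz sample rate, and the tone parameters are hypothetical.

```python
import math
from typing import List


def mask_span(
    samples: List[float],
    start: int,
    end: int,
    sample_rate: int = 8000,   # assumed telephone-band rate
    tone_hz: float = 1000.0,   # assumed masking tone
    amplitude: float = 0.2,
    erase: bool = False,
) -> List[float]:
    """Return a copy of `samples` with [start, end) erased or overwritten by a tone."""
    start = max(0, start)
    end = min(len(samples), end)
    out = list(samples)
    for i in range(start, end):
        if erase:
            out[i] = 0.0
        else:
            phase = 2 * math.pi * tone_hz * (i - start) / sample_rate
            out[i] = amplitude * math.sin(phase)
    return out
```

Samples outside the given range are left untouched, so only the specific vocabulary is affected.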

With the above configuration and operation, the voice control system of this embodiment monitors the user's call and uniformly applies conversion processing to emotional voice and specific vocabulary before they are conveyed to the operator, making it possible to control the call content without affecting the call itself.

[Third Embodiment]
Next, a voice control system 600 according to the third embodiment of the present invention is described with reference to FIG. 6. FIG. 6 is a diagram illustrating the configuration of the voice control system 600 according to this embodiment. The voice control system 600 differs from the second embodiment in that the voice generation unit 604 is an independent component. The other components and operations are the same as in the second embodiment, so they are given the same reference numerals and their detailed description is omitted.

The voice generation unit 604 generates a control voice for controlling the voice of the user terminal 211 based on control instructions from the emotion control instruction unit 206 and the vocabulary control instruction unit 208. The voice generation unit 604 transmits the generated voice to the telephone exchange 601, from which it is delivered to the operator terminal 212 via the buffering unit 210.

With the above configuration and operation, the voice control system of this embodiment monitors the user's call and uniformly applies conversion processing to emotional voice and specific vocabulary before they are conveyed to the operator, making it possible to control the call content without affecting the call itself.

[Fourth Embodiment]
Next, a voice control system 700 according to the fourth embodiment of the present invention is described with reference to FIG. 7. FIG. 7 is a diagram illustrating the configuration of the voice control system 700 according to this embodiment. The voice control system 700 differs from the second embodiment in that the disturbing voice detection unit 703 includes the voice generation unit 704. The other components and operations are the same as in the second embodiment, so they are given the same reference numerals and their detailed description is omitted.

The emotion control instruction unit 206 and the vocabulary control instruction unit 208 of the disturbing voice detection unit 703 pass their control instructions to the voice generation unit 704. The voice generation unit 704 generates a control voice for controlling the voice of the user terminal 211 based on each control instruction and transmits the generated voice to the telephone exchange 701. The buffering unit 210 temporarily stores the voice information until the voice generated by the voice generation unit 704 has been transmitted to the telephone exchange 701, so that the conversation between the user terminal 211 and the operator terminal 212 is not disrupted.
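The role of the buffering unit 210 can be sketched as a queue that holds caller audio until the detector has finished with it, then releases either the original frame or its controlled replacement. This is a sketch under assumptions: frames carry integer IDs in arrival order, audio is a text label, and the class and method names (`BufferingUnit`, `store`, `receive_control_voice`, `release`) are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class BufferingUnit:
    """Holds caller frames until the detector has decided what to do with them."""

    def __init__(self) -> None:
        self._pending: Deque[Tuple[int, str]] = deque()
        self._controlled: Dict[int, str] = {}  # frame id -> control voice

    def store(self, frame_id: int, audio: str) -> None:
        """Temporarily store a frame arriving from the user terminal."""
        self._pending.append((frame_id, audio))

    def receive_control_voice(self, frame_id: int, audio: str) -> None:
        """Register the generated control voice that replaces a frame."""
        self._controlled[frame_id] = audio

    def release(self, checked_up_to: int) -> List[str]:
        """Release, in order, every stored frame with id <= checked_up_to."""
        out: List[str] = []
        while self._pending and self._pending[0][0] <= checked_up_to:
            frame_id, audio = self._pending.popleft()
            out.append(self._controlled.pop(frame_id, audio))
        return out
```

Releasing frames strictly in arrival order keeps the conversation intact while ensuring that no frame reaches the operator before its check is complete.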

[Fifth Embodiment]
Next, a voice control system 800 according to the fifth embodiment of the present invention is described with reference to FIG. 8. FIG. 8 is a diagram illustrating the configuration of the voice control system 800 according to this embodiment. The voice control system 800 differs from the second embodiment in that it has no call monitoring unit 202 and the voice of the user terminal 211 is passed directly from the telephone exchange 201 to the disturbing voice detection unit 803. The other components and operations are the same as in the second embodiment, so they are given the same reference numerals and their detailed description is omitted.

The disturbing voice detection unit 803 acquires the voice information of the user terminal 211 directly from the telephone exchange 201. The disturbing voice detection unit 803 may include a storage unit (not shown) that stores the acquired voice information.

[Other Embodiments]
Although embodiments of the present invention have been described in detail above, any system or apparatus that combines the separate features of the respective embodiments in any manner also falls within the scope of the present invention.

The present invention may be applied to a system composed of a plurality of devices or to a single apparatus. The present invention is also applicable when an information processing program that implements the functions of the embodiments is supplied to a system or apparatus directly or remotely. Accordingly, a program installed on a computer to implement the functions of the present invention on that computer, a medium storing the program, and a WWW (World Wide Web) server from which the program is downloaded also fall within the scope of the present invention.

[Other Expressions of the Embodiments]
Some or all of the above embodiments may also be described as in the following supplementary notes, but are not limited to them.
(Supplementary Note 1)
A voice control system comprising:
telephone exchange means for establishing a call between a user terminal and an operator terminal via a public telephone network;
call monitoring means for monitoring the call established by the telephone exchange means and acquiring the content of the call between the user terminal and the operator terminal as voice information;
disturbing voice detection means for detecting, from the voice of the user terminal recognized in the voice information acquired by the call monitoring means, a disturbing voice that is either emotional voice representing a predetermined specific emotion or voice of a specific vocabulary including a negative vocabulary; and
voice generation means for generating a control voice in which the disturbing voice detected by the disturbing voice detection means is controlled.
(Supplementary Note 2)
The voice control system according to Supplementary Note 1, wherein the disturbing voice detection means comprises:
emotion determination means for determining whether the emotional voice is included in the voice from the user terminal;
emotion control instruction means for instructing control such that the determined emotional voice is not transmitted directly to the operator terminal;
vocabulary determination means for determining whether the specific vocabulary is included in the voice from the user terminal; and
vocabulary control instruction means for instructing control such that the determined specific vocabulary is not transmitted directly to the operator terminal.
(Supplementary Note 3)
The voice control system according to Supplementary Note 2, wherein, when the emotional voice from the user terminal represents the specific emotion, the emotion control instruction means issues an instruction to avoid transmitting the emotional voice directly to the operator terminal by lowering the volume of the voice to a predetermined volume or less, or by masking the voice with another sound or substituting another voice.
(Supplementary Note 4)
The voice control system according to Supplementary Note 2 or 3, wherein, when the specific vocabulary is included in the voice from the user terminal, the vocabulary control instruction means issues an instruction to avoid transmitting the specific vocabulary directly to the operator terminal by erasing the voice of the specific vocabulary, masking it with another sound, or substituting another vocabulary for the specific vocabulary.
(Supplementary Note 5)
The voice control system according to any one of Supplementary Notes 1 to 4, further comprising specific vocabulary storage means for storing the specific vocabulary.
(Supplementary Note 6)
The voice control system according to Supplementary Note 5, wherein the specific vocabulary storage means stores in advance, as the specific vocabulary, terms and vocabulary frequently used in a specific field.
(Supplementary Note 7)
The voice control system according to Supplementary Note 5 or 6, wherein the specific vocabulary stored in the specific vocabulary storage means is extracted from the voice information transmitted to either the emotion determination unit or the vocabulary determination unit and updated.
(Supplementary Note 8)
A voice control apparatus comprising:
call monitoring means for monitoring a call established by telephone exchange means that establishes the call between a user terminal and an operator terminal via a public telephone network, and acquiring the content of the call between the user terminal and the operator terminal as voice information;
disturbing voice detection means for detecting, from the voice of the user terminal recognized from the voice information acquired by the call monitoring means, a disturbing voice that is either emotional voice representing a predetermined specific emotion or voice of a specific vocabulary including a negative vocabulary; and
voice generation means for generating a control voice in which the disturbing voice detected by the disturbing voice detection means is controlled.
(Supplementary Note 9)
A voice control method comprising:
a call monitoring step of monitoring a call established by telephone exchange means that establishes the call between a user terminal and an operator terminal via a public telephone network, and acquiring the content of the call between the user terminal and the operator terminal as voice information;
a disturbing voice detection step of detecting, from the voice of the user terminal recognized from the voice information acquired in the call monitoring step, a disturbing voice that is either emotional voice representing a predetermined specific emotion or voice of a specific vocabulary including a negative vocabulary; and
a voice generation step of generating a control voice in which the disturbing voice detected in the disturbing voice detection step is controlled.
(Supplementary Note 10)
A voice control program causing a computer to execute:
a call monitoring step of monitoring a call established by telephone exchange means that establishes the call between a user terminal and an operator terminal via a public telephone network, and acquiring the content of the call between the user terminal and the operator terminal as voice information;
a disturbing voice detection step of detecting, from the voice of the user terminal recognized from the voice information acquired in the call monitoring step, a disturbing voice that is either emotional voice representing a predetermined specific emotion or voice of a specific vocabulary including a negative vocabulary; and
a voice generation step of generating a control voice in which the disturbing voice detected in the disturbing voice detection step is controlled.
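The supplementary notes mention a specific vocabulary store that is filled in advance and later updated from the voice information, without saying how words are extracted. The sketch below shows one possible shape for such a store. The promotion rule (a word is added once it has appeared a given number of times in speech already flagged by the detector), the `min_count` value, and the names `SpecificVocabularyStore`, `contains`, and `update_from` are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Set


class SpecificVocabularyStore:
    """Hypothetical stand-in for the specific vocabulary storage means."""

    def __init__(self, initial: Iterable[str] = ()) -> None:
        self._words: Set[str] = {w.lower() for w in initial}
        self._candidates: Counter = Counter()

    def contains(self, word: str) -> bool:
        return word.lower() in self._words

    def update_from(self, flagged_words: Iterable[str], min_count: int = 3) -> List[str]:
        """Count words seen in flagged speech; promote those seen min_count times.

        Returns the words newly added to the store by this call.
        """
        added: List[str] = []
        for word in flagged_words:
            word = word.lower()
            if word in self._words:
                continue
            self._candidates[word] += 1
            if self._candidates[word] >= min_count:
                self._words.add(word)
                del self._candidates[word]
                added.append(word)
        return added
```

Requiring repeated sightings before promotion is one way to keep ordinary words that happen to occur in a heated call out of the store; other extraction rules are equally compatible with the text.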

Claims

A voice control system comprising:
telephone exchange means for establishing a call between a user terminal and an operator terminal via a public telephone network;
call monitoring means for monitoring the call established by the telephone exchange means and acquiring the content of the call between the user terminal and the operator terminal as voice information;
emotion determination means for determining whether the voice information acquired by the call monitoring means includes emotional voice representing a predetermined specific emotion;
emotion control instruction means for instructing control such that the determined emotional voice is not transmitted directly to the operator terminal;
vocabulary determination means for determining whether the voice information includes a specific vocabulary that is a negative vocabulary;
vocabulary control instruction means for instructing control such that the specific vocabulary determined to be included in the voice information is not transmitted directly to the operator terminal;
voice generation means for generating a control voice in which the emotional voice and the specific vocabulary are controlled; and
specific vocabulary storage means for storing the specific vocabulary,
wherein the specific vocabulary stored in the specific vocabulary storage means is extracted from the voice information transmitted to either the emotion determination means or the vocabulary determination means and updated.

The voice control system according to claim 1, wherein, when the emotional voice is included in the voice information, the emotion control instruction means issues an instruction to avoid transmitting the emotional voice directly to the operator terminal by lowering the volume of the voice of the call to a predetermined volume or less, or by masking the voice with another sound or substituting another voice.

The voice control system according to claim 1 or 2, wherein, when the specific vocabulary is included in the voice information, the vocabulary control instruction means issues an instruction to avoid transmitting the specific vocabulary directly to the operator terminal by erasing the voice of the specific vocabulary, masking it with another sound, or substituting another vocabulary for the specific vocabulary.

A voice control method comprising:
a telephone exchange step of establishing a call between a user terminal and an operator terminal via a public telephone network;
a call monitoring step of monitoring the call established in the telephone exchange step and acquiring the content of the call between the user terminal and the operator terminal as voice information;
an emotion determination step of determining whether the voice information includes emotional voice representing a predetermined specific emotion;
an emotion control instruction step of instructing control such that the determined emotional voice is not transmitted directly to the operator terminal;
a vocabulary determination step of determining whether the voice information includes a specific vocabulary that is a negative vocabulary stored in specific vocabulary storage means;
a vocabulary control instruction step of instructing control such that the determined specific vocabulary is not transmitted directly to the operator terminal;
a voice generation step of generating a control voice in which the emotional voice and the specific vocabulary are controlled; and
an update step of extracting the specific vocabulary from the voice information transmitted in either the emotion determination step or the vocabulary determination step and updating the specific vocabulary stored in the specific vocabulary storage means.

A voice control program causing a computer to execute:
a telephone exchange step of establishing a call between a user terminal and an operator terminal via a public telephone network;
a call monitoring step of monitoring the call established in the telephone exchange step and acquiring the content of the call between the user terminal and the operator terminal as voice information;
an emotion determination step of determining whether the voice information includes emotional voice representing a predetermined specific emotion;
an emotion control instruction step of instructing control such that the determined emotional voice is not transmitted directly to the operator terminal;
a vocabulary determination step of determining whether the voice information includes a specific vocabulary that is a negative vocabulary stored in specific vocabulary storage means;
a vocabulary control instruction step of instructing control such that the determined specific vocabulary is not transmitted directly to the operator terminal;
a voice generation step of generating a control voice in which the emotional voice and the specific vocabulary are controlled; and
an update step of extracting the specific vocabulary from the voice information transmitted in either the emotion determination step or the vocabulary determination step and updating the specific vocabulary stored in the specific vocabulary storage means.