JP4852584B2

JP4852584B2 - Prohibited word transmission prevention method, prohibited word transmission prevention telephone, prohibited word transmission prevention server

Info

Publication number: JP4852584B2
Application number: JP2008273173A
Authority: JP
Inventors: 祐宮崎
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2008-10-23
Filing date: 2008-10-23
Publication date: 2012-01-11
Anticipated expiration: 2028-10-23
Also published as: JP2010103751A

Description

The present invention relates to a method for preventing an inadvertent remark made by a user from being transmitted to the other party, to a telephone having this function, and to a server having this function.

In recent years, with the spread of mobile phones, calls over communication lines have come to be used by users of all ages, from children to the elderly. A young child using a telephone may inadvertently say his or her home address or telephone number, in which case the user's personal information leaks unintentionally. Likewise, a user whose emotions run high during a call may say something that offends the other party, which can damage the relationship with that party. There is therefore a demand for a technique that appropriately manages words and phrases whose use is prohibited during a call.

Against this background, Patent Document 1 discloses a technique that, when a word or phrase whose use is prohibited in a call center is spoken, highlights that word or phrase to improve the efficiency of confirmation work.
Patent Document 1: JP 2007-68044 A

However, the above technique is intended to make it easy to check, in confirmation work after business hours, whether a prohibited word or phrase was uttered. It cannot prevent the utterance of the prohibited word or phrase itself.

The present invention has been proposed in view of this problem, and its object is to provide a method and a telephone that prevent, in real time, the transmission of an inadvertent remark made by a user.

The present invention provides the following solutions.

(1) A method for preventing transmission of speech containing an inadvertent remark made by a user, the method comprising:
a speech accumulation step of receiving speech uttered by the user and temporarily accumulating it;
a speech analysis step of performing speech recognition on the speech and analyzing it to the word level;
a determination step of determining whether the analyzed word-level speech matches a predetermined prohibited word phoneme model;
a time specifying step of specifying the portion of the speech uttered by the user that was determined to match the prohibited word phoneme model;
a replacing step of replacing the speech of the specified portion with a dummy sound; and
a sending step of sending the speech including the replaced portion.

According to the method of (1), the determination step determines whether the word-level speech, which is the recognition result of the temporarily accumulated speech, matches a predetermined prohibited word phoneme model. If a match is found, the portion specified by the time specifying step is replaced by the replacing step with a dummy sound (for example, a beep), and the temporarily accumulated speech is then sent by the sending step. As a result, even when the user makes an inadvertent remark, the remark can be replaced with a dummy sound in real time, and its transmission can be prevented.
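The steps of method (1) can be sketched as follows. This is only an illustration under stated assumptions: the speech recognizer is out of scope, so each recognized word is assumed to arrive with its phoneme string and its sample range in the buffer. The names `Word`, `PROHIBITED_PHONEMES`, `DUMMY_SAMPLE`, and `censor`, and the example entry in the model, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    phonemes: str  # phoneme string produced by the (assumed) recognizer
    start: int     # index of the first sample of the word in the buffer
    end: int       # index one past the last sample of the word


# Hypothetical prohibited word phoneme model: a set of phoneme strings.
PROHIBITED_PHONEMES = {"baka"}

# Placeholder dummy sound; a real telephone would insert a beep instead.
DUMMY_SAMPLE = 0


def censor(buffer: list[int], words: list[Word]) -> list[int]:
    """Return a copy of the buffered audio with prohibited words replaced."""
    out = list(buffer)                         # temporarily accumulated speech
    for w in words:                            # word-level analysis result
        if w.phonemes in PROHIBITED_PHONEMES:  # determination step
            for i in range(w.start, w.end):    # portion found by time specifying step
                out[i] = DUMMY_SAMPLE          # replacing step
    return out                                 # the caller sends this (sending step)
```

Because the function returns a new list, the original buffer stays available until the replaced audio has actually been sent.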

(2) The method according to (1), further comprising a text conversion step of converting the analyzed word-level speech into text using a language model, wherein
the determination step determines, in addition to whether the analyzed word-level speech matches a predetermined prohibited word phoneme model, whether the converted text matches a predetermined prohibited word text, and
the time specifying step specifies the portion determined to match by the determination step.

According to the method of (2), the determination step can determine whether a remark is inadvertent by using the text converted with the language model in addition to the speech-based determination. This gives higher accuracy than the method of (1).

(3) The method according to (1) or (2), wherein
an address book containing information indicating the user's interpersonal relationships is provided, and
the method further comprises a step of determining, based on the information in the address book, whether to prevent transmission of the speech containing the inadvertent remark.

According to the method of (3), interpersonal relationships can be taken into account when deciding whether to prevent an inadvertent remark.

(4) The method according to (1) or (2), wherein
an address book containing information indicating the user's interpersonal relationships is provided,
the categories of inadvertent remarks include at least personal information and banned words that offend others, and
the determination step performs the determination with reference to the address book and the category.

According to the method of (4), inadvertent remarks can be prevented while taking both the category of the remark and interpersonal relationships into account. For example, personal information can be transmitted to an acquaintance while offensive remarks are still blocked.

(5) A prohibited word transmission prevention telephone that prevents transmission of speech containing an inadvertent remark made by a user, comprising:
a temporary speech storage unit that receives speech uttered by the user and temporarily stores it;
a speech analysis unit that performs speech recognition on the speech and analyzes it to the word level;
a prohibited word determination unit that determines whether the analyzed word-level speech matches a predetermined prohibited word phoneme model;
a prohibited word speech utterance time measurement unit that specifies the portion of the speech uttered by the user that was determined to match the prohibited word phoneme model;
a prohibited word replacement unit that replaces the speech of the specified portion with a dummy sound; and
a transmission unit that transmits the speech including the replaced portion.

(6) The prohibited word transmission prevention telephone according to (5), further comprising a speech/text conversion unit that converts the analyzed word-level speech into text using a language model, wherein
the prohibited word determination unit determines, in addition to whether the analyzed word-level speech matches a predetermined prohibited word phoneme model, whether the converted text matches a predetermined prohibited word text, and
the prohibited word speech utterance time measurement unit specifies the portion determined to match by the prohibited word determination unit.

The prohibited word transmission prevention telephones of (5) and (6) provide the same effects as the methods of (1) and (2), respectively.

(7) A prohibited word transmission prevention server that manages speech communication over a communication line and prevents transmission of speech containing an inadvertent remark made by a user, comprising:
a calling user determination unit that determines the user who transmitted the speech;
a temporary speech storage unit that receives speech uttered by the user and temporarily stores it;
a speech analysis unit that performs speech recognition on the speech and analyzes it to the word level;
a prohibited word determination unit that determines whether the analyzed word-level speech matches a prohibited word phoneme model predetermined for the user determined by the calling user determination unit;
a prohibited word speech utterance time measurement unit that specifies the portion of the speech uttered by the user that was determined to match the prohibited word phoneme model;
a prohibited word replacement unit that replaces the speech of the specified portion with a dummy sound; and
a transmission unit that transmits the speech including the replaced portion.

The prohibited word transmission prevention server of (7) provides the same effects as the method of (1).
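What distinguishes the server of (7) is that the prohibited word phoneme model is selected per calling user. A minimal sketch of that lookup is shown below. The use of a caller telephone number as the user key, the names `USER_MODELS`, `model_for_caller`, and `is_prohibited_for`, the sample entries, and the choice to block nothing for an unknown caller are all assumptions made for illustration.

```python
# Hypothetical per-user prohibited word phoneme models held by the server,
# keyed by an identifier of the calling user (here, a caller phone number).
USER_MODELS: dict[str, set[str]] = {
    "090-0000-0001": {"tookjooto"},
    "090-0000-0002": {"tookjooto", "baka"},
}


def model_for_caller(caller_id: str) -> set[str]:
    """Calling user determination: pick the model registered for this caller."""
    # Unknown callers get an empty model, i.e. nothing is blocked (assumption).
    return USER_MODELS.get(caller_id, set())


def is_prohibited_for(caller_id: str, phonemes: str) -> bool:
    """Prohibited word determination against the calling user's own model."""
    return phonemes in model_for_caller(caller_id)
```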

According to the present invention, the transmission of an inadvertent remark made by a user can be prevented in real time.

[First Embodiment]
With reference to FIGS. 1 to 5, a prohibited word transmission prevention method and a prohibited word transmission prevention telephone according to a preferred embodiment of the present invention will be described. In this embodiment, the prohibited word transmission prevention telephone is described using a mobile phone that incorporates a prohibited word transmission prevention device 30. However, the prohibited word transmission prevention telephone is not limited to a mobile phone and can also be a fixed-line telephone.

[Overall configuration of the prohibited word transmission prevention telephone 10]
FIG. 1 is a diagram showing an outline of the prohibited word transmission prevention telephone 10 in this embodiment. The prohibited word transmission prevention telephone 10 includes a microphone 11 that receives speech uttered by the user, a transmission unit 13 that transmits the input speech to the other user over an existing communication line, and a prohibited word transmission prevention device 30 that removes prohibited words, that is, inadvertent remarks, from the speech uttered by the user. Although not shown, the mobile phone 10 also includes an A/D conversion unit that converts the speech uttered by the user into a digital signal.

"Prohibited words" include, for example, personal information such as addresses, telephone numbers, and birthdays, and predetermined banned words that offend the other party, such as obscene, insulting, and discriminatory words. Personal information and banned words are only examples, and the types of prohibited words are not limited to these two.

The prohibited word transmission prevention device 30 includes a temporary speech storage unit 31, a speech analysis unit 32, a prohibited word speech determination unit 33, a prohibited word phoneme model 34, a prohibited word speech utterance time measurement unit 35, and a prohibited word replacement unit 36.

The temporary speech storage unit 31 temporarily stores the digitized speech. The speech analysis unit 32 analyzes the stored speech by continuous speech recognition. An outline of the speech analysis unit 32 is given later with reference to FIG. 2. The prohibited word speech determination unit 33 determines whether the speech recognition result matches a phoneme model listed in the prohibited word phoneme model 34 (see FIG. 3). In other words, the prohibited word speech determination unit 33 decides whether a word is prohibited while it is still in speech form. The prohibited word speech utterance time measurement unit 35 specifies the utterance time of the speech determined to be a prohibited word, that is, the portion of the stored speech that corresponds to the prohibited word. The prohibited word replacement unit 36 replaces the speech of the specified portion with a dummy sound.

[Outline of the speech analysis unit 32]
With reference to FIG. 2, continuous speech recognition by the speech analysis unit 32 will be described. This is a conventionally known technique, described for example in "Net Technology Kaitai Shinsho 5: Image and Speech Processing Technology" (Sadaoki Furui and Yoshinori Sakai, Dempa Shimbunsha, first edition published January 25, 2004), so only an outline is given in steps S1 to S4 below.

S1: The stored digital speech is divided into frames at a predetermined time interval (2 ms to 4 ms), and a power component is extracted from each frame. Frames whose power component exceeds a predetermined threshold are detected as a speech section.

S2: The speech spectrum of the detected speech section is Fourier transformed, and an acoustic feature vector is extracted.

S3: With reference to the extracted acoustic feature vectors and a phoneme model, continuous phoneme recognition is performed frame by frame to recognize the speech at the phoneme level. A hidden Markov model (HMM) can be used as the phoneme model.

S4: The speech recognized at the phoneme level is analyzed to the word level with reference to a word dictionary that models the pronunciation of each word.
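The frame-based speech section detection of step S1 can be sketched as follows. This is a minimal illustration rather than the recognizer referenced above: the function name, the `frame_len` and `threshold` parameters, and the use of mean squared amplitude as the "power component" are assumptions. At an assumed 8 kHz sampling rate, a 2 ms frame corresponds to 16 samples.

```python
def detect_speech_frames(samples: list[float], frame_len: int,
                         threshold: float) -> list[bool]:
    """Flag each fixed-length frame whose mean power exceeds the threshold.

    A trailing partial frame, if any, is ignored.
    """
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len  # mean squared amplitude
        flags.append(power > threshold)
    return flags
```

Consecutive `True` flags then form the speech section that is passed on to feature extraction in step S2.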

[Prohibited word phoneme model 34]
FIG. 3 is a diagram showing an example of the prohibited word phoneme model 34.

The prohibited word phoneme model 34 stores prohibited word phoneme models for each prohibited word category. For example, the personal information category stores entries such as "tookjooto..." (Tokyo-to ...), which indicates an address. If the word-level analysis output by the speech analysis unit 32 as the speech recognition result matches a prohibited word phoneme model stored in the prohibited word phoneme model 34, the word is determined to be a prohibited word.
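A category-keyed store like the one described above can be sketched as a dictionary of phoneme strings. The layout, the category keys, the function name `match_category`, and the second entry are assumptions for illustration. The first entry reuses the truncated example from the text as a stand-in, since the full phoneme string is not given.

```python
from typing import Optional

# Hypothetical contents of the prohibited word phoneme model, per category.
PROHIBITED_PHONEME_MODEL: dict[str, set[str]] = {
    "personal_information": {"tookjooto"},  # stand-in for the truncated example
    "banned_words": {"baka"},               # illustrative entry only
}


def match_category(word_phonemes: str) -> Optional[str]:
    """Return the category of a prohibited word, or None if the word is allowed."""
    for category, models in PROHIBITED_PHONEME_MODEL.items():
        if word_phonemes in models:
            return category
    return None
```

Returning the category rather than a plain yes/no keeps the information that a later, category-aware decision would need.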

[Main hardware configuration of the prohibited word transmission prevention telephone 10]
FIG. 4 is a schematic diagram showing the main hardware configuration of the prohibited word transmission prevention telephone 10.

The prohibited word transmission prevention telephone 10 has a bus 22. A CPU (Central Processing Unit) 23, a RAM (Random Access Memory) 24, a ROM (Read Only Memory) 25, an input device 26, a communication device 27, and a display device 28 are connected to the bus 22.

The CPU 23 reads and executes the various programs stored in the ROM 25 as needed, and thereby realizes various functions in cooperation with the other hardware. The RAM 24 is local memory used for program execution. The input device 26 accepts input from the user and may include a microphone, a keyboard, and the like. The display device 28 displays screens that accept data input from the user and includes a display such as a liquid crystal display (LCD).

[Basic operation of the prohibited word transmission prevention telephone 10]
FIG. 5 is a diagram showing the details of the processing flow (steps S201 to S207) in the prohibited word transmission prevention telephone 10.

S201: The CPU 23 stores, in the RAM 24 (temporary speech storage unit 31), digital speech data obtained by digitizing the speech uttered by the user.

S202: The CPU 23 (speech analysis unit 32) performs speech recognition on the digital speech data stored in the RAM 24. The outline of speech recognition is as described with reference to FIG. 2. Through this processing, the speech uttered by the user is recognized at the word level.

S203, S204: The CPU 23 (prohibited word speech determination unit 33) determines whether the speech of each recognized word matches a predetermined prohibited word phoneme model. That is, it determines whether the speech uttered by the user contains a prohibited word such as personal information or a banned word.

S205: If the determination finds a prohibited word, the CPU 23 (prohibited word speech utterance time measurement unit 35) measures the utterance time of the speech determined to be a prohibited word.

S206: The CPU 23 (prohibited word replacement unit 36) replaces the speech of the portion determined to be a prohibited word with a dummy sound.

S207: The communication device 27 (transmission unit 13) transmits the digital speech data, including the substituted dummy sound, to the other user over the communication line.

In this way, when the speech during a call contains a prohibited word, the prohibited word can be replaced with a dummy sound in real time. This prevents personal information such as an address or telephone number from being conveyed to the other user, which contributes to crime prevention. Furthermore, even if the user mistakenly utters a banned word that would offend the other party, the word is replaced with a dummy sound in real time, so the other party is not offended.
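Steps S205 and S206 amount to mapping a measured utterance time onto sample indices and overwriting that range. One way this could look is sketched below, using a sine tone as the dummy sound. The function name, the default sampling rate, tone frequency, and amplitude are assumptions and are not taken from the embodiment.

```python
import math


def replace_with_beep(samples, start_s, end_s, rate=8000, freq=1000.0, amp=0.5):
    """Return a copy of samples with [start_s, end_s) seconds replaced by a sine tone."""
    first = max(0, round(start_s * rate))          # utterance start as a sample index
    last = min(len(samples), round(end_s * rate))  # utterance end, clipped to the buffer
    out = list(samples)
    for n in range(first, last):
        out[n] = amp * math.sin(2 * math.pi * freq * n / rate)
    return out
```

Samples outside the measured range are left untouched, so the rest of the utterance is transmitted as spoken.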

[Second Embodiment]
Next, a prohibited word transmission prevention method and a prohibited word transmission prevention telephone according to a second embodiment of the present invention will be described with reference to FIG. 6. Recent telephones are provided with an address book that stores the telephone numbers of acquaintances and the like. The second embodiment is characterized in that the address book is used to determine whether to replace a prohibited word with a dummy sound.
The hardware configuration of the prohibited word transmission prevention telephone 10A is the same as that of the prohibited word transmission prevention telephone 10, so its description is omitted.

[Overall configuration of the prohibited word transmission prevention telephone 10A]
FIG. 6(1) is a diagram showing an outline of the prohibited word transmission prevention telephone 10A in the second embodiment. In addition to the configuration of the prohibited word transmission prevention telephone 10 of the first embodiment, the prohibited word transmission prevention telephone 10A includes an address book 15. The address book 15 stores information such as the telephone numbers of the user's acquaintances. The prohibited word transmission prevention device 30 uses the information stored in the address book 15 to determine whether to replace a prohibited word with a dummy sound.

[Basic operation of the prohibited word transmission prevention telephone 10A]
FIG. 6(2) is a diagram showing the details of the processing flow (steps S201 to S207 and S211) in the prohibited word transmission prevention telephone 10A. In this flow, a determination using the address book (S211) is inserted between S202 and S203 of the processing in the prohibited word transmission prevention telephone 10. Only the differences from the processing in the prohibited word transmission prevention telephone 10 are described below.

S211: After the speech analysis unit 32 performs speech recognition on the digital speech data stored in the temporary speech storage unit 31, the CPU 23 (registration determination unit 41) determines whether the telephone number of the other user is a telephone number registered in the address book 15. If this determination is YES, the CPU 23 performs the processing from prohibited word determination (S203) to speech transmission (S207). If it is NO, the CPU 23 performs speech transmission (S207) without replacement with a dummy sound.

Thus, with the prohibited word transmission prevention method and the prohibited word transmission prevention telephone 10A of this embodiment, the prohibited word transmission prevention function can be activated only when calling an unregistered destination. Although the address book determination (S211) is performed after the speech recognition processing (S202), the order is not limited to this. For example, the determination may be performed before speech recognition. In that case, when the prohibited word transmission prevention function does not need to be activated, the speech data can be transmitted immediately without performing speech recognition.
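The variant in which the address book check comes before speech recognition can be sketched as below. The sketch follows the behaviour summarized in this paragraph, with filtering active only for unregistered destinations. The function names and the idea of passing the recognition-and-replacement stage in as a callable are assumptions for illustration.

```python
def filter_enabled(callee_number: str, address_book: set[str]) -> bool:
    """The filter is active only when the callee is not in the address book."""
    return callee_number not in address_book


def process_call_audio(audio, callee_number, address_book, recognize_and_censor):
    """Send audio unchanged for registered callees; otherwise run the full filter."""
    if not filter_enabled(callee_number, address_book):
        return audio                    # registered callee: recognition is skipped
    return recognize_and_censor(audio)  # unregistered callee: recognize and replace
```

Checking first means the cost of speech recognition is paid only on calls where replacement could actually occur.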

In the above description, the prohibited word transmission prevention function is not activated when the destination is registered in the address book, but the embodiment is not limited to this. Specifically, prohibited words in the "personal information" category may be disclosed to acquaintances registered in the address book, but it may be a problem if they become known to destinations that are not registered. On the other hand, prohibited words in the "banned words" category may harm the relationship when used toward an acquaintance registered in the address book.

Therefore, whether to activate the prohibited word transmission prevention function may be determined with reference to both the address book registration and the prohibited word category. This can be realized by having the CPU 23 (registration determination unit 41 and prohibited word speech determination unit 33) perform a determination that refers to the address book registration and the prohibited word category after the prohibited word determination processing (S203).
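A category-aware decision of this kind can be sketched as a small policy function. The category keys and the function name are hypothetical. Replacing banned words for unregistered destinations as well, and replacing anything in an unknown category, are assumptions that the text does not state explicitly.

```python
def should_replace(category: str, callee_registered: bool) -> bool:
    """Decide whether a matched prohibited word is replaced with a dummy sound."""
    if category == "personal_information":
        # Acquaintances in the address book may hear personal information.
        return not callee_registered
    if category == "banned_words":
        # Offensive words are replaced regardless of registration (assumption
        # for the unregistered case).
        return True
    # Unknown category: replace to be safe (assumption).
    return True
```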

[Third Embodiment]
Next, a prohibited word transmission prevention telephone 10B according to a third embodiment will be described with reference to FIGS. 7 to 9. In the embodiments above, whether a word is prohibited is determined by (1) speech (the prohibited word phoneme model). This embodiment is characterized in that, in addition to (1) the speech-based determination, (2) a text-based determination that takes a language model into account is performed.
The hardware configuration of the prohibited word transmission prevention telephone 10B is the same as that of the prohibited word transmission prevention telephone 10, so its description is omitted.

[Overall configuration of the prohibited word transmission prevention telephone 10B]
FIG. 7 is a diagram showing an outline of the prohibited word transmission prevention telephone 10B in the third embodiment. Components that are the same as those of the prohibited word transmission prevention telephone 10 in the above embodiment are given the same reference numerals, and their description is omitted.

In addition to the configuration of the prohibited word transmission prevention telephone 10, the prohibited word transmission prevention telephone 10B includes a language model 51, a speech/text conversion unit 52, a prohibited word text determination unit 53, and a prohibited word text list 54.

The language model 51 holds the occurrence probability and connection probability of each word as data. It models syntactic knowledge (grammatical structure), semantic knowledge (relationships between words and their attributes), contextual knowledge (the flow of conversation), and general knowledge of conversation. Prohibited words (banned words in particular) can differ with accent and dialect, so a language model is well suited to cases where the speech data of all prohibited words cannot be enumerated exhaustively in advance.

As described in the paper "Post-hoc integration of speech recognition results from multiple topic language models" (Kenichi Iso), presented at the 2008 Spring Meeting of the Acoustical Society of Japan, decoding an utterance independently with several topic language models and then selecting, for each utterance, the hypothesis with the best score yields higher recognition accuracy than using a single general-purpose language model or a single topic language model. It is therefore preferable to use multiple topic language models for the language model 51.
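The post-hoc selection step can be sketched as picking the highest-scoring hypothesis among the per-topic decoding results. Decoding itself is not shown. The function name, the input layout (topic mapped to a text and a score), and the assumption that a higher score is better (for example, a log-likelihood) are illustrative and are not taken from the cited paper.

```python
def best_hypothesis(hypotheses: dict[str, tuple[str, float]]) -> tuple[str, str]:
    """Given topic -> (decoded text, score), return (topic, text) with the top score."""
    topic = max(hypotheses, key=lambda t: hypotheses[t][1])
    return topic, hypotheses[topic][0]
```

The function would be called once per utterance, after every topic language model has produced its own hypothesis for that utterance.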

The speech/text conversion unit 52 converts the speech analyzed by the speech analysis unit 32 into text using the language model 51. The prohibited word text determination unit 53 determines whether the converted text matches a prohibited word text listed in the prohibited word text list 54. As shown in FIG. 8, the prohibited word text list 54 stores prohibited word texts for each prohibited word category.

[Basic operation of the prohibited word transmission prevention telephone 10B]
FIG. 9 is a diagram showing the details of the processing flow (steps S201 to S207 and S251 to S253) in the prohibited word transmission prevention telephone 10B. The prohibited word transmission prevention telephone 10B performs (2) a text-based determination that takes the language model into account in addition to (1) the speech-based determination. The speech-based determination (S202 to S205) is the same as in the first embodiment, so its description is omitted. The following describes the text-based determination, which is the characteristic part of this embodiment.

S251: The CPU 23 (voice/text conversion unit 52) converts the voice recognized by the voice analysis unit 32 into text using the language model 51.

S252, S253: The CPU 23 (prohibited word text determination unit 53) determines whether the converted text matches a predetermined prohibited word text. If it matches, the processing of S205 and S206 is performed.

As described above, the prohibited word transmission prevention telephone 10B of the present embodiment can determine prohibited words using not only the voice itself but also the text obtained by converting the voice with a language model. Whether the voice uttered by the user contains a prohibited word can therefore be determined with higher accuracy than by a determination based on voice alone.
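The way the two determinations complement each other can be illustrated with a small sketch. Everything in it is hypothetical: the `Utterance` type, the phoneme sequences, and the assumption that a match by either determination leads to the replacement processing. Voice analysis and voice/text conversion are represented only by their results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    phonemes: tuple  # result of voice analysis (hypothetical phoneme sequence)
    text: str        # result of voice/text conversion using a language model


# Hypothetical prohibited word phoneme model and prohibited word text.
PROHIBITED_PHONEMES = {("b", "a", "k", "a")}
PROHIBITED_TEXTS = {"baka"}


def voice_match(utterance):
    """Determination (1): compare the phoneme sequence with the phoneme models."""
    return utterance.phonemes in PROHIBITED_PHONEMES


def text_match(utterance):
    """Determination (2): compare the converted text with the prohibited word texts."""
    return utterance.text in PROHIBITED_TEXTS


def should_replace(utterance):
    """Assumption: a match by either determination triggers replacement."""
    return voice_match(utterance) or text_match(utterance)
```

In this sketch, a dialect pronunciation such as `("b", "a", "g", "a")` misses the registered phoneme model but is still caught if the conversion yields the text `"baka"`.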

In the present embodiment as well, a determination using the address book, or a determination using both the address book and the prohibited word categories, can be performed.

(Modification)
In the above embodiments, the prohibited word transmission prevention device has been described, as an example, as being stored in a telephone owned by the user. However, it may instead be stored in an exchange that interconnects telephone lines to form a telephone network, or in a server for IP telephony.

[Overall configuration of prohibited word transmission prevention exchange/server 60]
FIG. 10 is a diagram showing an outline of the prohibited word transmission prevention exchange/server 60, in which the prohibited word transmission prevention device 30C is stored. Components that are the same as those of the prohibited word transmission prevention telephone 10 in the above embodiments are given the same reference numerals, and their description is omitted. The prohibited word transmission prevention exchange/server 60 also includes a communication unit for communicating with the user's telephone, which is not shown.

The prohibited word transmission prevention device 30C of the prohibited word transmission prevention exchange/server 60 includes a calling user determination unit 61 and a prohibited word phoneme model DB 34C in addition to the configuration of the prohibited word transmission prevention device 30. The calling user determination unit 61 identifies the user who transmitted the voice. The prohibited word phoneme model DB 34C stores the prohibited word phoneme models for each prohibited word category in association with each user.
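A per-user store of this kind could be sketched as below. The class and function names are hypothetical, phoneme models are reduced to plain phoneme tuples, and identifying the user from the caller's telephone number is an assumption, since the text does not state how the calling user determination unit 61 identifies the user.

```python
class ProhibitedWordPhonemeModelDB:
    """Hypothetical stand-in for DB 34C: user -> category -> phoneme models."""

    def __init__(self):
        self._models = {}

    def register(self, user_id, category, phonemes):
        """Associate one phoneme model with a user and a prohibited word category."""
        by_category = self._models.setdefault(user_id, {})
        by_category.setdefault(category, set()).add(tuple(phonemes))

    def models_for(self, user_id, category=None):
        """Return the user's models, optionally restricted to one category."""
        by_category = self._models.get(user_id, {})
        if category is not None:
            return set(by_category.get(category, set()))
        merged = set()
        for models in by_category.values():
            merged |= models
        return merged


def identify_user(caller_number, subscribers):
    """Assumed behavior of unit 61: look the user up by caller number."""
    return subscribers.get(caller_number)
```

Because one exchange or server handles many users, the determination step first resolves the user and then consults only that user's models.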

The prohibited word transmission prevention exchange/server 60 may also perform (2) a determination based on text that takes the language model into account, in addition to (1) a determination based on voice. It may also hold an address book for each user and perform a determination using the address book, or a determination using both the address book and the prohibited word categories.

Although embodiments of the present invention have been described above, the present invention is not limited to these embodiments. For example, it can also be used as a means of preventing performers from inadvertently uttering broadcast-prohibited terms during live TV or radio broadcasts.

FIG. 1 is a diagram showing the functional configuration of the prohibited word transmission prevention telephone in an embodiment of the present invention.
FIG. 2 is a schematic diagram of the voice analysis unit in the above embodiment.
FIG. 3 is a diagram showing the prohibited word phoneme model in the above embodiment.
FIG. 4 is a diagram showing the hardware configuration of the prohibited word transmission prevention telephone in the above embodiment.
FIG. 5 is a flowchart of the processing of the prohibited word transmission prevention telephone in the above embodiment.
FIG. 6 is a diagram showing the functional configuration and processing of the prohibited word transmission prevention telephone in the second embodiment.
FIG. 7 is a diagram showing the functional configuration of the prohibited word transmission prevention telephone in the third embodiment.
FIG. 8 is a diagram showing the prohibited word text list in the above embodiment.
FIG. 9 is a flowchart of the processing of the prohibited word transmission prevention telephone in the above embodiment.
FIG. 10 is a diagram showing the functional configuration of the prohibited word transmission prevention exchange and server.

Explanation of symbols

10 Prohibited word transmission prevention telephone
30 Prohibited word transmission prevention device
31 Voice temporary storage unit
32 Voice analysis unit
33 Prohibited word voice determination unit
34 Prohibited word phoneme model
35 Prohibited word voice utterance time measurement unit
36 Prohibited word replacement unit

Claims

A method by which a telephone provided with an address book that stores telephone numbers of the user's acquaintances prevents transmission of voice for an inadvertent utterance made by the user, the method comprising:
A voice accumulation step of accepting and temporarily accumulating voice uttered by the user;
A voice analysis step of recognizing the voice and analyzing it to the word level;
An inadvertence determination step of determining, based on the information described in the address book, whether to prevent transmission of voice for the inadvertent utterance;
A prohibited word determination step of determining, on condition that the result of the determination is that transmission of voice for the inadvertent utterance is to be prevented, whether the analyzed word-level voice matches a predetermined prohibited word phoneme model;
A time specifying step of specifying the portion of the voice uttered by the user that is determined to match the prohibited word phoneme model;
A replacement step of replacing the voice of the specified portion with a dummy sound; and
A transmission step of transmitting the voice including the replaced portion.

A method by which a telephone provided with an address book that stores telephone numbers of the user's acquaintances prevents transmission of voice for an inadvertent utterance made by the user, wherein the categories of the inadvertent utterance include at least personal information and prohibited words, the method comprising:
A voice accumulation step of accepting and temporarily accumulating voice uttered by the user;
A voice analysis step of recognizing the voice and analyzing it to the word level;
A prohibited word determination step of determining, with reference to the address book and the categories, whether the analyzed word-level voice matches a predetermined prohibited word phoneme model;
A time specifying step of specifying the portion of the voice uttered by the user that is determined to match the prohibited word phoneme model;
A replacement step of replacing the voice of the specified portion with a dummy sound; and
A transmission step of transmitting the voice including the replaced portion.

The method according to claim 1 or 2, further comprising:
A text conversion step of converting the analyzed word-level voice into text using a language model, wherein
The prohibited word determination step determines, in addition to whether the analyzed word-level voice matches the predetermined prohibited word phoneme model, whether the converted text matches a predetermined prohibited word text, and
The time specifying step specifies the portion determined to match in the prohibited word determination step.

A prohibited word transmission prevention telephone that prevents transmission of voice for an inadvertent utterance made by a user, the telephone comprising:
An address book storage unit that stores an address book storing telephone numbers of the user's acquaintances;
A voice temporary storage unit that accepts and temporarily stores voice uttered by the user;
A voice analysis unit that recognizes the voice and analyzes it to the word level;
An inadvertence determination unit that determines, based on the information described in the address book, whether to prevent transmission of voice for the inadvertent utterance;
A prohibited word determination unit that determines, on condition that the result of the determination is that transmission of voice for the inadvertent utterance is to be prevented, whether the analyzed word-level voice matches a predetermined prohibited word phoneme model;
A prohibited word voice utterance time measurement unit that specifies the portion of the voice uttered by the user that is determined to match the prohibited word phoneme model;
A prohibited word replacement unit that replaces the voice of the specified portion with a dummy sound; and
A transmission unit that transmits the voice including the replaced portion.

A prohibited word transmission prevention telephone that prevents transmission of voice for an inadvertent utterance made by a user, the telephone comprising:
An address book storage unit that stores an address book storing telephone numbers of the user's acquaintances;
A prohibited word storage unit that includes at least personal information and prohibited words as categories of the inadvertent utterance;
A voice temporary storage unit that accepts and temporarily stores voice uttered by the user;
A voice analysis unit that recognizes the voice and analyzes it to the word level;
A prohibited word determination unit that determines, with reference to the address book and the categories, whether the analyzed word-level voice matches a predetermined prohibited word phoneme model;
A prohibited word voice utterance time measurement unit that specifies the portion of the voice uttered by the user that is determined to match the prohibited word phoneme model;
A prohibited word replacement unit that replaces the voice of the specified portion with a dummy sound; and
A transmission unit that transmits the voice including the replaced portion.

A prohibited word transmission prevention server that manages voice calls over a communication line and prevents transmission of voice for an inadvertent utterance made by a user, the server comprising:
An address book storage unit that stores an address book storing telephone numbers of the user's acquaintances;
A calling user determination unit that determines the user who transmitted the voice;
A voice temporary storage unit that accepts and temporarily stores voice uttered by the user;
A voice analysis unit that recognizes the voice and analyzes it to the word level;
An inadvertence determination unit that determines, based on the information described in the address book, whether to prevent transmission of voice for the inadvertent utterance;
A prohibited word determination unit that determines, on condition that the result of the determination is that transmission of voice for the inadvertent utterance is to be prevented, whether the analyzed word-level voice matches the prohibited word phoneme model predetermined for the user determined by the calling user determination unit;
A prohibited word voice utterance time measurement unit that specifies the portion of the voice uttered by the user that is determined to match the prohibited word phoneme model;
A prohibited word replacement unit that replaces the voice of the specified portion with a dummy sound; and
A transmission unit that transmits the voice including the replaced portion.

A prohibited word transmission prevention server that manages voice calls over a communication line and prevents transmission of voice for an inadvertent utterance made by a user, the server comprising:
An address book storage unit that stores an address book storing telephone numbers of the user's acquaintances;
A prohibited word storage unit that includes at least personal information and prohibited words as categories of the inadvertent utterance;
A calling user determination unit that determines the user who transmitted the voice;
A voice temporary storage unit that accepts and temporarily stores voice uttered by the user;
A voice analysis unit that recognizes the voice and analyzes it to the word level;
A prohibited word determination unit that determines, with reference to the address book and the categories, whether the analyzed word-level voice matches a predetermined prohibited word phoneme model;
A prohibited word voice utterance time measurement unit that specifies the portion of the voice uttered by the user that is determined to match the prohibited word phoneme model;
A prohibited word replacement unit that replaces the voice of the specified portion with a dummy sound; and
A transmission unit that transmits the voice including the replaced portion.