JPWO2007015319A1

JPWO2007015319A1 - Audio output device, audio communication device, and audio output method

Info

Publication number: JPWO2007015319A1
Application number: JP2007503136A
Authority: JP
Inventors: 浩司幡野
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 2005-08-02
Filing date: 2006-03-07
Publication date: 2009-02-19
Also published as: WO2007015319A1

Abstract

Provided is a mobile phone terminal that, in response to malicious calls such as solicitation calls or calls that seek to obtain the personal information of the telephone user, makes the caller believe that communication is impossible without revealing the user's nationality. The mobile phone terminal 100 comprises: voice analysis means 2 that analyzes the user's input voice 201 acquired by a microphone 61 and outputs phonological information 202; voice output control means 3 that randomly replaces the phonemes or syllables contained in the phonological information 202 and outputs the result as a voice output instruction 302; voice output means 4 that outputs an output voice 402 based on the voice output instruction 302; signal switching means 7 that outputs the output voice 402 as a transmission signal 501; and wireless communication means 5 that outputs the transmission signal 501 to a wireless public network. The terminal thereby converts the user's voice into unintelligible speech that is played to the caller.

Description

The present invention relates to a device that processes input voice and outputs voice, and in particular to an audio output device that can be used as a means of preventing improper or fraudulent calls, such as prank calls, in telephone communication.

In recent years, "transfer fraud", in which people are deceived over the telephone into transferring money to a designated account so that the perpetrator profits illegally, has become a social problem. There is also no end to unscrupulous sales practices that use the telephone to push expensive services and products that users do not want. Besides malicious calls aimed at such fraud or solicitation, it is known that there are also malicious calls in which random telephone numbers are dialed to make the user speak, in order to obtain personal information such as the user's nationality, gender, and age group and to use it for later fraud or solicitation.

Devices for repelling malicious calls such as prank calls have been proposed in the past. For example, a telephone device has been proposed that converts the pitch period of the voice uttered by the user before playing it to the other party. According to that invention, even when a female user speaks, the other party hears a male voice, so the other party can be led to believe that the user is male and to stop making prank calls (see, for example, Patent Document 1).

A method is also known of making a prank caller give up by using the automatic answering function of an answering machine to play a standard message such as "I cannot answer the phone right now" to the other party.

Patent Document 1: JP 2000-78246 A (page 6, FIG. 1)

However, although the conventional device described above prevents the other party from learning the user's gender, the content of the user's utterance is conveyed to the other party intact. The user's nationality can therefore be determined, and the device is insufficient as a countermeasure against malicious calls that seek to obtain the user's personal information. In addition, a person intent on fraud or solicitation learns that communication with the user is possible and therefore continues or repeats the malicious calls.

With the conventional method of playing a standard message to the other party, the other party can easily tell that the message was not spoken by the user in person. The other party therefore does not give up until the user answers in person, and repeats the malicious call many times.

The present invention solves these conventional problems. Its object is to provide a voice communication device that can deter repeated malicious calls by keeping the user's nationality from the other party and making the other party believe that communication with the user is impossible.

To solve the conventional problems described above, the audio output device of the present invention uses voice analysis means that extracts phonological information from input voice, voice output control means that instructs voice output based on the phonological information, and voice output means that outputs voice based on the instruction from the voice output control means, and is configured to output voice consisting of random phonemes based on the phonological information of the input voice.

With this configuration, voice that is meaningless to the listener can be output on the basis of the input voice, so the other party can be made to believe that communication with the user is impossible without learning the user's nationality.

In the audio output device of the present invention, the phonological information also includes information identifying the phonemes or syllables contained in the input voice, and the voice output control means determines the phonemes or syllables that make up the output voice by replacing those phonemes or syllables according to a predetermined rule.

With this configuration, voice in which the phonemes or syllables of the input voice have been replaced with other phonemes or syllables is output, so the output is meaningless to the listener. The other party can therefore be made to believe that communication with the user is impossible without learning the user's nationality.

In the audio output device of the present invention, the phonological information also includes information indicating whether the input voice contains sound, and the voice output control means instructs voice output based on that information.

With this configuration, the start and stop of voice output can be controlled according to whether the input voice contains sound, so voice can be output in step with the timing of the user's and the other party's utterances. This leaves the other party no room to suspect that the output voice is a standard message or synthesized speech, and makes the other party believe that communication with the user is impossible.

In the audio output device of the present invention, the phonological information also includes information representing the fundamental frequency of the input voice, and the voice output control means determines the fundamental frequency of the output voice based on that fundamental frequency information.

With this configuration, the variation in the fundamental frequency of the output voice acquires fluctuations with statistically similar properties to the variation in the fundamental frequency of the input voice, which makes the output voice more natural. This leaves the other party no room to suspect that the output voice is a standard message or synthesized speech, and makes the other party believe that communication with the user is impossible.

The present invention also constitutes a voice communication device that further comprises communication means for executing communication processing and outputting the output voice to a communication destination.

With this configuration, it is possible to provide a voice communication device that can deter repeated malicious calls by keeping the user's nationality from the other party and making the other party believe that communication with the user is impossible.

The audio output method of the present invention comprises a first step of extracting phonological information from input voice, a second step of instructing the output of an output voice composed of random phonemes based on the phonological information, and a third step of outputting the output voice based on that instruction.

With this method, voice that is meaningless to the listener can be output on the basis of the input voice, so the other party can be made to believe that communication with the user is impossible without learning the user's nationality.

According to the present invention, random phonological information is generated from the phonological information of the input voice and output as voice. The party who placed the malicious call therefore does not learn the user's nationality and is made to believe that communication with the user is impossible, which deters repeated malicious calls.

Schematic configuration diagram of the audio output device according to Embodiment 1 of the present invention
Configuration diagram of the mobile phone terminal according to Embodiment 1 of the present invention
External view of the mobile phone terminal according to Embodiment 1 of the present invention
Diagram explaining the contents of the phoneme replacement candidate table held by the voice output control means of the mobile phone terminal according to Embodiment 1 of the present invention
Flowchart showing the processing procedure of the voice output control means of the mobile phone terminal according to Embodiment 1 of the present invention
Diagram explaining the operation of the mobile phone terminal according to Embodiment 1 of the present invention
Configuration diagram of the mobile phone terminal according to Embodiment 2 of the present invention
Diagram explaining the contents of the syllable string table held by the voice output means of the mobile phone terminal according to Embodiment 2 of the present invention
First flowchart showing the processing procedure of the voice output control means of the mobile phone terminal according to Embodiment 2 of the present invention
Second flowchart showing the processing procedure of the voice output control means of the mobile phone terminal according to Embodiment 2 of the present invention
Third flowchart showing the processing procedure of the voice output control means of the mobile phone terminal according to Embodiment 2 of the present invention
Diagram explaining the first operation of the mobile phone terminal according to Embodiment 2 of the present invention
Diagram explaining the second operation of the mobile phone terminal according to Embodiment 2 of the present invention

Explanation of symbols

1 Audio output device
2 Voice analysis means
3 Voice output control means
4 Voice output means
5 Wireless communication means
7 Signal switching means
41 Fixed syllable holding means
61 Microphone
62 Speaker
91 Signal switching button
92 On-hook button
93 Off-hook button
100, 200 Mobile phone terminal
201 Input voice
202 Phonological information
302 Voice output instruction
304 On-hook signal
402 Output voice
501 Transmission signal
502 Reception signal
612 Microphone output signal

The best mode for carrying out the present invention is described below with reference to the drawings. In all the drawings used to explain the embodiments, identical components are given identical reference numerals, and duplicate explanations are omitted.

(Embodiment 1)
FIG. 1 is a schematic configuration diagram of an audio output device 1 according to Embodiment 1 of the present invention. In FIG. 1, the audio output device 1 comprises voice analysis means 2, voice output control means 3, and voice output means 4.

The voice analysis means 2 receives the input voice 201, executes voice analysis processing, extracts the phonological information of the input voice 201, and outputs it as phonological information 202. In this specification, "phonological information" refers to phonological information about speech obtained as a result of voice analysis. Examples include symbols or symbol strings that identify the phonemes or syllables making up the speech, information indicating the loudness or pitch of the speech, fundamental frequency information, and information distinguishing sound from silence.

The voice output control means 3 receives the phonological information 202 and outputs a voice output instruction 302 that instructs the output of the output voice 402. The voice output means 4 outputs the output voice 402 based on the voice output instruction 302. Here, the voice output control means 3 derives the voice output instruction 302 from the phonological information 202 in such a way that the output voice 402 is composed of random phonemes.

The voice analysis processing in the voice analysis means 2 may use a known voice analysis device or voice analysis method, so a detailed description is omitted in this specification. Likewise, the voice output processing in the voice output means 4 may use a known speech synthesis device or speech synthesis method, so a detailed description is omitted in this specification.

FIG. 2 is a configuration diagram of the audio output device 1 according to Embodiment 1 of the present invention when it is configured as a mobile phone terminal 100. In FIG. 2, the mobile phone terminal 100 comprises, in addition to the components described for FIG. 1, wireless communication means 5, a microphone 61, a speaker 62, signal switching means 7, a signal switching button 91, an on-hook button 92, and an off-hook button 93.

The voice analysis means 2 receives the input voice 201 and executes voice analysis processing. It calculates the short-time average power of the input voice 201, determines whether the received input voice 201 is a sound section or a silent section, and outputs the signal "ON" for sound and the signal "OFF" for silence as phonological information 202. The voice analysis means 2 also identifies phonemes by applying cepstrum analysis or the like to the input voice 201 and outputs phoneme symbols (for example /p/, /a/) as phonological information 202. In addition, the voice analysis means 2 outputs, as phonological information 202, fundamental frequency information representing the fundamental frequency of the input voice 201 every 30 milliseconds.
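The sound/silence decision described above can be sketched as follows. This is a minimal illustration only: the text says the decision is based on short-time average power but gives no threshold or frame length, so the `threshold` value and the function names below are assumptions.

```python
from typing import Sequence


def short_time_average_power(frame: Sequence[float]) -> float:
    """Mean of the squared sample values over one analysis frame."""
    if not frame:
        return 0.0
    return sum(x * x for x in frame) / len(frame)


def sound_signal(frame: Sequence[float], threshold: float = 1e-3) -> str:
    """Return "ON" for a sound section and "OFF" for a silent section.

    The threshold is a hypothetical value; the source does not specify one.
    """
    return "ON" if short_time_average_power(frame) >= threshold else "OFF"
```

A real analyzer would typically adapt the threshold to the background noise level; a fixed value is used here only to keep the sketch short.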

Based on the phoneme symbols and fundamental frequency information contained in the phonological information 202, the voice output control means 3 determines the phonemes that the output voice 402 should contain and the fundamental frequency of the output voice 402, and outputs them as the voice output instruction 302.

The voice output means 4 synthesizes and outputs the output voice 402 based on the phonemes and fundamental frequency specified by the voice output instruction 302. The voice output means 4 also performs coarticulation processing at phoneme transitions, interpolation of fundamental frequency changes, and the like, so that the output voice 402 does not sound unnatural as speech.

The wireless communication means 5 connects the mobile phone terminal 100 to a wireless public network (not shown) and executes communication processing. It outputs the transmission signal 501 to the wireless public network and outputs the reception signal 502 acquired from the wireless public network to the speaker 62. When the off-hook button 93 is operated, the wireless communication means 5 starts outputting the transmission signal 501 to the wireless public network and acquiring the reception signal 502 from it. When the on-hook button 92 is operated, the wireless communication means 5 terminates the connection with the wireless public network.

The microphone 61 converts the user's voice into an electrical signal and outputs a microphone output signal 612. The speaker 62 converts the reception signal 502 into air vibrations and emits sound. The signal switching means 7 switches between two signals: operating the signal switching button 91 selects whether the output voice 402 or the microphone output signal 612 is output as the transmission signal 501. In other words, by operating the signal switching button 91 the user can choose whether the processed output voice 402 or the unprocessed microphone output signal 612 is sent to the wireless public network and heard by the other party. In the initial state of communication processing by the wireless communication means 5, the signal switching means 7 outputs the output voice 402 as the transmission signal 501. This prevents the situation in which the user, unaware that the other party is malicious, speaks inadvertently, is heard by the other party, and thereby discloses personal information.
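The switching behavior, including the privacy-preserving default, can be sketched as a small state holder. The class and method names are hypothetical, and modeling the signal switching button 91 as a toggle is an assumption; the text only states that the button switches between the two signals and that the processed voice is selected in the initial state.

```python
class SignalSwitch:
    """Selects which signal is forwarded as the transmission signal 501."""

    PROCESSED = "processed"  # output voice 402
    RAW = "raw"              # microphone output signal 612

    def __init__(self) -> None:
        # Initial state of a call: the processed voice is transmitted,
        # so an inadvertent utterance is never sent unmodified.
        self.mode = self.PROCESSED

    def press_button(self) -> None:
        """Models one press of the signal switching button 91 (assumed toggle)."""
        self.mode = self.RAW if self.mode == self.PROCESSED else self.PROCESSED

    def transmission_signal(self, output_voice, microphone_signal):
        """Return the signal that would be passed on for transmission."""
        if self.mode == self.PROCESSED:
            return output_voice
        return microphone_signal
```

Creating a new `SignalSwitch` for each call would reproduce the described behavior of always starting in the processed mode.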

FIG. 3 shows the external appearance of the mobile phone terminal 100 according to the embodiment of the present invention. The signal switching button 91 is located on the side of the lower part of the housing so that the user can operate it during a call without looking at it.

FIG. 4 shows the contents of the phoneme replacement candidate table T31 held by the voice output control means 3 of the mobile phone terminal 100 according to Embodiment 1 of the present invention. The phoneme replacement candidate table T31 lists candidate phonemes that can replace each phoneme contained in the phonological information 202. Each of the records R311 to R318 of the table consists of a first field F311 representing a phoneme p in the phonological information 202 and a second field F312 representing the candidates for another phoneme p' that takes the place of p. Based on the phoneme replacement candidate table T31, the voice output control means 3 generates the voice output instruction 302 by replacing each phoneme p in the phonological information 202 with a phoneme p' according to a predetermined rule (hereinafter, the phoneme replacement rule). Specifically, the voice output control means 3 searches records R311 to R318 for the record whose first field (F311) contains the phoneme p, randomly selects one of the candidates listed in the second field (F312) of that record as the phoneme p', and replaces p with p'.
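The table lookup and random selection can be sketched as below. Only the entry mapping /d/ to the candidates /k/ and /g/ is taken from the description (record R313); the other entries, the dictionary representation, and the pass-through for phonemes without a record are assumptions made for illustration.

```python
import random

# Hypothetical stand-in for table T31: first field -> candidates in second field.
REPLACEMENT_CANDIDATES = {
    "d": ["k", "g"],   # from the description (record R313)
    "o": ["i", "e"],   # assumed entry
    "a": ["u", "o"],   # assumed entry
}


def replace_phoneme(p, table=REPLACEMENT_CANDIDATES, rng=random):
    """Pick a replacement p' at random from the candidates listed for p."""
    candidates = table.get(p)
    if not candidates:
        # Assumption: a phoneme with no record is passed through unchanged.
        # The source does not say what happens in this case.
        return p
    return rng.choice(candidates)
```

Passing a seeded `random.Random` instance as `rng` makes the selection reproducible for testing, while the default uses the module-level generator.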

FIG. 5 is a flowchart showing the processing procedure of the voice output control means 3 of the mobile phone terminal 100 according to Embodiment 1 of the present invention. The voice output control means 3 first acquires the phonological information 202 (step S101) and, based on whether the signal contained in the phonological information 202 is "ON" or "OFF", determines whether the input voice 201 is in a sound section (step S102). If it is a sound section (YES), the voice output control means 3 proceeds to step S103; if it is a silent section (NO), it returns to step S101.

In step S103, the voice output control means 3 determines whether the phoneme contained in the phonological information 202 acquired this time differs from the phoneme contained in the phonological information 202 acquired last time. If it is the same (NO), processing proceeds to step S105; if it differs (YES), processing proceeds to step S104.

In step S104, the voice output control means 3 replaces the phoneme p in the phonological information 202 with a new phoneme p' according to the phoneme replacement rule and uses it in the voice output instruction 302. In step S105, by contrast, the voice output control means 3 replaces the phoneme p with the phoneme p' obtained in the previous execution of step S104 and uses it in the voice output instruction 302. The decision in step S103 and the processing in step S105 ensure that a section of the input voice 201 in which the same phoneme continues is converted into a section in which another single phoneme continues.
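The effect of steps S103 to S105 on a sequence of analysis frames can be sketched as follows. The function name and the dictionary form of the candidate table are hypothetical, and silent frames (which the flowchart skips at step S102) are assumed to have been filtered out beforehand.

```python
import random


def replace_runs(phonemes, candidates, rng=random):
    """Map each run of identical input phonemes to a run of one substitute."""
    output = []
    previous_in = None   # phoneme seen in the previous frame
    previous_out = None  # substitute chosen for that phoneme
    for p in phonemes:
        if p != previous_in:
            # S103 answered YES: a new phoneme started, so choose a fresh
            # substitute at random (S104).
            previous_out = rng.choice(candidates[p])
        # Otherwise S103 answered NO: reuse the previous substitute (S105).
        output.append(previous_out)
        previous_in = p
    return output
```

Without the reuse in step S105, a sustained vowel spanning several frames could be rendered as a rapid sequence of different phonemes, which would sound unnatural.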

In step S106, the voice output control means 3 calculates a fundamental frequency F0' from the fundamental frequency F0 indicated by the fundamental frequency information contained in the phonological information 202, using a frequency conversion formula, and uses it in the voice output instruction 302. The frequency conversion formula is as follows.

F0' = F0 * r * (random(0.4) + 0.8)
Here, r is a predetermined coefficient that specifies how high the fundamental frequency of the output voice 402 is to be relative to that of the input voice 201. Setting r < 1 makes the fundamental frequency of the output voice 402 lower overall than that of the input voice 201. For example, with r = 0.5, a female input voice 201 is converted so that it sounds like a male voice and is output as the output voice 402, which prevents the other party from learning the user's gender. The coefficient r may be made changeable by user operation. random(0.4) denotes a random number less than 0.4. Perturbing the fundamental frequency with a random number or the like disturbs the intonation contour of the input voice 201, which prevents the other party from inferring from the intonation of the output voice 402 what language the input voice 201 is in.
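The frequency conversion can be written out as a short function. This sketch assumes that the random term is non-negative and uniformly distributed over [0, 0.4); the text only says it is a random number less than 0.4. The function name and the `rng` parameter are hypothetical.

```python
import random


def convert_f0(f0, r=0.5, rng=random):
    """Compute F0' = F0 * r * (random(0.4) + 0.8).

    Assumption: random(0.4) is uniform over [0, 0.4), so the jitter factor
    lies roughly between 0.8 and 1.2.
    """
    jitter = rng.random() * 0.4
    return f0 * r * (jitter + 0.8)
```

Under that assumption, an input of 200 Hz with r = 0.5 yields an output between about 80 Hz and 120 Hz, and an input of 0 (no fundamental frequency extracted) stays 0.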

In step S107, the voice output control means 3 outputs the voice output instruction 302 generated in the processing up to step S106 to the voice output means 4 and returns to step S101.

A concrete example of the operation of Embodiment 1 of the present invention follows, in which the user's voice is processed, output to the wireless public network, and heard by the other party. The example assumes that the user is a woman whose native language is Japanese and that the coefficient r is 0.5.

FIG. 6 shows the contents of the input voice 201 and the output voice 402 when the user utters "どちらさま？" ("dochira-sama?", meaning "Who is calling?") during a call on the mobile phone terminal 100 according to Embodiment 1 of the present invention. In FIG. 6, the horizontal axis represents time and the vertical axis represents fundamental frequency. The line group L101 shows the change in fundamental frequency of the input voice 201, and the line group L102 shows the change in fundamental frequency of the output voice 402. The phoneme symbols written immediately above the line groups L101 and L102 are, respectively, the phonemes contained in the phonological information 202 (D101 to D106) and the phonemes specified by the voice output instruction 302 (D111 to D116). The operation of the mobile phone terminal 100 according to Embodiment 1 of the present invention is described below, mainly with reference to FIG. 6.

First, when the user starts the call, the voice analysis means 2 analyzes the input voice 201 and extracts the phonological information 202 at time t101. The phonological information 202 contains the phoneme /d/ (D101), the fundamental frequency F0, and the signal "ON" indicating a sound section.

Next, the voice output control means 3 executes processing according to the procedure shown in the flowchart of FIG. 5. In step S101, it acquires the phonological information 202. In step S102, because the section is a sound section ("ON"), the result is YES and processing proceeds to step S103. In step S103, because the phoneme /d/ has been newly received, the result is YES and processing proceeds to step S104.

In step S104, the voice output control means 3 refers to the phoneme replacement candidate table T31 of FIG. 4 and selects a phoneme p' to replace the phoneme /d/ according to the phoneme replacement rule. The record in the table whose first field contains the phoneme /d/ is record R313, so the voice output control means 3 randomly selects either /k/ or /g/ as the phoneme p'. Assume here that /k/ is selected.

In step S106 of FIG. 5, the voice output control means 3 calculates the fundamental frequency F0' according to the frequency conversion formula, and in step S107 it outputs the phoneme /k/ and the fundamental frequency F0' as the voice output instruction 302.

The voice output means 4 then synthesizes and outputs the output voice 402 based on the voice output instruction 302. As shown in FIG. 6, the output voice 402 consists of the phoneme /k/ (D111), and its fundamental frequency is about half that of the input voice 201.

Applying the same processing to the phoneme /o/ (D102) of the input voice 201 causes the voice output means 4 to output an output voice 402 consisting of the phoneme /i/ (D112). The output voice 402 passes through the signal switching means to the wireless communication means 5 as the transmission signal 501 and is then output to the other party's terminal via the wireless public network.

The next phoneme, /ch/ (D103), is an unvoiced consonant, so the voice analysis means 2 cannot extract a fundamental frequency and outputs 0 as the value of F0 in the phoneme information 202. The voice output control means 3 accordingly outputs 0 as the value of F0' in step S106. The voice output means 4, however, interpolates between the fundamental frequency of the previously received phoneme /i/ (D112) and that of the subsequently received phoneme /a/ (D114), so that the fundamental frequency of the output voice 402 for the phoneme /r/ (D113) changes smoothly.
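The interpolation across a phoneme with F0' = 0 can be sketched as follows. The description does not say which interpolation is used, so linear interpolation over a per-phoneme list of F0' values is an assumption, and the function name is hypothetical.

```python
def fill_missing_f0(f0_values):
    """Fill runs of 0 (no fundamental frequency) that lie between two
    non-zero neighbours by linear interpolation. Runs at either end of
    the list have only one neighbour and are left as 0."""
    result = list(f0_values)
    n = len(result)
    i = 0
    while i < n:
        if result[i] != 0:
            i += 1
            continue
        start = i
        while i < n and result[i] == 0:
            i += 1
        # The gap covers indices start .. i-1.
        if start == 0 or i == n:
            continue
        left, right = result[start - 1], result[i]
        steps = i - start + 1
        for k in range(start, i):
            result[k] = left + (right - left) * (k - start + 1) / steps
    return result


if __name__ == "__main__":
    # Phonemes with F0' of 100 Hz, unknown (0), and 120 Hz.
    print(fill_missing_f0([100, 0, 120]))
```

In the walkthrough above, the 0 reported for the unvoiced consonant would be replaced by a value between the F0' of D112 and that of D114.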

In addition, the fundamental frequency of the phoneme /i/ (D115) of the output voice 402, which corresponds to the phoneme /a/ (D105) of the input voice 201, is slightly higher than that of the immediately preceding phoneme /n/ because of the random number in the frequency conversion formula of step S106 in FIG. 5.

Repeating the same processing converts the Japanese input voice 201 "Dochira-sama?" into the male voice "Kiranijohi?", which is meaningless as Japanese, and this voice is output from the other party's terminal via the wireless public network.

As described above, in the mobile phone terminal 100 according to Embodiment 1 of the present invention, in a first step the voice analysis means 2 extracts phoneme information from the input voice 201 and outputs it as the phoneme information 202; in a second step the voice output control means 3 executes the procedure of the flowchart of FIG. 5 and outputs the voice output instruction 302 based on the phoneme information 202; and in a third step the voice output means 4 synthesizes and outputs the output voice 402 based on the voice output instruction 302. With this voice output method, a randomized voice in which each phoneme of the input voice has been replaced by another phoneme can be output to the other party's terminal.

As is clear from the above, the mobile phone terminal according to Embodiment 1 of the present invention outputs to the other party's terminal a voice in which the phonemes of the input voice have been replaced by other phonemes. The output is therefore meaningless to the other party, who can be led to believe that communication with the user is impossible, without learning the user's nationality.

Furthermore, in the mobile phone terminal according to Embodiment 1 of the present invention, the fundamental frequency of the output voice fluctuates with statistical properties similar to those of the input voice, which makes the output voice sound more natural. The other party is thus given no room to suspect that the output voice is a canned message or synthesized speech, and can reliably be led to believe that communication with the user is impossible.

(Embodiment 2)
Next, a second embodiment of the present invention is described. Embodiment 2 is characterized in that the received signal output by the wireless communication means 5 is used as the input voice.

FIG. 7 is a configuration diagram of the mobile phone terminal 200 according to Embodiment 2 of the present invention. In FIG. 7, the mobile phone terminal 200 comprises voice analysis means 2, voice output control means 3, voice output means 4, fixed syllable holding means 41, wireless communication means 5, a microphone 61, a speaker 62, signal switching means 7, a signal switching button 91, an on-hook button 92, and an off-hook button 93. The fixed syllable holding means 41 holds the fixed syllable strings from which the output voice 402 is constructed.

The voice analysis means 2 accepts the received signal 502 output by the wireless communication means 5 as the input voice 201 and performs voice analysis on it. It calculates the short-time average power of the input voice 201, determines whether the input is a speech section or a silent section, and outputs as the phoneme information 202 the signal "ON" for a speech section or "OFF" for a silent section.
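A minimal sketch of a power-based ON/OFF decision is shown below. The description only says that the short-time average power is calculated; the frame representation (a list of samples normalized to the range -1 to 1), the threshold value, and the function names are assumptions for illustration.

```python
def short_time_average_power(frame):
    """Mean of the squared sample values in one frame."""
    if not frame:
        return 0.0
    return sum(sample * sample for sample in frame) / len(frame)


def classify_frame(frame, threshold=1e-3):
    """Return "ON" for a speech section and "OFF" for a silent section.

    The threshold is a hypothetical tuning value, not taken from the text.
    """
    return "ON" if short_time_average_power(frame) >= threshold else "OFF"


if __name__ == "__main__":
    silence = [0.0] * 160
    tone = [0.5, -0.5] * 80
    print(classify_frame(silence), classify_frame(tone))
```

A practical detector would typically smooth the decision over several frames so that short pauses inside a sentence are not reported as silence; the description does not specify this.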

Based on the phoneme information 202, the voice output control means 3 generates a syllable string ID identifying the syllable string to be output as the output voice 402 and outputs it as the voice output instruction 302. It also outputs the on-hook signal 304, which instructs the wireless communication means 5 to terminate the connection with the wireless public network.

The voice output means 4 uses the syllable string ID contained in the voice output instruction 302 to obtain syllable string data from the fixed syllable holding means 41, and synthesizes and outputs the output voice 402 from that data. The fixed syllable holding means 41 holds the syllable string table, which maps a syllable string ID to syllable string data.

FIG. 8 shows the contents of the syllable string table T41 held by the fixed syllable holding means 41 of the mobile phone terminal 200 according to Embodiment 2 of the present invention. Each of the records R411 to R414 of the table consists of a first field F411 holding a syllable string ID and a second field F412 holding the corresponding syllable string data. Syllable string data is a sequence of pairs, each consisting of a syllable to be output as the output voice 402 (letters enclosed in square brackets []) and a coefficient for the fundamental frequency of that syllable (a number enclosed in parentheses ()). For example, the syllable string data "[ho](1.0), [la](1.2)" for syllable string ID = 1, stored in record R411, means that the syllables [ho] and [la] are output in succession, at a medium and a somewhat higher fundamental frequency respectively.
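The bracket notation for syllable string data can be read mechanically. The following Python sketch parses it into (syllable, coefficient) pairs; it assumes the data is written with ASCII brackets and lowercase letters, and the function name is hypothetical.

```python
import re

# One pair: a syllable in square brackets followed by its coefficient in parentheses.
PAIR_PATTERN = re.compile(r"\[([a-z]+)\]\s*\(([0-9]+(?:\.[0-9]+)?)\)")


def parse_syllable_string(data):
    """Turn "[ho](1.0), [la](1.2)" into [("ho", 1.0), ("la", 1.2)]."""
    return [(syllable, float(coeff)) for syllable, coeff in PAIR_PATTERN.findall(data)]


if __name__ == "__main__":
    # Record R411 (syllable string ID = 1) as quoted in the description.
    print(parse_syllable_string("[ho](1.0), [la](1.2)"))
```

Keeping the coefficient next to each syllable lets a synthesizer compute one fundamental frequency per syllable from a single base value.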

Also in FIG. 8, the syllable string data for syllable string ID = 1 in record R411 is intended to make the other party think it is a fixed phrase uttered as a greeting right after a call starts (for example, "moshi moshi", i.e. "hello"). The data for syllable string ID = 2 in record R412 is intended to sound like a fixed phrase uttered when asking the other party to repeat something (for example, "hai?", i.e. "yes?"). Syllable string data is meant to produce speech that is meaningless to the other party. Nevertheless, preparing in advance some data that sounds like fixed phrases and having it appear occasionally in the output voice 402 gives the output voice the naturalness of a natural language and makes the other party believe that it is being spoken by the user in person.

FIGS. 9 to 11 are flowcharts of the processing procedure of the voice output control means 3 of the mobile phone terminal 200 according to Embodiment 2 of the present invention. The voice output control means 3 first acquires the phoneme information 202 in step S211, and in step S212 determines whether the input voice 201 is a speech section based on whether the signal contained in the phoneme information 202 is "ON" or "OFF". If it is a speech section (YES), processing proceeds to step S221 of FIG. 10; if it is a silent section (NO), processing proceeds to step S213.

In step S213, the voice output control means 3 determines whether the voice output means 4 is currently outputting the output voice 402. If so (YES), processing returns to step S211; if not (NO), it proceeds to step S214. In step S214, the voice output control means 3 determines whether the current silent section of the input voice 201 is the first silent section since the call started, that is, whether the call has only just begun. If it is the first silent section (YES), processing proceeds to step S231 of FIG. 11; otherwise (NO), it proceeds to step S215.

In step S215, the voice output control means 3 generates a pseudo-random number d with 0 ≤ d < 1 and branches on its value: if d < 0.2, processing proceeds to step S216; if 0.2 ≤ d < 0.9, to step S217; and if d ≥ 0.9, to step S219.

In step S216, the voice output control means 3 selects syllable string ID = 2, that is, a syllable string that sounds like a question back to the other party. In step S217, it randomly selects a syllable string ID from among the IDs stored in the syllable string table T41 of FIG. 8 and syllable string ID = 0. Syllable string ID = 0 means that the voice output means 4 stops the output voice 402. In step S218, the voice output control means 3 outputs the syllable string ID selected in step S216 or S217 as the voice output instruction 302 and returns to step S211.

In step S219, the voice output control means 3 outputs the on-hook signal 304, whereupon the wireless communication means 5 terminates the connection with the wireless public network. Disconnecting the call probabilistically in this way makes it appear to the other party that the user could not understand what was being said, judged it pointless to continue, and hung up. Moreover, because the voice output control means 3 automatically orders the disconnection some time after the call starts, the user is spared the trouble of operating the on-hook button 92 while listening to a malicious caller, which improves convenience.
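The branch taken in a silent section (steps S215 to S219) can be summarized in a short Python sketch. The thresholds 0.2 and 0.9 and the meaning of syllable string ID = 0 are from the description; the function name, the return format, and the ID list passed in are hypothetical.

```python
import random


def silent_section_action(d, table_ids, rng=random):
    """Decide what to do in a silent section for a pseudo-random d in [0, 1).

    Returns ("speak", syllable_string_id) or ("hang_up", None).
    """
    if d < 0.2:
        # S216: reply with the phrase that sounds like a question back (ID = 2).
        return ("speak", 2)
    if d < 0.9:
        # S217: choose at random among the table IDs and ID = 0 (stop output).
        return ("speak", rng.choice([0] + list(table_ids)))
    # S219: issue the on-hook signal.
    return ("hang_up", None)


if __name__ == "__main__":
    ids_in_table = [1, 2, 3, 4]  # records R411 to R414, assuming IDs 1 to 4
    print(silent_section_action(random.random(), ids_in_table))
```

With these thresholds, each pass through step S215 has a 20% chance of the question-like phrase, a 70% chance of a random phrase or silence, and a 10% chance of ending the call.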

Steps S221 to S229 of FIG. 10 are the procedure followed by the voice output control means 3 when the input voice 201 is found to be a speech section in step S212 of FIG. 9, that is, when the other party is speaking. In step S221, the voice output control means 3 generates a pseudo-random number d with 0 ≤ d < 1 and branches on its value: if d < 0.998, processing proceeds to step S222; if d ≥ 0.998, to step S229.

In step S222, the voice output control means 3 sets syllable string ID = 0, and in step S223 it outputs the voice output instruction 302 and returns to step S211. Steps S222 and S223 make it appear to the other party that the user stopped speaking because the other party began to talk while the user was still speaking. This leads the other party to believe that the user is speaking while listening to the other party's voice.

In step S229, the voice output control means 3 outputs the on-hook signal 304, whereupon the wireless communication means 5 terminates the connection with the wireless public network.
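To give a feel for the 0.998 threshold in step S221, the following Python sketch models the branch and the chance that the call is still connected after a number of passes through that step. How often step S221 is reached per second is not stated in the description, so the number of passes is left as a parameter; the function names are hypothetical.

```python
def speech_section_action(d):
    """Branch of S221: stop the output (ID = 0) or hang up, for d in [0, 1)."""
    if d < 0.998:
        return ("speak", 0)      # S222 to S223: syllable string ID = 0
    return ("hang_up", None)     # S229: on-hook signal


def probability_still_connected(passes, p_hang_up=0.002):
    """Chance that none of `passes` independent draws reached S229."""
    return (1.0 - p_hang_up) ** passes


if __name__ == "__main__":
    for passes in (100, 500, 1000):
        print(passes, round(probability_still_connected(passes), 3))
```

Because each pass hangs up with probability 0.002, the expected number of passes before disconnection is 1 / 0.002 = 500 under this independence assumption.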

Steps S231 to S239 of FIG. 11 are the procedure followed by the voice output control means 3 when step S214 of FIG. 9 determines that the silent section of the input voice 201 is the first one since the call started, that is, immediately after the call begins. In step S231, the voice output control means 3 determines whether a syllable string ID has already been output twice. If not (NO), processing proceeds to step S232; if so (YES), it proceeds to step S239 and outputs the on-hook signal 304.

In step S232, the voice output control means 3 selects syllable string ID = 1, that is, a syllable string that sounds like a greeting given right after a call starts. In step S233, it outputs the voice output instruction 302 and returns to step S211 of FIG. 9. Steps S231 to S239 make it appear to the other party that the user uttered a greeting right after the call started and, receiving no reply after greeting twice, hung up.

When the voice output control means 3 outputs the voice output instruction 302 in step S218, S223, or S233, the voice output means 4 receives it and outputs the output voice 402 as follows. First, the voice output means 4 searches the syllable string table T41 (FIG. 8) to obtain the syllable string data for the syllable string ID contained in the voice output instruction 302. Next, it synthesizes and outputs the output voice 402 from the syllable string data. In doing so, it calculates the fundamental frequency F0' of the output voice 402 for each syllable from the fundamental frequency coefficient α given in the syllable string data, using the following frequency calculation formula.

F0' = F0base * α * (random(0.4) + 0.8)
Here, F0base is the initial value of the fundamental frequency of the output voice 402; adjusting it makes the output voice 402 a male or a female voice. For example, F0base = 120 Hz gives a male voice. The value of F0base may be made changeable by user operation. random(0.4) denotes a random number less than 0.4. Multiplying the fundamental frequency by a random factor varies the intonation of the output voice 402 and keeps the other party from realizing that the output voice 402 is synthesized. The voice output means 4 also interpolates the fundamental frequencies of adjacent syllables at syllable boundaries so that the output voice 402 does not sound unnatural.

The specific operation of Embodiment 2 of the present invention is described below. In Embodiment 2, a voice synthesized on the basis of the result of analyzing the other party's voice as the input voice is output to the wireless public network as the output voice and played to the other party. The case where the other party makes a silent call is described first.
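The frequency calculation formula above can be written as a small Python function. The description only says the random term is "less than 0.4"; treating it as uniform on the range from 0 (inclusive) to 0.4 (exclusive) is an assumption, as are the function and parameter names. F0base = 120 Hz is the example value from the text.

```python
import random


def output_f0(alpha, f0_base=120.0, rng=random):
    """F0' = F0base * alpha * (random(0.4) + 0.8).

    Assumes random(0.4) is uniform on [0, 0.4), so the random factor
    lies between 0.8 and 1.2.
    """
    jitter = rng.random() * 0.4
    return f0_base * alpha * (jitter + 0.8)


if __name__ == "__main__":
    # Per-syllable F0' for "[ho](1.0), [la](1.2)" with the 120 Hz example base.
    for syllable, alpha in [("ho", 1.0), ("la", 1.2)]:
        print(syllable, round(output_f0(alpha), 1))
```

Under this assumption the factor is centered on 1.0, so α sets the average pitch of a syllable while the random term moves it by up to 20% in either direction.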

FIG. 12 shows the contents of the output voice 402 on the mobile phone terminal 200 according to Embodiment 2 of the present invention. In FIG. 12, the horizontal axis represents time and the vertical axis represents the fundamental frequency. The syllable symbols D211 to D214 represent the syllables contained in the output voice 402. The operation of the mobile phone terminal 200 according to Embodiment 2 is described below, mainly with reference to FIG. 12.

First, when the user operates the off-hook button 93 at time t201, when the call arrives, and the call starts, the voice analysis means 2 analyzes the input voice 201 and extracts the phoneme information 202. The extracted phoneme information 202 contains the signal "OFF", which indicates silence.

Next, the voice output control means 3 executes processing according to the flowcharts of FIGS. 9 to 11. In step S211, it acquires the phoneme information 202. In step S212, the signal is "OFF" (a silent section), so the determination is NO and processing proceeds to step S213. In step S213, the output voice 402 is not being output (NO), so processing proceeds to step S214. In step S214, this is the first silent section since the call started (YES), so processing proceeds to step S231 of FIG. 11.

In step S231, the output voice 402 has not yet been output even once (NO), so processing proceeds to step S232. In step S232, the voice output control means 3 sets the syllable string ID to 1, and in step S233 it outputs the voice output instruction 302.

The voice output means 4 then receives the voice output instruction 302, refers to the syllable string table T41, obtains the syllable string data "[ho](1.0), [la](1.2)" for syllable string ID = 1 (second field F412 of record R411 in FIG. 8), and synthesizes the output voice 402 (parts D211 and D212 in FIG. 12). The coefficient α in the syllable string data is 1.0 for [ho] and 1.2 for [la], which is set so that [la] has the higher fundamental frequency. In the output voice 402, however, the fundamental frequency of [ho] (D211) is higher than that of [la] (D212). This is because the random term in the frequency calculation formula has added fluctuation to the fundamental frequency of the output voice 402.
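That [ho] can come out higher than [la] despite its smaller coefficient follows from the ranges the formula F0' = F0base * α * (random(0.4) + 0.8) allows. The Python sketch below computes those ranges, assuming the random term lies between 0 and 0.4 and using the 120 Hz example value for F0base; the base actually used for FIG. 12 is not stated, and the function name is hypothetical.

```python
def f0_range(alpha, f0_base=120.0):
    """Lower and upper bound of F0' when the random factor spans 0.8 to 1.2."""
    return (f0_base * alpha * 0.8, f0_base * alpha * 1.2)


if __name__ == "__main__":
    ho_low, ho_high = f0_range(1.0)   # roughly 96 Hz to 144 Hz
    la_low, la_high = f0_range(1.2)   # roughly 115 Hz to 173 Hz
    # The ranges overlap whenever the top of [ho] exceeds the bottom of [la].
    print("ranges overlap:", ho_high > la_low)
```

Since the two ranges overlap, a high draw for [ho] combined with a low draw for [la] reverses the order suggested by the coefficients, as seen at D211 and D212.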

The output voice 402 is then output a second time by the same processing (parts D213 and D214 in FIG. 12). On the third execution of step S231, the voice output control means 3 finds that the output has been made twice (YES), so it proceeds to step S239 and outputs the on-hook signal 304, whereupon the wireless communication means 5 terminates the connection with the wireless public network at time t202 in FIG. 12.
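The silent-call trace above (two greetings, then disconnection) can be reproduced with a short simulation of steps S231 to S239. It is a simplified sketch: the input is assumed to stay "OFF" for the whole call, the wait in step S213 while a greeting is still playing is skipped, and the event names are hypothetical.

```python
def simulate_silent_call(max_greetings=2):
    """Return the sequence of events for a call in which the caller never speaks."""
    events = []
    greetings_sent = 0
    while True:
        # S211 to S214: the signal is "OFF", nothing is being output,
        # and this is still the first silent section of the call.
        if greetings_sent >= max_greetings:   # S231: already output twice?
            events.append("on_hook")          # S239: on-hook signal 304
            return events
        events.append("syllable_string_1")    # S232 to S233: greeting-like phrase
        greetings_sent += 1


if __name__ == "__main__":
    print(simulate_silent_call())
```

The third pass through the S231 check is the one that ends the call, matching the description of time t202.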

As described above, when the other party makes a silent call, the mobile phone terminal according to Embodiment 2 of the present invention outputs, a set number of times, a voice intended to sound to the other party like a fixed phrase uttered as a greeting, and then automatically terminates the connection with the public network.

As another specific operation of Embodiment 2 of the present invention, the case of receiving a malicious call made for the purpose of fraudulent billing is described as an example.

FIG. 13 shows the contents of the input voice 201 and the output voice 402 on the mobile phone terminal 200 according to Embodiment 2. In FIG. 13, the horizontal axis represents time. FIGS. 13(a), 13(b), and 13(c) show one continuous operation divided into three parts for convenience. Texts D301 to D305 represent the content of the input voice 201, that is, the other party's speech, and texts D311 to D315 represent the content of the output voice 402. Fundamental frequency information for the input voice 201 and the output voice 402 is omitted from FIG. 13. The operation of the mobile phone terminal 200 according to Embodiment 2 is described below, mainly with reference to FIG. 13. The processing from when the user operates the off-hook button 93 at time t301 on receiving the call until the first output voice 402 (D311) is output is the same as in the operation example of Embodiment 2 described above, so its description is omitted.

When the voice D301 is input, the voice analysis means 2 analyzes the input voice 201 and extracts the phoneme information 202. The extracted phoneme information 202 contains the signal "ON", which indicates that sound is present.

Next, the voice output control means 3 executes processing according to the flowcharts of FIGS. 9 to 11. In step S211, it acquires the phoneme information 202. In step S212, the signal is "ON" (a speech section), so the determination is YES and processing proceeds to step S221 of FIG. 10.

In step S221, the voice output control means 3 generates a pseudo-random number d. Assuming d = 0.2 here, d < 0.998, so processing proceeds to step S222, where syllable string ID = 0 is set, and this is output as the voice output instruction 302 in step S223. The voice output means 4 receives the voice output instruction 302 but, because syllable string ID = 0, does not output the output voice 402.

While the input voice 201 continues to be input, the above processing is repeated, so no output voice 402 is output.

When the voice D301 ends, the voice analysis means 2 outputs phoneme information 202 containing the signal "OFF", which indicates silence. The voice output control means 3 acquires the phoneme information 202 (step S211), proceeds to step S213 on the determination in step S212 (NO), and, because no voice is being output, proceeds further to step S214. In step S214, this is not the first silent section (NO), so processing proceeds to step S215.

Assuming that the pseudo-random number d generated by the voice output control means 3 in step S215 is 0.3 here, 0.2 ≤ d < 0.9, so processing proceeds to step S217. In step S217, the voice output control means 3 selects a syllable string ID at random. Assuming ID = 3 is selected here, in step S218 it outputs a voice output instruction 302 containing syllable string ID = 3.

The voice output means 4 receives the voice output instruction 302, refers to the syllable string table T41, obtains the syllable string data "[ki](1.0), [ru](0.9), [mi](0.9), [ji](1.2), [hi](1.1), [go](1.0), [che](1.3), [si](1.5)" (second field F412 of record R413 in FIG. 8), and outputs the voice D312 (FIG. 13) as the output voice 402.

The voice D313 "chegi?" is the output produced either when the pseudo-random number d generated in step S215 is less than 0.2, so that syllable string ID = 2 is set in step S216, or when 2 is selected as the syllable string ID in step S217.

If the other party starts speaking at time t302 while the voice D314 is being output, the voice output control means 3 proceeds to the processing of FIG. 10 on the determination in step S212 (YES). Assuming that the pseudo-random number d generated in step S221 is 0.7 here, d < 0.998, so the voice output control means 3 executes steps S222 and S223, and the voice output means 4 interrupts the output voice 402 at time t303 in FIG. 13.

Assuming that, while the voice D305 is being input, the pseudo-random number d generated by the voice output control means 3 in step S221 takes the value 0.9984, d ≥ 0.998, so the voice output control means 3 outputs the on-hook signal 304 in step S229, whereupon the wireless communication means 5 terminates the connection with the wireless public network at time t304.

As described above, the mobile phone terminal according to Embodiment 2 of the present invention makes it appear that the user is taking part in the conversation by outputting, in the pauses between the other party's utterances, voice that is meaningless to the other party, and then automatically disconnects the call.

As is clear from the above description, the mobile phone terminal according to Embodiment 2 of the present invention outputs syllable strings at random to the other party's terminal and can therefore output voice that is meaningless to the other party. This makes the other party believe that communication with the user is impossible, without revealing the user's nationality.

In addition, the mobile phone terminal according to Embodiment 2 of the present invention can vary its voice output depending on whether the other party's voice is voiced, and can therefore produce output that matches the other party's speaking state. This leaves the other party no room to suspect that the output voice is a fixed message or synthesized speech, and reliably makes the other party believe that communication with the user is impossible.

In addition, the mobile phone terminal according to Embodiment 2 of the present invention occasionally outputs a fixed utterance, which makes the output voice more natural. This leaves the other party no room to suspect that the output voice is a fixed message or synthesized speech, and reliably makes the other party believe that communication with the user is impossible.

In addition, the mobile phone terminal according to Embodiment 2 of the present invention stops its voice output when the other party starts speaking during that output, so the other party believes that the user is actually speaking while listening to the other party's voice. This reliably makes the other party believe that communication with the user is impossible.

In addition, the mobile phone terminal according to Embodiment 2 of the present invention automatically terminates communication during a call, so the other party believes that the user, unable to understand what the other party was saying, judged it pointless to continue and performed an on-hook operation. This reliably makes the other party believe that communication with the user is impossible.

In Embodiment 2 of the present invention, the syllable string data is selected from predetermined data, but any means may be used as long as it can output voice that is meaningless to the other party; for example, a random syllable string or phoneme string may be generated each time voice is output.
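
The variation of generating a random syllable string for each output could look like the following minimal Python sketch. The syllable inventory reuses the syllables that appear in the example of record R413; the length limits, the range of the per-syllable coefficient, and the function name are assumptions, and the meaning of the coefficient in parentheses is not stated in this section.

```python
import random

# Syllables borrowed from the example syllable string data; treating them
# as the full inventory is an assumption made for illustration.
SYLLABLES = ["ki", "ru", "mi", "ji", "hi", "go", "che", "si"]


def random_syllable_string(rng, min_len=2, max_len=8):
    """Build a list of (syllable, coefficient) pairs.

    The coefficient mimics the values such as (1.0) or (0.9) attached to
    each syllable in the example; its range 0.9 to 1.5 is assumed.
    """
    length = rng.randint(min_len, max_len)    # inclusive on both ends
    return [
        (rng.choice(SYLLABLES), round(rng.uniform(0.9, 1.5), 1))
        for _ in range(length)
    ]
```

Passing a seeded `random.Random` instance makes the output reproducible for testing, while an unseeded instance gives a different string on every call.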

In Embodiment 2 of the present invention, the condition for deciding whether to output voice or to disconnect the call (the ranges of d used in the branch decisions of steps S215 and S221) is fixed. However, this is not a limitation; the condition may be varied, for example by increasing the probability of disconnecting with each successive incoming call.
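
One way to raise the disconnect probability with each incoming call is sketched below. The starting probability 0.002 corresponds to the fixed threshold 0.998 used in step S221; the linear step, the cap, and the function name are hypothetical choices, not values given in the description.

```python
def hangup_threshold(call_count, base_p=0.002, step=0.002, max_p=0.5):
    """Return the threshold on d above which the call is disconnected.

    call_count: number of earlier incoming calls (0 for the first call).
    The disconnect probability grows linearly and is capped at max_p;
    both the growth rule and the cap are illustrative assumptions.
    """
    p_disconnect = min(max_p, base_p + step * call_count)
    return 1.0 - p_disconnect
```

The returned value would replace the constant 0.998 in the comparison of step S221, so that a later call is more likely to be disconnected on each evaluation.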

In Embodiments 1 and 2 of the present invention, the terminal incorporates the voice output device of the present invention. Alternatively, an external device connected to the terminal, such as an exchange, a repeater, or a server, may incorporate the voice output device and process the voice of the user or of the other party to output voice.

Embodiments 1 and 2 of the present invention describe a mobile phone terminal as an example, but the invention is not limited to this; the same effect can be obtained with a landline telephone, an IP telephone, voice chat, an intercom, and the like.

The voice output device of the present invention can also be used in devices that output voice with randomness in response to a user's input voice, for example as a voice output device for electronic pets, pet robots, toys, game machines, game software, and the like.
Although the present invention has been described in detail and with reference to specific embodiments, it will be apparent to those skilled in the art that various changes and modifications can be made without departing from the spirit and scope of the invention.
This application is based on Japanese Patent Application No. 2005-223652 filed on August 2, 2005, the contents of which are incorporated herein by reference.

The voice output device and voice output method of the present invention make the caller of a malicious call believe that communication with the user is impossible, without revealing the user's nationality, and thereby deter the caller from making further malicious calls. They are useful for voice communication devices that can take calls from unspecified parties.

Devices for repelling malicious calls such as prank calls have been proposed. For example, a telephone device has been proposed that converts the pitch period of the user's voice before the other party hears it. With that invention, a female user's voice sounds like a male voice to the other party, so the other party believes the user is male and stops making prank calls (see, for example, Patent Document 1).

With the conventional method of playing a fixed message to the other party, the other party can easily tell that the message was not spoken by the user. The other party therefore does not give up until the user answers in person, and repeats the malicious call many times.

(Embodiment 1)
FIG. 1 is a schematic configuration diagram of the voice output device 1 according to Embodiment 1 of the present invention. In FIG. 1, the voice output device 1 includes voice analysis means 2, voice output control means 3, and voice output means 4.

F0' = F0 * r * (random(0.4) + 0.8)
Here, r is a predetermined coefficient that specifies how high the fundamental frequency of the output voice 402 is to be relative to that of the input voice 201. Setting r < 1 makes the fundamental frequency of the output voice 402 lower overall than that of the input voice 201. For example, with r = 0.5, a female input voice 201 is converted to sound like a male voice and output as the output voice 402, which prevents the other party from learning the user's gender. The coefficient r may be made changeable by user operation. random(0.4) denotes a random number less than 0.4. Perturbing the fundamental frequency with a random number or the like disturbs the intonation contour of the input voice 201, which prevents the other party from inferring from the intonation of the output voice 402 which language the input voice 201 is in.
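
The formula for Embodiment 1 can be tried out with a minimal Python sketch. It assumes that random(0.4) is drawn uniformly from [0, 0.4), so the random factor lies between 0.8 and 1.2; the description only says the value is less than 0.4. The function name and the per-frame usage are likewise hypothetical.

```python
import random


def perturb_f0(f0_hz, r=0.5, rng=random):
    """Compute F0' = F0 * r * (random(0.4) + 0.8) for one analysis frame.

    f0_hz: fundamental frequency of the input voice in Hz.
    r: predetermined scaling coefficient (r < 1 lowers the pitch).
    The uniform distribution on [0, 0.4) is an assumption.
    """
    jitter = rng.random() * 0.4
    return f0_hz * r * (jitter + 0.8)
```

For an input at 220 Hz with r = 0.5, the result falls between 88 Hz and 132 Hz, so the pitch is roughly halved while each frame receives a different random offset.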

(Embodiment 2)
Next, a second embodiment of the present invention will be described. Embodiment 2 is characterized in that the received signal, which is the output of the wireless communication means 5, is used as the input voice.

F0' = F0base * α * (random(0.4) + 0.8)
Here, F0base is the initial value of the fundamental frequency of the output voice 402; adjusting F0base makes the output voice 402 a male or a female voice. For example, F0base = 120 Hz yields a male voice. The value of F0base may be made changeable by user operation. random(0.4) denotes a random number less than 0.4. Multiplying the fundamental frequency by a random factor varies the intonation of the output voice 402 and keeps the other party from realizing that the output voice 402 is synthesized. The voice output means 4 also interpolates the fundamental frequencies of adjacent syllables at each syllable boundary so that the output voice 402 does not sound unnatural. The specific operation of Embodiment 2 of the present invention is described below. In Embodiment 2, voice synthesized on the basis of the result of analyzing the other party's voice as the input voice is output to the wireless public network as the output voice and played to the other party. The following describes the case in which the other party makes a silent call.
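
The per-syllable pitch computation and the boundary interpolation can be sketched in Python as follows. Several points are assumptions: α is treated as a per-syllable coefficient because this section does not define it, random(0.4) is drawn uniformly from [0, 0.4), and the interpolation is reduced to the midpoint between neighbouring syllable targets. The function names are hypothetical.

```python
import random


def syllable_f0_targets(alphas, f0_base=120.0, rng=random):
    """Apply F0' = F0base * alpha * (random(0.4) + 0.8) to each syllable.

    alphas: one coefficient per syllable (assumed meaning of alpha).
    f0_base: initial fundamental frequency in Hz (120 Hz gives a male voice
    according to the description).
    """
    return [f0_base * a * (rng.random() * 0.4 + 0.8) for a in alphas]


def boundary_f0(targets):
    """Return one interpolated F0 value per syllable boundary.

    The midpoint of the two neighbouring targets is the simplest linear
    interpolation; the actual interpolation method is not specified.
    """
    return [(left + right) / 2.0 for left, right in zip(targets, targets[1:])]
```

A string of three syllables gives three targets and two boundary values, each lying between its neighbours, which avoids an abrupt pitch jump where one syllable ends and the next begins.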

In step S221, the voice output control means 3 generates a pseudo-random number d. Suppose here that d = 0.2. Because d < 0.998, the process proceeds to step S222, where the syllable string ID is set to 0, and this is output as the voice output instruction 302 in step S223. The voice output means 4 receives the voice output instruction 302 but, because the syllable string ID is 0, does not output the output voice 402.


Explanation of symbols

1 Voice output device
2 Voice analysis means
3 Voice output control means
4 Voice output means
5 Wireless communication means
7 Signal switching means
41 Fixed syllable holding means
61 Microphone
62 Speaker
91 Signal switching button
92 On-hook button
93 Off-hook button
100, 200 Mobile phone terminal
201 Input voice
202 Phonological information
302 Voice output instruction
304 On-hook signal
402 Output voice
501 Transmission signal
502 Reception signal
612 Microphone output signal

Claims

A voice output device comprising:
voice analysis means for extracting phonological information from input voice;
voice output control means for instructing voice output based on the phonological information; and
voice output means for outputting voice based on the instruction,
wherein the voice output device outputs voice composed of random phonemes based on the phonological information of the input voice.

The voice output device according to claim 1, wherein the phonological information includes information identifying a phoneme or syllable contained in the input voice, and
the voice output control means determines the phonemes or syllables constituting the output voice by replacing that phoneme or syllable according to a predetermined rule.

The voice output device according to claim 1, wherein the phonological information includes information indicating whether the input voice is voiced, and
the voice output control means instructs voice output based on the information indicating whether the input voice is voiced.

The voice output device according to claim 1, wherein the phonological information includes information representing a fundamental frequency of the input voice, and
the voice output control means determines a fundamental frequency of the output voice based on that fundamental frequency.

A voice communication device comprising:
communication means for executing communication processing and outputting the output voice to a communication destination; and
the voice output device according to any one of claims 1 to 4.

A first step of extracting phonological information from the input speech;
A second step of instructing random phonological speech output based on the phonological information;
A third step of outputting the output sound based on the instruction;
An audio output method comprising: