JP2017216602A - Telephone device

Info

Publication number: JP2017216602A
Application number: JP2016109560A
Authority: JP
Inventors: Tadamichi Tokuda (徳田 肇道)
Original assignee: Panasonic Intellectual Property Management Co Ltd
Current assignee: Panasonic Intellectual Property Management Co Ltd
Priority date: 2016-05-31
Filing date: 2016-05-31
Publication date: 2017-12-07
Anticipated expiration: 2036-05-31
Also published as: JP6675079B2

Abstract

PROBLEM TO BE SOLVED: To efficiently perform speech speed conversion that is easy for a user to hear, while suppressing erroneous detection of a voice interval included in a received signal. SOLUTION: A microphone 11 collects the transmitted voice of a user. A noise level estimation part 21z estimates the level of a noise signal around the user. A voice interval detection part 21 detects an interval of voice included in a transmission signal from another telephone set. While the transmitted voice of the user is being collected by the microphone 11, a reception gain control part 15 attenuates a received signal that includes a line echo component of the transmitted voice caused by a telephone switching network. Based on the attenuation amount of the received signal and the estimated level of the noise signal, an interval detection correction part 17 corrects a threshold value used for voice interval detection in the voice interval detection part 21. Based on the threshold corrected by the interval detection correction part 17, a voice conversion part 22 converts the speech speed of the voice included in the transmission signal from the other telephone set and outputs the voice from a speaker 12. SELECTED DRAWING: Figure 2

Description

The present invention relates to a telephone device that receives a voice signal of a call transmitted from the telephone device of the other party, converts the speech speed of the voice sections of that signal, and outputs the result.

A speech speed conversion device has a function of, for example, stretching the voice signal of a call transmitted from the other party's telephone device (that is, the receive-side voice signal) at a constant rate in the time direction, thereby converting it into speech that sounds as if the other party were speaking slowly and that is easier for the listener to hear.

The use of such a speech speed conversion device is not limited to telephone devices. As prior art using a speech speed conversion device, a speech speed conversion device mounted on an intercom, for example, is known (see Patent Document 1). The speech speed conversion device of Patent Document 1 distinguishes between voice sections, in which the input signal contains speech, and non-voice sections, in which it does not. It performs expansion processing when a section is judged to be a voice section and compression processing when a section is judged to be a non-voice section. In addition, even when a section is judged to be a non-voice section, this device does not perform compression processing if the noise level included in the input signal is equal to or higher than a predetermined threshold. This prevents the output speech from being interrupted because input speech was compressed by mistake.

Patent Document 1: Japanese Patent No. 5346230

However, applying the configuration of the speech speed conversion device described in Patent Document 1 to a telephone device used for real-time calls raises the following problems. In a real-time call, conversation between the user's own telephone device and the other party's telephone device generally proceeds in a time-division manner. That is, while the user is speaking the other party listens, and while the other party is speaking the user listens. When call voice is exchanged between telephone devices during such a real-time call, a line echo arises from the two-wire/four-wire conversion circuit generally provided in the telephone exchange network. As a result, the line echo component of the user's own voice (that is, the transmitted voice) can enter the receive-side circuit of the telephone device and be heard from the speaker provided in that circuit, where it acts as noise and can interfere with the call.

If the telephone device is equipped with, for example, a voice switch to suppress the noise caused by the line echo component, the received signal can be reduced while the user of the telephone device is speaking. However, switching the voice switch on and off changes the level of the noise signal included in the received voice. As a result, the speech speed conversion device provided in the telephone device may erroneously detect the change in noise level as speech. For example, because the voice switch stops attenuating the received signal when the user finishes speaking, the noise level due to the line echo component rises when the voice switch turns off; the speech speed conversion device then judges that signal to be received speech and can no longer detect voice sections correctly (see FIG. 7(B)). FIG. 7(B) is a timing chart showing changes in the received voice before and after the voice switch is turned on.

Consequently, when the speech speed conversion device misdetects a voice section, it cannot properly convert the speech speed of the voice sections included in the received signal, and it also applies speech speed conversion to noise in non-voice sections. As a result, time is spent on unnecessary conversion, the received speech becomes harder to hear, and the benefit of speech speed conversion is reduced. The speech may also be cut off.

The present invention has been made in view of the conventional situation described above, and its object is to provide a telephone device that suppresses erroneous detection of voice sections included in a received signal and efficiently performs speech speed conversion that is easy for the user to hear.

The present invention is a telephone device that makes calls with another telephone device via a telephone exchange network, the telephone device comprising: a microphone that collects the transmitted voice of a user; a noise level estimation unit that estimates the level of a noise signal around the user; a section detection unit that detects a voice section included in a transmission signal from the other telephone device; a reception gain control unit that, while the user's transmitted voice is being collected by the microphone, attenuates a received signal including a line echo component of the transmitted voice caused by the telephone exchange network; a threshold correction unit that corrects a threshold used for voice section detection in the section detection unit, based on the amount of attenuation of the received signal by the reception gain control unit and the level of the noise signal estimated by the noise level estimation unit; and a speech speed conversion unit that, based on the threshold corrected by the threshold correction unit, converts the speech speed of the voice included in the transmission signal from the other telephone device and outputs it from a speaker.

According to the present invention, erroneous detection of voice sections included in a received signal can be suppressed, and speech speed conversion that is easy for the user to hear can be performed efficiently.

FIG. 1: Diagram showing the connection between telephones in the first embodiment
FIG. 2: Diagram showing in detail an example of the internal configuration of the part related to speech speed conversion of a telephone connected to a two-wire/four-wire conversion circuit
FIG. 3: Block diagram showing in detail an example of the internal configuration of the speech speed conversion unit
FIG. 4: (A), (B) Timing charts showing voice signals before and after speech speed conversion
FIG. 5: (A) Timing chart showing the voice section detection result, (B) timing chart showing the time average of the signal waveform, (C) timing chart of the voice waveform
FIG. 6: Flowchart explaining in detail an example of the call operation procedure of the first embodiment
FIG. 7: (A), (B) Timing charts showing changes in the received voice before and after the voice switch is turned on
FIG. 8: (A), (B) Diagrams explaining how speech speed conversion is applied to speech containing vowels and consonants
FIG. 9: (A), (B) Timing charts showing the change over time of the AMDF value
FIG. 10: Diagram showing in detail an example of the internal configuration of the part related to speech speed conversion in the telephone of the second embodiment
FIG. 11: Timing chart showing an example of state changes in each part of the telephone
FIG. 12: Flowchart explaining an example of the call and playback operation procedure of the second embodiment

Hereinafter, embodiments of a telephone device according to the present invention will be described with reference to the drawings.

(Background and issues leading to the first embodiment)
A speech speed conversion device has a function of stretching the receive-side voice signal in the time direction, thereby converting the original speech into speech similar to what would be heard if the speaker spoke slowly. Applying the configuration of the speech speed conversion device described in Patent Document 1 to a telephone device used for real-time calls raises the following problems. In a real-time call, conversation between the user's own telephone device and the other party's telephone device generally proceeds in a time-division manner. That is, while the user is speaking the other party listens, and while the other party is speaking the user listens. When call voice is exchanged between telephone devices during such a real-time call, a line echo arises from the two-wire/four-wire conversion circuit generally provided in the telephone. As a result, the line echo component of the user's own voice (that is, the transmitted voice) can enter the receive-side circuit of the telephone device and be heard from the speaker provided in that circuit, where it acts as noise and can interfere with the call.

If the telephone device is equipped with, for example, a voice switch to suppress the noise caused by the line echo component, the received signal, which includes both voice and noise signals, can be reduced while the user of the telephone device is speaking. However, switching the voice switch on and off changes the level of the noise signal included in the received voice. As a result, the speech speed conversion device provided in the telephone device may erroneously detect the change in noise level as speech. For example, because the voice switch stops attenuating the received signal when the user finishes speaking, the noise level due to the line echo component rises when the voice switch turns off; the speech speed conversion device then judges that signal to be received speech and can no longer detect voice sections correctly (see FIG. 7(B)).

The first embodiment therefore describes an example of a telephone device that suppresses erroneous detection of voice sections included in a received signal and efficiently performs speech speed conversion that is easy for the user to hear.

Hereinafter, embodiments that specifically disclose the telephone device according to the present invention will be described in detail with reference to the drawings as appropriate. Unnecessarily detailed description may be omitted. For example, detailed descriptions of well-known matters and redundant descriptions of substantially identical configurations may be omitted. This is to keep the following description from becoming needlessly redundant and to make it easier for those skilled in the art to understand. The accompanying drawings and the following description are provided so that those skilled in the art can fully understand the present disclosure, and they are not intended to limit the subject matter described in the claims.

(First embodiment)
FIG. 1 is a diagram showing the connection between the telephones 10A and 10B in the first embodiment. The telephone 10A and the telephone 10B, each an example of a telephone device according to the present invention, are connected so that they can call each other via a telephone exchange network 7 that includes a public analog line 50.

The telephone 10A has a two-wire/four-wire conversion circuit 30A that serves as an exchange. The two-wire/four-wire conversion circuit 30A converts the four-wire signals, that is, one positive and negative pair of wires for each of the microphone and the speaker (described later) in the telephone 10A, into a two-wire signal consisting of a ground line and a signal line. The two-wire/four-wire conversion circuit 30A is connected to the public analog line 50.

Similarly, the telephone 10B has a two-wire/four-wire conversion circuit 30B that serves as an exchange. The two-wire/four-wire conversion circuit 30B converts the four-wire signals, that is, one positive and negative pair of wires for each of the microphone and the speaker (described later) in the telephone 10B, into a two-wire signal consisting of a ground line and a signal line. The two-wire/four-wire conversion circuit 30B is connected to the public analog line 50. The two-wire/four-wire conversion circuits 30A and 30B are each connected to the public analog line 50 by two wires.

The telephones 10A and 10B of the present embodiment are fixed-line telephones connected to the widely used public analog line 50.

When the telephones 10A and 10B do not need to be distinguished, they may be referred to simply as the telephone 10. Likewise, when the two-wire/four-wire conversion circuits 30A and 30B do not need to be distinguished, they may be referred to simply as the two-wire/four-wire conversion circuit 30.

The exchange described above may also be a private branch exchange (PBX: Private Branch eXchange) that is installed, for example, in a company office and switches between extension and outside lines. In this case, a business phone connected to the private branch exchange is used as the telephone. The present embodiment assumes calls made over a public analog line, but the present invention is equally applicable to calls made over a digital line.

FIG. 2 is a diagram showing in detail an example of the internal configuration of the part of the telephone 10 related to speech speed conversion, the telephone 10 being connected to the two-wire/four-wire conversion circuit 30. The telephone 10 shown in FIG. 2 includes a microphone 11, a speaker 12, a reception gain control unit 15, a section detection correction unit 17, a slow talk button 16, and a speech speed conversion unit 20.

The microphone 11 collects and inputs the voice of the user of the telephone 10 (that is, the send-side voice).

The speaker 12 outputs the voice included in the transmission signal from the other party's telephone 10 (that is, the receive-side voice).

The reception gain control unit 15 uses, for example, one of a voice switch, a center clipper, and an AGC (Auto Gain Control: automatic gain controller) to control the attenuation of the level of the received signal of the telephone 10 while the user is speaking. The received signal contains a line echo component caused by the two-wire/four-wire conversion circuit 30 as well as voice and noise signals.

The voice switch, for example, determines whether the telephone 10 is sending, switches between sending and receiving, and attenuates the level of the received signal when it determines that sending is in progress.

The center clipper sets the level of the received signal to approximately zero while the telephone 10 is sending.

The AGC lowers the level of the received signal according to the send-side volume when the send-side voice at the telephone 10 is loud.

The reception gain control unit 15 outputs the receive-side attenuation amount to the section detection correction unit 17 as needed.
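The three attenuation strategies attributed to the reception gain control unit 15 can be pictured as simple gain functions. The following is only an illustrative sketch: the function names, the voice-switch gain of 0.1, and the AGC reference level and floor are assumptions made for this example and do not come from the publication.

```python
def voice_switch_gain(sending: bool, attenuated_gain: float = 0.1) -> float:
    """Voice switch: fixed attenuation while the user is judged to be sending.

    The value 0.1 is a hypothetical attenuation, not taken from the publication.
    """
    return attenuated_gain if sending else 1.0


def center_clipper_gain(sending: bool) -> float:
    """Center clipper as described in the text: received level is about zero while sending."""
    return 0.0 if sending else 1.0


def agc_gain(send_level: float, reference: float = 0.1, floor: float = 0.05) -> float:
    """AGC: the louder the send-side voice, the lower the receive gain.

    `reference` and `floor` are hypothetical tuning values.
    """
    if send_level <= reference:
        return 1.0
    return max(floor, reference / send_level)


def attenuate(received, gain):
    """Scale received samples by the gain.

    The same gain is the attenuation amount that would be reported
    to the section detection correction unit 17.
    """
    return [sample * gain for sample in received]
```

In each case the returned gain plays the role of the attenuation amount that is passed on for threshold correction.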

The section detection correction unit 17, as an example of a threshold correction unit, corrects the threshold that the speech speed conversion unit 20 uses for voice section detection, based on information about the amount of attenuation of the received signal obtained from the reception gain control unit 15. There are, for example, three ways to correct this threshold. The threshold is set using the noise level, for example the long-term average AvL of the received signal (see FIG. 5(B)). This long-term average AvL is derived by the noise level estimation unit 21z.

In the first correction method, the section detection correction unit 17 sets the threshold by multiplying the long-term average AvL of the noise level estimated by the noise level estimation unit 21z by the receive attenuation amount (gain) obtained from the reception gain control unit 15. That is, with the first correction method the threshold is lowered in accordance with the attenuation of the received signal.
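The first correction method amounts to a single multiplication. A minimal sketch follows, assuming the attenuation amount is expressed as a linear gain between 0 and 1; the function name and the range check are illustrative assumptions.

```python
def corrected_threshold(noise_avl: float, receive_gain: float) -> float:
    """First correction method: threshold = AvL x receive gain.

    noise_avl    : long-term average AvL of the noise level
    receive_gain : linear gain applied to the received signal (assumed 0..1)
    """
    if not 0.0 <= receive_gain <= 1.0:
        raise ValueError("receive_gain is assumed to be a linear gain in [0, 1]")
    return noise_avl * receive_gain
```

For instance, halving the received signal (gain 0.5) halves the threshold, so attenuated noise and attenuated speech are still compared on the same scale.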

In the second correction method, while the received signal is being attenuated by the reception gain control unit 15, the section detection correction unit 17 sends the voice section detection unit 21 a control signal instructing it to amplify the received signal by an amount corresponding to the attenuation. The signal amplification unit 21x of the voice section detection unit 21 amplifies the input received signal in response to the control signal from the section detection correction unit 17, and the noise level is estimated from the amplified received signal. That is, with the second correction method the threshold is set from the noise level estimated in the amplified received signal.
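The second correction method can be sketched as undoing the attenuation before the noise level is measured. The code below is an assumption-laden illustration: the function names, the use of a mean absolute value as the level measure, and the `min_gain` guard against division by a near-zero gain are not taken from the publication.

```python
def compensate_attenuation(received, receive_gain, min_gain=1e-3):
    """Amplify attenuated samples by the inverse of the receive gain.

    min_gain is a hypothetical guard so that a gain close to zero
    does not produce an unbounded amplification.
    """
    gain = max(receive_gain, min_gain)
    return [sample / gain for sample in received]


def mean_abs_level(samples):
    """Simple level measure (mean absolute amplitude) used here as the noise level."""
    return sum(abs(sample) for sample in samples) / len(samples)
```

With this arrangement the noise estimate, and therefore the threshold, stays at its pre-attenuation level even while the received signal itself is attenuated.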

In the third correction method, while the received signal is being attenuated by the reception gain control unit 15, the section detection correction unit 17 prevents the noise level estimation unit 21z of the voice section detection unit 21 from estimating the noise level. That is, with the third correction method the threshold is set to the predetermined noise level (that is, the long-term average AvL of the received signal shown in FIG. 5(B)). Here, "predetermined" does not mean fixed; because the noise level can vary with the receiving environment, it means the noise level as derived each time.
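The third correction method can be sketched as a noise estimator that is simply frozen while attenuation is active. The class below is a hypothetical illustration; in particular, the exponential averaging with coefficient `alpha` stands in for the long-term average AvL and is an assumption, not the averaging described in the publication.

```python
class NoiseLevelEstimator:
    """Tracks a long-term noise level and holds it while the received signal is attenuated."""

    def __init__(self, alpha: float = 0.01, initial: float = 0.0):
        self.alpha = alpha   # hypothetical smoothing coefficient
        self.avl = initial   # current long-term average (noise level)

    def update(self, level: float, attenuating: bool) -> float:
        # Third correction method: skip estimation during attenuation,
        # so the previously derived noise level remains the threshold.
        if not attenuating:
            self.avl += self.alpha * (level - self.avl)
        return self.avl
```

Because the estimate is only held, not fixed, it resumes tracking the receiving environment as soon as attenuation ends.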

The first and second correction methods are used when the reception gain control unit 15 is configured using, for example, a center clipper or an AGC.

The third correction method is used when the reception gain control unit 15 is configured using, for example, a voice switch.

When correcting with the first correction method, the section detection correction unit 17 outputs to the speech speed conversion unit 20 a control signal instructing it to use the changed threshold, obtained by multiplying the long-term average AvL of the noise level by the receive attenuation amount (gain).

When correcting with the second correction method, the section detection correction unit 17 outputs to the speech speed conversion unit 20 a control signal instructing it to use, as the threshold, the noise level included in the received signal before attenuation by the reception gain control unit 15. As described above, this control signal includes an instruction to amplify the signal, in accordance with the amount of attenuation applied by the reception gain control unit 15, so that the attenuated signal level returns to the pre-attenuation noise level.

With the third correction method, the section detection correction unit 17 outputs to the speech speed conversion unit 20 a control signal instructing it to use the predetermined threshold (that is, the long-term average AvL of the received signal shown in FIG. 5(B)).

The speech speed conversion unit 20 has a function of stretching the receive-side voice signal in the time direction, thereby converting the original speech into speech similar to what would be heard if the speaker spoke slowly. If speech speed conversion keeps running at a fixed conversion rate, the delay relative to real time grows without bound. In a telephone used for real-time calls, conversation may then become impossible. In real-time speech speed conversion, the voice sections of the received signal are detected, and when the speech speed of a voice section is slowed, the non-voice sections are shortened to recover the delay. This keeps the delay relative to real time small while still providing the benefit of playback with speech speed conversion (slow playback).

The speech speed conversion unit 20 can therefore prevent delay from building up by lengthening voice sections and shortening non-voice sections. For example, when reproducing the utterance "Moshi moshi konnichiwa" ("Hello, good day"), the speech for "moshi moshi" and "konnichiwa" is lengthened, and the non-voice section between "moshi moshi" and "konnichiwa" is shortened.
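The trade-off between stretching voice sections and shortening non-voice sections can be illustrated with a small delay-accounting sketch. The function name, the stretch factor of 1.5, and the shrink factor of 0.5 are assumptions chosen for the example; the publication does not give these values here.

```python
def accumulated_delay(segments, stretch=1.5, shrink=0.5):
    """Return the output delay (seconds) after each segment.

    segments : list of (duration_s, is_voice) pairs
    stretch  : hypothetical time-expansion factor for voice sections (> 1)
    shrink   : hypothetical time-compression factor for non-voice sections (< 1)
    """
    delay = 0.0
    history = []
    for duration, is_voice in segments:
        if is_voice:
            # Stretching voice adds delay relative to real time.
            delay += duration * (stretch - 1.0)
        else:
            # Shortening silence can only recover delay that already exists.
            delay = max(0.0, delay - duration * (1.0 - shrink))
        history.append(delay)
    return history
```

With `shrink=1.0` (no shortening) the delay only ever grows, which corresponds to the unbounded delay that arises when conversion runs at a fixed rate.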

The slow talk button 16 is a switch that the user can press to turn the speech speed conversion unit 20 on and off. The slow talk button 16 has an indicator 16z that lights when the speech speed conversion unit 20 is on and goes out when it is off, so the indicator toggles between lit and unlit each time the button is pressed. The indicator 16z is provided to show the status of the telephone 10.

FIG. 3 is a block diagram explaining in detail an example of the internal configuration of the speech speed conversion unit 20. The speech speed conversion unit 20 includes a voice section detection unit 21, a voice conversion unit 22, a non-voice conversion unit 23, and a signal memory 24.

The voice section detection unit 21, as an example of a section detection unit, has a noise level estimation unit 21z and a signal amplification unit 21x, and detects the voice sections included in the receive-side input signal.

The noise level estimation unit 21z estimates the level of the noise signal around the user (that is, the noise level) in the non-voice sections included in the receive-side input signal.

The signal amplification unit 21x is used in the second correction method described above. In accordance with the control signal from the section detection correction unit 17, it amplifies the noise signal included in the input signal by an amount corresponding to the attenuation of the received signal.

The voice conversion unit 22 lengthens voice sections, thereby delaying the voice signal. The voice conversion unit 22 has a vowel/consonant determination unit 22z and a delay addition unit 22y.

The vowel/consonant determination unit 22z determines the vowels and consonants included in the voice signal.

The delay addition unit 22y delays vowel voice signals (see FIG. 9(A)) and does not delay consonant voice signals (see FIG. 9(B)).

The non-voice conversion unit 23 shortens (compresses) non-voice sections.

The signal memory 24 is a voice buffer that temporarily stores the input voice signal (input signal) and also temporarily stores the voice signals (output signals) output from the voice conversion unit 22 and the non-voice conversion unit 23. The signal memory 24 is a small-capacity memory whose size is determined by the amount of voice data subject to speech speed conversion.

FIGS. 4(A) and 4(B) are timing charts showing voice signals before and after speech speed conversion. FIG. 4(A) shows the input receive-side voice signal (input signal). FIG. 4(B) shows the voice signal after speech speed conversion (output signal). The input signal is the signal before speech speed conversion, with a speech speed of 100%. The output signal is the signal after speech speed conversion, with a speech speed of 65%.
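To give a feel for what a speech speed of 65% means in terms of duration, the following sketch assumes that duration scales inversely with speech speed. That relationship and the function name are assumptions for illustration; the publication states only the 100% and 65% figures.

```python
def stretched_duration(duration_s: float, speed_ratio: float) -> float:
    """Duration of a voice section after conversion, assuming duration = original / speed ratio."""
    if speed_ratio <= 0.0:
        raise ValueError("speed_ratio must be positive")
    return duration_s / speed_ratio
```

Under this assumption, a 1.3-second voice section played at 65% speed would take about 2 seconds, that is, roughly 1.54 times its original length.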

話速変換の結果、受話側の音声は、話者が早口で話しても、ゆっくりと聞き易く話した場合と似たような音声となる。従って、高齢者や聴覚身障者等が受話側の音声を聴き取り易くなる。 As a result of the speech speed conversion, even if the speaker talks quickly, the voice on the receiving side resembles speech spoken slowly and clearly. Therefore, elderly people, hearing-impaired persons, and the like can hear the voice on the receiving side more easily.

図５（Ａ），（Ｂ），（Ｃ）は、音声区間の検出方法を説明する図である。図５（Ａ）は、音声区間の検出結果を示すタイミングチャートである。図５（Ｂ）は、信号波形の時間平均値を示すタイミングチャートである。図５（Ｃ）は、音声波形のタイミングチャートである。 FIGS. 5A, 5B, and 5C are diagrams for explaining a method of detecting a voice section. FIG. 5A is a timing chart showing the detection result of the voice section. FIG. 5B is a timing chart showing the time average values of the signal waveform. FIG. 5C is a timing chart of the audio waveform.

音声区間検出部２１は、図５（Ｃ）に示す受話側の音声を入力すると、この音声信号の長時間平均ＡｖＬと短時間平均ＡｖＳとを演算して導出する。長時間平均ＡｖＬでは、話者が話している時間と黙っている時間とを含むような、十分に長い時間、例えば３分、５分等が設定される。一方、短時間平均ＡｖＳでは、話者が話している音声の大きさ（音量）を捉えられるような、短い時間、例えば３秒、５秒等が設定される。 When the voice on the receiving side shown in FIG. 5C is input, the voice section detection unit 21 computes the long-time average AvL and the short-time average AvS of this voice signal. For the long-time average AvL, a sufficiently long window, for example 3 minutes or 5 minutes, is set so that it covers both the time when the speaker is speaking and the time when the speaker is silent. For the short-time average AvS, a short window, for example 3 seconds or 5 seconds, is set so that it captures the loudness (volume) of the speaker's voice.

音声区間検出部２１は、図５（Ｂ）に示すように、音声信号の長時間平均ＡｖＬを雑音レベルとみなし、音声区間を判定するための閾値に設定する。また、音声区間検出部２１は、音声信号の短時間平均ＡｖＳを音声レベルとみなす。短時間平均ＡｖＳの値は、長時間平均ＡｖＬの値と比べ、話者が話している時に大きな値となって変動する。 As shown in FIG. 5B, the voice section detection unit 21 regards the long-time average AvL of the voice signal as the noise level and sets it as the threshold for determining the voice section. The voice section detection unit 21 also regards the short-time average AvS of the voice signal as the voice level. Compared with the long-time average AvL, the short-time average AvS rises to larger values and fluctuates while the speaker is speaking.

音声区間検出部２１は、図５（Ａ）に示すように、音声レベルが雑音レベルより大きい区間を音声区間として検出し、音声レベルが雑音レベル以下である区間を非音声区間として検出する。図５（Ａ）に示すタイミングチャートでは、音声区間を値１で表し、非音声区間を値０で表す。 As shown in FIG. 5A, the voice section detection unit 21 detects a section where the voice level is higher than the noise level as a voice section, and detects a section where the voice level is equal to or lower than the noise level as a non-voice section. In the timing chart shown in FIG. 5A, a voice section is represented by the value 1, and a non-voice section is represented by the value 0.
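The detection rule described above (AvS as the voice level, AvL as the noise level and threshold) can be sketched as follows. This is only an illustration under stated assumptions: the function name, the use of per-sample absolute levels, and the window lengths in samples are hypothetical and are not taken from the patent.

```python
def detect_voice_sections(levels, short_len, long_len):
    """Return one 0/1 flag per sample.

    levels    : non-negative signal levels (e.g. absolute amplitudes); assumed input
    short_len : window length for the short-time average AvS (hypothetical parameter)
    long_len  : window length for the long-time average AvL (hypothetical parameter)
    """
    flags = []
    for i in range(len(levels)):
        short_window = levels[max(0, i - short_len + 1): i + 1]
        long_window = levels[max(0, i - long_len + 1): i + 1]
        av_s = sum(short_window) / len(short_window)  # regarded as the voice level
        av_l = sum(long_window) / len(long_window)    # regarded as the noise level (threshold)
        # Voice section (1) when the voice level exceeds the noise level, else non-voice (0).
        flags.append(1 if av_s > av_l else 0)
    return flags
```

A real implementation would more likely update both averages recursively rather than re-summing each window, but the comparison itself would be the same.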

上述した構成を有する電話機１０の動作について説明する。 The operation of the telephone 10 having the above-described configuration will be described.

図６は、第１の実施形態の通話動作手順の一例を詳細に説明するフローチャートである。電話機１０が、公衆交換電話網（公衆アナログ回線）を介して、通話相手の電話機と接続されると、通話が開始される。 FIG. 6 is a flowchart illustrating in detail an example of a call operation procedure according to the first embodiment. When the telephone 10 is connected to the telephone of the other party via the public switched telephone network (public analog line), the telephone call is started.

通話中、つまり、リアルタイムで音声が送受信される時、ユーザによってスロートーク用ボタン１６が押下されると、話速変換部２０は、話速変換（スロートーク）を開始する（Ｓ１）。スロートークの開始は、スロートーク用ボタン１６のインジケータ１６ｚが点灯してユーザに知らせる。なお、電話機１０は、予め常にスロートークを行うように設定しておくことも可能である。この場合、スロートーク用ボタンを省くことができる。さらに、この場合、通常トークに戻したい時に押下自在なスロートーク解除用ボタンが設けられてもよい。 When the user presses the slow talk button 16 during a call, that is, when voice is transmitted and received in real time, the speech speed conversion unit 20 starts speech speed conversion (slow talk) (S1). The start of the slow talk is notified to the user by the indicator 16z of the slow talk button 16 being lit. The telephone 10 can be set to always perform slow talk in advance. In this case, the slow talk button can be omitted. Further, in this case, a slow talk release button that can be pressed when returning to normal talk may be provided.

受話ゲイン制御部１５は、受話を減衰中であるか否かを判別する（Ｓ２）。受話を減衰中である場合、区間検出補正部１７は、受話ゲイン制御部１５から得られた受話の減衰量（ゲイン）を入力し、受話の減衰量を基に音声区間検出の補正を開始する（Ｓ３）。 The reception gain control unit 15 determines whether the received signal is being attenuated (S2). If it is, the section detection correction unit 17 receives the attenuation amount (gain) of the received signal from the reception gain control unit 15 and starts correcting the voice section detection based on that attenuation amount (S3).

区間検出補正部１７は、前述した３通りの補正方法のいずれか又は組み合わせで音声区間検出の補正を行う（Ｓ４）。 The section detection correction unit 17 corrects the voice section detection by any one or combination of the three correction methods described above (S4).

第１の補正方法では、区間検出補正部１７は、音声区間を検出するための閾値を受話の減衰量に見合った分だけ下げるように、雑音レベル推定部２１ｚに制御信号を出力する（Ｓ４Ａ）。 In the first correction method, the section detection correction unit 17 outputs a control signal to the noise level estimation unit 21z so as to lower the threshold for detecting the voice section by an amount corresponding to the attenuation of the received signal (S4A).

第２の補正方法では、区間検出補正部１７は、受話信号を減衰量に見合った分だけ信号増幅部２１ｘで増幅し、雑音レベル推定部２１ｚに対し、増幅後の信号で雑音レベルを推定するように制御信号を出力する（Ｓ４Ｂ）。 In the second correction method, the section detection correction unit 17 has the signal amplification unit 21x amplify the received signal by an amount corresponding to the attenuation, and outputs a control signal to the noise level estimation unit 21z so that it estimates the noise level from the amplified signal (S4B).

第３の補正方法では、区間検出補正部１７は、雑音レベル推定部２１ｚに対し、受話の減衰中、雑音レベルの推定を行わないように制御信号を出力する（Ｓ４Ｃ）。この場合、雑音レベル推定部２１ｚは、雑音レベルの推定を行わず、受信信号と比較される閾値には、既定の閾値が用いられる。 In the third correction method, the section detection correction unit 17 outputs a control signal to the noise level estimation unit 21z so as not to estimate the noise level during the attenuation of the received call (S4C). In this case, the noise level estimation unit 21z does not estimate the noise level, and a predetermined threshold is used as a threshold compared with the received signal.

前述したように、第１及び第２の補正方法は、受話ゲイン制御部１５がセンタークリッパあるいはＡＧＣで構成される場合に有効である。第３の補正方法は、受話ゲイン制御部１５がボイススイッチで構成される場合に有効である。 As described above, the first and second correction methods are effective when the reception gain control unit 15 is configured by a center clipper or AGC. The third correction method is effective when the reception gain control unit 15 is configured by a voice switch.
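As a rough illustration of how the three correction methods differ, the sketch below returns the level and the threshold that would be compared in each case. It is not the patent's implementation: the function name, the linear-gain model of attenuation (a gain of 0.5 means the received signal was halved), and the default threshold value are all assumptions.

```python
def apply_correction(signal_level, noise_estimate, gain, method, default_threshold=0.05):
    """Return (level_to_compare, threshold) for one of the three correction methods.

    signal_level      : level of the attenuated received signal
    noise_estimate    : noise level estimated before attenuation (assumed available)
    gain              : linear attenuation gain in (0, 1] applied to the received signal (assumed model)
    default_threshold : fixed threshold used by method 3 (hypothetical value)
    """
    if not 0.0 < gain <= 1.0:
        raise ValueError("gain must be in (0, 1]")
    if method == 1:
        # Method 1: lower the threshold by an amount matching the attenuation.
        return signal_level, noise_estimate * gain
    if method == 2:
        # Method 2: amplify the received signal back by the attenuation amount
        # and compare it against the noise level estimated at that scale.
        return signal_level / gain, noise_estimate
    if method == 3:
        # Method 3: skip noise estimation during attenuation and use a fixed threshold.
        return signal_level, default_threshold
    raise ValueError("method must be 1, 2, or 3")


def is_voice(signal_level, noise_estimate, gain, method):
    level, threshold = apply_correction(signal_level, noise_estimate, gain, method)
    return level > threshold
```

Under this model, methods 1 and 2 give the same voice/non-voice decision, since scaling the threshold down by the gain is equivalent to scaling the signal up by its inverse.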

音声区間検出部２１は、受話信号を閾値と比較して、音声区間を検出する（Ｓ５）。この音声区間は、前述したように、図５（Ｂ）に示す音声の短時間平均ＡｖＳが長時間平均ＡｖＬより大きいことによって検出される。音声区間検出部２１は、受話側が音声区間であるか否かを判別する（Ｓ６）。 The voice section detection unit 21 detects the voice section by comparing the received signal with the threshold (S5). As described above, a voice section is detected when the short-time average AvS of the voice shown in FIG. 5B is larger than the long-time average AvL. The voice section detection unit 21 determines whether or not the receiving side is in a voice section (S6).

音声区間である場合、音声変換部２２は、音声区間を伸ばして再生（スロー再生）する（Ｓ７）。一方、非音声区間である場合、非音声変換部２３は、非音声区間を圧縮する（Ｓ８）。ステップＳ７、Ｓ８の処理は、前述したように公知の技術である。 In the case of a voice section, the voice conversion unit 22 performs playback (slow playback) by extending the voice section (S7). On the other hand, if it is a non-speech segment, the non-speech conversion unit 23 compresses the non-speech segment (S8). The processing in steps S7 and S8 is a known technique as described above.

ステップＳ７、Ｓ８の処理後、話速変換部２０は、話速変換（スロートーク）終了であるか否かを判別する（Ｓ９）。スロートークの終了は、スロートーク用ボタン１６が再度押下されることによって行われる。なお、スロートーク用解除ボタンが設けられている場合にこのスロートーク解除用ボタンが押下される、あるいはオンフックにより通話が終了することで、スロートークが終了してもよい。 After the processes in steps S7 and S8, the speech speed conversion unit 20 determines whether or not the speech speed conversion (slow talk) has ended (S9). The slow talk is ended when the slow talk button 16 is pressed again. When the slow talk release button is provided, the slow talk may be ended by pressing the slow talk release button or by terminating the call by on-hook.

スロートークが終了でない場合、話速変換部２０はステップＳ２に戻り、同様の処理を繰り返す。一方、スロートークが終了する場合、話速変換部２０は本動作を終了する。スロートークの終了によって、スロートーク用ボタン１６は消灯する。 If the slow talk is not finished, the speech speed conversion unit 20 returns to step S2 and repeats the same processing. On the other hand, when the slow talk ends, the speech speed conversion unit 20 ends the operation. When the slow talk is finished, the slow talk button 16 is turned off.

受話ゲイン制御部１５としてボイススイッチを用いた場合の受話音声について説明する。図７（Ａ），（Ｂ）は、ボイススイッチのオン前後の受話音声の変化を示すタイミングチャートである。図７（Ａ）は、ボイススイッチのオフ時の受話音声の信号波形を示す。受話の音声区間後における送話では、回線エコーによる雑音が生じている。 The received voice when a voice switch is used as the reception gain control unit 15 will be described. FIGS. 7A and 7B are timing charts showing changes in the received voice before and after the voice switch is turned on. FIG. 7A shows the signal waveform of the received voice when the voice switch is off. In the transmission after the voice section of reception, noise due to line echo is generated.

図７（Ｂ）は、ボイススイッチのオン後の受話音声の信号波形を示す。前述したように、ボイススイッチは、送話の有無を判定し、送受話のスイッチングを行い、送話がある時に受話信号を減衰させる。ボイススイッチがオンであると、受話の後、送話がある期間（図７の期間ｔｂ）では、ボイススイッチがオンとなり、受話側の雑音信号を低減するが、送話の途中あるいは送話が終わってから受話が始まるまでの期間、つまり送話も受話も無い期間（図７の期間ｔａ，ｔｃ）では、ボイススイッチがオフとなる。このボイススイッチのオフによって雑音信号が大きく変動し、音声区間検出部２１は、この雑音信号が急に大きくなった時（立ち上がった時）に雑音を受話側の音声と誤検出してしまう。この結果、音声区間検出部２１は、実際の受話の音声区間よりも長い期間、音声区間として検出してしまう。 FIG. 7B shows the signal waveform of the received voice after the voice switch is turned on. As described above, the voice switch determines whether or not there is a transmission, performs transmission / reception switching, and attenuates the reception signal when there is a transmission. When the voice switch is on, the voice switch is turned on and the noise signal on the receiving side is reduced during a period when there is a transmission after reception (period tb in FIG. 7). The voice switch is turned off in the period from the end to the start of the reception, that is, the period in which neither transmission nor reception is performed (periods ta and tc in FIG. 7). When the voice switch is turned off, the noise signal fluctuates greatly, and the voice section detection unit 21 erroneously detects the noise as a voice on the receiving side when the noise signal suddenly increases (rises). As a result, the voice segment detection unit 21 detects a voice segment for a period longer than the actual voice segment of the incoming call.

本実施形態における区間検出補正部１７は、前述した第１、第２又は第３の補正方法で、期間ｔａ，ｔｃが音声区間に含まれないように、音声区間検出の補正を行い、正確な音声区間（図７の期間ｔｄ）を得る。 The section detection correction unit 17 in the present embodiment corrects the voice section detection by the first, second, or third correction method described above so that the periods ta and tc are not included in the voice section, and thereby obtains an accurate voice section (period td in FIG. 7).

次に、母音と子音を含む音声に対し、母音と子音を区別して話速変換を行う場合について説明する。図８（Ａ），（Ｂ）は、母音及び子音を含む音声に対する話速変換の仕方を説明する図である。通常、多くの音声は、母音と子音とで構成される。例えば、「あさ」の音声をゆっくりと話す場合を一例として示す。「あさ」の音声を母音と子音で区別し易いように、アルファベットで示すと「ＡＳＡ」となり、「Ａ」が母音であり、「Ｓ」が子音であり、「Ａ」が母音である。 Next, a case where speech speed conversion distinguishes between vowels and consonants in a voice containing both will be described. FIGS. 8A and 8B are diagrams for explaining how speech speed conversion is applied to a voice including vowels and consonants. Most speech sounds are composed of vowels and consonants. As an example, consider the Japanese word "asa" (あさ) spoken slowly. Written in the Latin alphabet so that vowels and consonants are easy to tell apart, it becomes "ASA", where "A" is a vowel, "S" is a consonant, and "A" is a vowel.

図８（Ａ）に示すように、普通に話す場合と比べてゆっくりと話す場合には、子音の「Ｓ」はあまり伸ばさずに、母音の「Ａ」が長く伸びる傾向にある。このような話声が自然な肉声である。一方、話速変換部２０は、図８（Ｂ）に示すように、子音も母音も区別することなく、一律に音声を伸張すると、肉声とは異なる違和感が生じる。 As shown in FIG. 8A, when a person speaks slowly rather than at normal speed, the consonant "S" is not lengthened much, while the vowel "A" tends to be drawn out. Speech of this kind sounds like a natural human voice. In contrast, as shown in FIG. 8B, if the speech speed conversion unit 20 stretches the voice uniformly without distinguishing consonants from vowels, the result sounds unnatural compared with a real voice.

本実施形態では、話速変換部２０は、ＡＭＤＦ(Average Magnitude Difference Function)値を用いて、音声区間の母音らしさ、子音らしさの度合いを算出し、子音区間よりも母音区間が遅い話速になるように、話速を変換する。ＡＭＤＦ値は、音声の基本周期（繰り返し周期）を求めるために話速変換部２０によって算出される。音声の基本周期の検出では、その波形と時間をずらした波形との相関の度合い（自己相関値）を求めて、相関が最も強くなる間隔（ピッチ）が求められる。自己相関の計算には、信号同士の積算や減算を用いる方法があるが、本実施形態では、比較的演算量が少ない、減算（差分）を用いてＡＭＤＦ値を求める。 In the present embodiment, the speech speed conversion unit 20 uses an AMDF (Average Magnitude Difference Function) value to calculate how vowel-like or consonant-like each part of a voice section is, and converts the speech speed so that vowel sections become slower than consonant sections. The AMDF value is calculated by the speech speed conversion unit 20 in order to obtain the fundamental period (repetition period) of the voice. To detect the fundamental period, the degree of correlation (autocorrelation value) between the waveform and a time-shifted copy of it is evaluated, and the lag (pitch) at which the correlation is strongest is found. Autocorrelation can be computed using either multiplication or subtraction of the signals; this embodiment obtains the AMDF value using subtraction (differences), which requires relatively little computation.
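The AMDF computation described above can be sketched as follows, using the common textbook definition (the mean of absolute differences between the frame and a copy shifted by each lag). The function names, the frame representation, and the lag range are assumptions for illustration; the patent does not specify these details.

```python
def amdf(frame, max_lag):
    """Average magnitude difference of `frame` for lags 1..max_lag.

    Requires max_lag < len(frame). Returns a list whose element k-1 is the AMDF at lag k.
    """
    if not 0 < max_lag < len(frame):
        raise ValueError("max_lag must be between 1 and len(frame) - 1")
    values = []
    for lag in range(1, max_lag + 1):
        n = len(frame) - lag
        total = sum(abs(frame[i] - frame[i + lag]) for i in range(n))
        values.append(total / n)
    return values


def estimate_pitch_lag(frame, max_lag):
    """Lag (in samples) with the smallest AMDF, i.e. the strongest self-similarity."""
    values = amdf(frame, max_lag)
    return values.index(min(values)) + 1
```

Because only subtractions and absolute values are needed per sample pair, this avoids the multiplications of a product-based autocorrelation.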

図９（Ａ），（Ｂ）は、ＡＭＤＦ値の時間変化を示すタイミングチャートである。縦軸はＡＭＤＦ値を表し、横軸は時間差（Δｔ）を表す。図９（Ａ）は、母音区間におけるＡＭＤＦ値の時間変化を示す。母音区間では、図９（Ａ）に示すように、音声信号の周期性が強く、子音区間では、図９（Ｂ）に示すように、音声信号の周期性が弱い。信号の周期性が強い程、ＡＭＤＦ値の最小値が小さくなってピークとして現れるが、ＡＭＤＦ値の最大値は変化しない。 FIGS. 9A and 9B are timing charts showing temporal changes in the AMDF value. The vertical axis represents the AMDF value, and the horizontal axis represents the time difference (Δt). FIG. 9A shows the change in the AMDF value in a vowel section. In a vowel section, the periodicity of the voice signal is strong, as shown in FIG. 9A; in a consonant section, the periodicity of the voice signal is weak, as shown in FIG. 9B. The stronger the periodicity of the signal, the smaller the minimum AMDF value becomes, appearing as a sharp downward peak (dip), while the maximum AMDF value does not change.

従って、数式（１）に示すように、母音らしさは、変数Ｘによって表される。話速変換部２０は、変数Ｘが小さくて値０に近い程、その区間は母音らしいと判断する。 Therefore, the vowel-likeness is represented by the variable X as shown in the equation (1). The speech speed conversion unit 20 determines that the interval is more likely to be a vowel as the variable X is smaller and closer to the value 0.

Ｘ＝ＡＭＤＦ値の最小値ｍＢ／ＡＭＤＦ値の最大値ｍＡ …… （１） X = (minimum AMDF value mB) / (maximum AMDF value mA) …… (1)

話速変換部２０は、変数Ｘが小さい区間ほど、ゆっくりとした音声で肉声に近い自然な音声になるように、話速変換を行う。ここでは、話速変換部２０は、変数Ｘの値が小さくなるほど、遅延量が大きくなるように連続して変化させたが、閾値Ｔｈ１を設定し、変数Ｘと閾値Ｔｈ１を比較することで、母音と子音とを区別してもよい。即ち、話速変換部２０は、変数Ｘ＜Ｔｈ１である場合、母音であると判定し、遅延量を大きくし、変数Ｘ ≧ Ｔｈ１である場合、子音であると判定し、遅延量を値０もしくは小さくしてもよい。 The speech speed conversion unit 20 performs speech speed conversion so that the smaller the variable X is in a section, the more slowly that section is played, yielding natural speech close to a real voice. Here, the speech speed conversion unit 20 varies the delay amount continuously so that it increases as the value of the variable X decreases; alternatively, a threshold Th1 may be set and vowels and consonants distinguished by comparing the variable X with Th1. That is, when X < Th1, the speech speed conversion unit 20 may determine that the section is a vowel and increase the delay amount, and when X ≧ Th1, it may determine that the section is a consonant and set the delay amount to 0 or a small value.

また、変数Ｘとして、ＡＭＤＦ値の最小値ｍＢとＡＭＤＦ値の最大値ｍＡとの比率を用いたが、ＡＭＤＦ値の最小値ｍＢとＡＭＤＦ値の最大値ｍＡとの差分の絶対値を用いてもよい。この場合、差分の絶対値で表される変数Ｘを閾値Ｔｈ２と比較することで、母音と子音とを区別してもよい。即ち、話速変換部２０は、変数Ｘ＞Ｔｈ２である場合、母音であると判定し、遅延量を大きくし、変数Ｘ ≦ Ｔｈ２である場合、子音であると判定し、遅延量を値０もしくは小さくしてもよい。 Although the ratio of the minimum AMDF value mB to the maximum AMDF value mA is used as the variable X here, the absolute value of the difference between the minimum AMDF value mB and the maximum AMDF value mA may be used instead. In that case, vowels and consonants may be distinguished by comparing the variable X, now the absolute value of the difference, with a threshold Th2. That is, when X > Th2, the speech speed conversion unit 20 may determine that the section is a vowel and increase the delay amount, and when X ≦ Th2, it may determine that the section is a consonant and set the delay amount to 0 or a small value.
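The two vowel/consonant decisions described above (the ratio X = mB / mA compared with Th1, and the absolute difference |mA - mB| compared with Th2) might be sketched as follows. The helper names, the handling of an all-zero frame, and any concrete threshold values are assumptions; the patent does not give numeric values for Th1 or Th2.

```python
def amdf(frame, max_lag):
    """AMDF for lags 1..max_lag (mean absolute difference; requires max_lag < len(frame))."""
    return [
        sum(abs(frame[i] - frame[i + lag]) for i in range(len(frame) - lag)) / (len(frame) - lag)
        for lag in range(1, max_lag + 1)
    ]


def vowel_likeness_ratio(frame, max_lag):
    """X = mB / mA as in equation (1); smaller means more vowel-like."""
    values = amdf(frame, max_lag)
    m_a, m_b = max(values), min(values)
    if m_a == 0:
        # Assumption: a constant or silent frame has no usable periodicity, so treat it as non-vowel.
        return 1.0
    return m_b / m_a


def is_vowel_by_ratio(frame, max_lag, th1):
    """Vowel when X < Th1 (ratio form)."""
    return vowel_likeness_ratio(frame, max_lag) < th1


def is_vowel_by_difference(frame, max_lag, th2):
    """Vowel when |mA - mB| > Th2 (difference form)."""
    values = amdf(frame, max_lag)
    return abs(max(values) - min(values)) > th2
```

For a perfectly periodic frame such as `[0, 1, 0, -1] * 8` with `max_lag=6`, mB is 0, so X is 0 and the frame is classified as a vowel for any positive Th1. Note that the difference form depends on signal amplitude, so Th2 would need to be chosen relative to the signal level.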

このように、肉声に近い音声になるように、母音及び子音を含む音声に対する話速変換を行うことができる。また、子音に対して話速変換を行わないことで、少ない演算量で話速変換を実現できる。 In this way, speech speed conversion can be performed on speech including vowels and consonants so that speech is close to real voice. Also, by not performing speech speed conversion on consonants, speech speed conversion can be realized with a small amount of computation.

なお、音声は、母音と子音の組み合わせに限らず、子音だけの音声もあり、例えば「ん」が挙げられる。また、本実施形態では、日本語で音声を話す場合を示したが、英語やドイツ語等の外国語で音声を話す場合においても、本発明は同様に適用可能である。 Note that the voice is not limited to a combination of vowels and consonants, but may be a voice of only consonants, for example, “n”. In the present embodiment, the case where the voice is spoken in Japanese is shown. However, the present invention can be similarly applied to the case where the voice is spoken in a foreign language such as English or German.

以上により、第１の実施形態における電話機１０Ａ（電話装置）は、電話交換網７を介して電話機１０Ｂ（他電話装置）との間で通話する。マイク１１は、ユーザの送話音声を収音する。雑音レベル推定部２１ｚは、ユーザの周囲の雑音信号のレベルを推定する。音声区間検出部２１は、電話機１０Ｂからの送信信号に含まれる音声の区間を検出する。受話ゲイン制御部１５は、ユーザの送話音声がマイク１１により収音されている間、電話交換網７に基づく送話信号の回線エコー成分を含む受話信号を減衰する。区間検出補正部１７（閾値補正部）は、受話ゲイン制御部１５による受話信号の減衰量と雑音レベル推定部２１ｚにより推定された雑音信号のレベルとを基に、音声区間検出部２１における音声の区間検出に用いる閾値を補正する。音声変換部２２（話速変換部）は、区間検出補正部１７により補正された後の閾値を基に、電話機１０Ｂからの送信信号に含まれる音声を話速変換してスピーカ１２から出力させる。 As described above, the telephone 10A (telephone device) of the first embodiment makes a call with the telephone 10B (other telephone device) via the telephone exchange network 7. The microphone 11 picks up the user's transmitted voice. The noise level estimation unit 21z estimates the level of the noise signal around the user. The voice section detection unit 21 detects the sections of voice included in the transmission signal from the telephone 10B. While the user's transmitted voice is being picked up by the microphone 11, the reception gain control unit 15 attenuates the received signal, which contains the line echo component of the transmitted signal caused by the telephone exchange network 7. The section detection correction unit 17 (threshold correction unit) corrects the threshold used for voice section detection in the voice section detection unit 21, based on the attenuation of the received signal by the reception gain control unit 15 and the noise signal level estimated by the noise level estimation unit 21z. Based on the threshold corrected by the section detection correction unit 17, the voice conversion unit 22 (speech speed conversion unit) converts the speech speed of the voice included in the transmission signal from the telephone 10B and outputs it from the speaker 12.

これにより、音声区間の誤検出を防止し、ユーザにとって聞き取り易い話速変換を行うことができる。また、話速変換の結果、受話側の音声は、話者が早口で話しても、ゆっくりと聞き易く話した場合と似たような音声となる。従って、高齢者や聴覚身障者等が受話側の音声を聴き取り易くなる。 As a result, erroneous detection of the voice section is prevented, and speech speed conversion that is easy for the user to hear can be performed. In addition, as a result of the speech speed conversion, even if the speaker talks quickly, the voice on the receiving side resembles speech spoken slowly and clearly. Therefore, elderly people, hearing-impaired persons, and the like can hear the voice on the receiving side more easily.

また、受話ゲイン制御部１５は、センタークリッパである。区間検出補正部１７は、受話ゲイン制御部１５による受話信号の減衰分だけ閾値を下げるように、雑音レベル推定部２１ｚに信号を出力する。音声区間検出部２１は、受話信号が補正された閾値を超える期間を、音声の区間として検出する。 The reception gain control unit 15 is a center clipper. The section detection correction unit 17 outputs a signal to the noise level estimation unit 21z so as to lower the threshold by the amount of attenuation of the reception signal by the reception gain control unit 15. The voice section detection unit 21 detects a period in which the received signal exceeds the corrected threshold as a voice section.

これにより、センタークリッパを用いた場合に、減衰した受信信号のレベルに閾値を合わせることができる。従って、音声の区間の検出が正確になる。 Thereby, when the center clipper is used, the threshold value can be matched to the level of the attenuated received signal. Therefore, the detection of the voice section becomes accurate.

また、受話ゲイン制御部１５は、自動利得制御器（ＡＧＣ）である。区間検出補正部１７は、受話ゲイン制御部１５による受話信号の減衰分だけ閾値を下げるように、雑音レベル推定部２１ｚに信号を出力する。音声区間検出部２１は、受話信号が補正された閾値を超える期間を、音声の区間として検出する。 The reception gain control unit 15 is an automatic gain controller (AGC). The section detection correction unit 17 outputs a signal to the noise level estimation unit 21z so as to lower the threshold by the amount of attenuation of the reception signal by the reception gain control unit 15. The voice section detection unit 21 detects a period in which the received signal exceeds the corrected threshold as a voice section.

これにより、自動利得制御器を用いた場合に、減衰した受信信号のレベルに閾値を合わせることができる。従って、音声の区間の検出が正確になる。 Thereby, when the automatic gain controller is used, the threshold value can be matched with the level of the attenuated received signal. Therefore, the detection of the voice section becomes accurate.

また、受話ゲイン制御部１５は、ボイススイッチである。受話ゲイン制御部１５により受話信号が減衰されている間、信号増幅部２１ｘがその減衰分だけ受話信号を増幅し、雑音レベル推定部２１ｚは、増幅後の雑音信号のレベルを推定する。 The reception gain control unit 15 is a voice switch. While the reception signal is attenuated by the reception gain control unit 15, the signal amplification unit 21x amplifies the reception signal by the attenuation, and the noise level estimation unit 21z estimates the level of the amplified noise signal.

これにより、ボイススイッチを用いた場合に、減衰した受信信号のレベルを閾値に合わせることができる。従って、音声の区間の検出が正確になる。 Thereby, when the voice switch is used, the level of the attenuated received signal can be adjusted to the threshold value. Therefore, the detection of the voice section becomes accurate.

また、受話ゲイン制御部１５は、ボイススイッチである。雑音レベル推定部２１ｚは、受話ゲイン制御部１５による受話信号の減衰中、雑音信号のレベルの推定を停止する。音声区間検出部２１は、受話信号が既定の閾値を超える期間を、音声の区間として検出する。これにより、受話信号を散発的に減衰させる場合でも、音声の区間の検出が行える。 The reception gain control unit 15 is a voice switch. The noise level estimation unit 21z stops the estimation of the level of the noise signal while the reception gain control unit 15 attenuates the reception signal. The voice section detector 21 detects a period in which the received signal exceeds a predetermined threshold as a voice section. Thereby, even when the received signal is sporadically attenuated, it is possible to detect a voice section.

また、電話機１０Ａは、話速変換の開始を指示するスロートーク用ボタン１６を有する。話速変換部２０は、スロートーク用ボタン１６によって話速変換の開始が指示されると、電話機１０Ｂからの送信信号に含まれる音声を話速変換する。 The telephone 10A also has a slow talk button 16 for instructing the start of speech speed conversion. When the start of speech speed conversion is instructed by the slow talk button 16, the speech speed conversion unit 20 converts the speech speed of the voice included in the transmission signal from the telephone 10B.

これにより、ユーザは、任意のタイミング、例えば通話相手の話声が聴き取り難いと判断した時等に話速変換を開始することができる。 Thereby, the user can start the speech speed conversion at an arbitrary timing, for example, when it is determined that it is difficult to hear the voice of the other party.

また、スロートーク用ボタン１６は、インジケータ１６ｚを有する。スロートーク用ボタン１６によって話速変換の開始が指示されると、インジケータ１６ｚが点灯する。これにより、ユーザは、話速変換が行われていることを容易に知ることができる。 The slow talk button 16 has an indicator 16z. When the start of speech speed conversion is instructed by the slow talk button 16, the indicator 16z lights up. Thereby, the user can easily know that the speech speed conversion is being performed.

（第２の実施形態に至る経緯・課題） (Background and issues leading to the second embodiment)
話速変換は、入力した音声信号（入力信号）を一旦、信号メモリに蓄積し、過去の信号を入力信号よりもゆっくりとした速度で読み出すことで行われる。 Speech speed conversion is performed by temporarily storing an input voice signal (input signal) in a signal memory and reading out the past signal at a speed slower than that of the input signal.

留守録機能付きの電話機に話速変換装置を搭載した場合、留守録として蓄積された留守番電話メッセージをゆっくりとした速度で再生する場合、留守番電話メッセージが長いと、話速変換部の信号メモリ（音声バッファ）の空き容量が無くなってしまう。つまり、話速変換が長時間継続すると、入出力間の遅延が増大して、信号メモリの空き容量が無くなってしまう。 When a speech speed conversion device is installed in a telephone with an answering machine function and a recorded answering machine message is played back at a slow speed, a long message exhausts the free space of the signal memory (audio buffer) of the speech speed conversion unit. In other words, if speech speed conversion continues for a long time, the delay between input and output grows until the signal memory has no free space left.

この結果、信号メモリの空き容量が増えるまで話速変換が行えなくなり、話速変換の効果が低減する。このような場合、例えば、ユーザは、途中から話速変換されずに、通常の再生速度で受話を聞くようになり、聞きづらくなる上、受話の音声速度の変化に違和感を覚えてしまう。特に、留守録を音声で聞く場合、用件によっては留守録の音声が長時間である場合もあり、信号メモリの空き容量が減少することが想定され、上記の状況が発生し易い。一方、信号メモリの容量を増やした場合には、コストが上昇する。 As a result, speech speed conversion cannot be performed until free space in the signal memory increases, and the benefit of speech speed conversion is reduced. In such a case, for example, the user hears the received voice at normal playback speed, without speech speed conversion, from partway through; this is harder to hear, and the change in speed feels unnatural. In particular, when listening to recorded answering machine messages, the recording may be long depending on its content, so the free space of the signal memory is expected to run low and the above situation is likely to occur. Increasing the capacity of the signal memory, on the other hand, raises the cost.
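To give a sense of scale for the buffer problem described above, the following back-of-the-envelope sketch estimates how much unplayed audio accumulates when input arrives at normal speed but is played back at a reduced speech speed. It is a simplified model, not a figure from the patent: it ignores the compression of non-voice sections, and the sample rate and sample width are hypothetical values.

```python
def backlog_seconds(input_seconds, speed_ratio):
    """Seconds of input audio still waiting in the buffer after `input_seconds` of input.

    speed_ratio is the playback speech speed relative to the input (e.g. 0.65 for 65%).
    Simplified model: every second of input is slowed uniformly, with no pause compression.
    """
    if not 0.0 < speed_ratio <= 1.0:
        raise ValueError("speed_ratio must be in (0, 1]")
    return input_seconds * (1.0 - speed_ratio)


def backlog_bytes(input_seconds, speed_ratio, sample_rate=8000, bytes_per_sample=2):
    """Buffer size needed for that backlog (sample_rate and bytes_per_sample are assumed values)."""
    return round(backlog_seconds(input_seconds, speed_ratio) * sample_rate * bytes_per_sample)
```

Under these assumptions, one minute of input at a 65% speech speed leaves about 21 seconds of audio waiting, which is roughly 336,000 bytes at 8 kHz and 16 bits per sample, and the backlog keeps growing linearly with message length.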

そこで、第２の実施形態では、信号メモリが小容量であっても、長時間の留守録の音声を話速変換できる電話装置の例を説明する。 Therefore, the second embodiment describes an example of a telephone device that can perform speech speed conversion on long recorded answering machine messages even when the signal memory has a small capacity.

（第２の実施形態） (Second Embodiment)
第２の実施形態の電話装置は第１の実施形態とほぼ同一の構成を有する。第１の実施形態と同一の構成要素については同一の符号を用いることで、その説明を省略する。 The telephone device of the second embodiment has almost the same configuration as that of the first embodiment. Components identical to those of the first embodiment are denoted by the same reference numerals, and their description is omitted.

図１０は、第２の実施形態の電話機１０Ｃにおける話速変換に係わる部分の内部構成の一例を詳細に示す図である。電話機１０Ｃは、ユーザの操作によって留守番電話メッセージを再生可能なＴＡＭ（telephone answering machine ：電話応答機）機能付きの電話装置であり、話速変換部２０Ａ、スピーカ１２、信号メモリ１２４、メモリ監視部１２５、デコーダ１２６、留守録音声蓄積部１２７、及びエンコーダ１２８を有する。 FIG. 10 is a diagram showing in detail an example of the internal configuration of the part related to speech speed conversion in the telephone 10C of the second embodiment. The telephone 10C is a telephone device with a TAM (telephone answering machine) function that can play back answering machine messages in response to a user operation, and includes a speech speed conversion unit 20A, the speaker 12, a signal memory 124, a memory monitoring unit 125, a decoder 126, an answering machine voice storage unit 127, and an encoder 128.

信号メモリ１２４は、話速変換部２０Ａに入力された音声信号（入力信号）を一時的に記憶し、また、話速変換部２０Ａから出力される音声信号（出力信号）を一時的に記憶する音声バッファである。信号メモリ１２４は、話速変換される音声データの量で既定される、小容量のメモリである。なお、信号メモリ１２４は、話速変換部２０Ａの内部に設けられてもよいし、本実施形態のように外部に設けられてもよい。 The signal memory 124 is an audio buffer that temporarily stores the audio signal (input signal) input to the speech speed conversion unit 20A and also temporarily stores the audio signal (output signal) output from the speech speed conversion unit 20A. The signal memory 124 is a small-capacity memory whose size is determined by the amount of audio data subject to speech speed conversion. The signal memory 124 may be provided inside the speech speed conversion unit 20A, or outside it as in the present embodiment.

話速変換部２０Ａは、信号メモリ１２４を除き、前記第１の実施形態と同様の構成を有する。スピーカ１２は、再生される留守録音声を出力する。 The speech speed conversion unit 20A has the same configuration as that of the first embodiment except for the signal memory 124. The speaker 12 outputs a recorded voice message that is played back.

留守録音声蓄積部１２７は、留守番電話メッセージを蓄積するものであり、ハードディスクやメモリカード等、比較的大容量の記憶領域を有する。エンコーダ１２８は、留守録音声蓄積部１２７に蓄積される留守番電話メッセージを所定の音声圧縮方式で圧縮する。音声圧縮方式として、ＭＰ３（MPEG Audio Layer3），ＡＡＣ（Advanced Audio Coding），ＷＭＡ（Windows Media Audio）等が挙げられる。デコーダ１２６は、留守録音声蓄積部１２７に蓄積された留守番電話メッセージを読み出し、エンコーダ１２８に対応する音声伸張方式で留守番電話メッセージを伸張する。 The answering machine voice storage unit 127 stores answering machine messages and has a relatively large storage area such as a hard disk or a memory card. The encoder 128 compresses the answering machine messages to be stored in the answering machine voice storage unit 127 using a predetermined audio compression method. Examples of the audio compression method include MP3 (MPEG Audio Layer 3), AAC (Advanced Audio Coding), and WMA (Windows Media Audio). The decoder 126 reads out an answering machine message stored in the answering machine voice storage unit 127 and decompresses it using the audio decompression method corresponding to the encoder 128.

メモリ監視部１２５は、信号メモリ１２４の空き容量を監視し、信号メモリ１２４の空き容量が少なくなると、つまり信号メモリ１２４の消費率（使用率）が上限の閾値Ｓｈ１を超える場合、デコーダ１２６に対し留守番電話メッセージの読み出しを停止させる指示を行い、信号メモリ１２４の空き容量が増えてくると、つまり信号メモリ１２４の消費率が下限の閾値Ｓｈ２を下回ると、デコーダ１２６に対し音声メッセージの読み出しを再開させる指示を行う。ここで、上限の閾値Ｓｈ１は、信号メモリ１２４の空き容量が既定下限値に近い値に設定される。また、下限の閾値Ｓｈ２は、信号メモリ１２４の空き容量が既定上限値に近い値に設定される。 The memory monitoring unit 125 monitors the free space of the signal memory 124. When the free space runs low, that is, when the consumption rate (usage rate) of the signal memory 124 exceeds the upper threshold Sh1, it instructs the decoder 126 to stop reading out the answering machine message. When the free space recovers, that is, when the consumption rate of the signal memory 124 falls below the lower threshold Sh2, it instructs the decoder 126 to resume reading out the voice message. Here, the upper threshold Sh1 is set to the consumption rate at which the free space of the signal memory 124 is close to a predetermined lower limit. The lower threshold Sh2 is set to the consumption rate at which the free space of the signal memory 124 is close to a predetermined upper limit.

図１１は、電話機１０Ｃの各部における状態の変化を示すタイミングチャートである。このタイミングチャートでは、デコーダ１２６による読み出し、スロー音声出力、及び信号メモリ１２４の消費率（使用率）の時間変化が示される。 FIG. 11 is a timing chart showing a change in state in each part of the telephone 10C. In this timing chart, changes with time of reading by the decoder 126, slow sound output, and the consumption rate (usage rate) of the signal memory 124 are shown.

タイミングｔ０において、デコーダ１２６が留守番電話メッセージを読み出し、話速変換部２０Ａが話速変換を行ってスロー音声が出力されると、信号メモリ１２４の消費率が上昇する。タイミングｔ１において、信号メモリ１２４の消費率が上限の閾値Ｓｈ１（例えば最大容量の２０％程度に設定された、空き容量の既定下限値に近い値）を超えると、メモリ監視部１２５は、デコーダ１２６に対し、留守録音声蓄積部１２７に蓄積されている留守番電話メッセージの読み出しを停止させる。留守番電話メッセージの読み出しを停止している間も、話速変換部２０Ａは、話速変換を行い、スロー音声を出力する。 At timing t0, when the decoder 126 reads out an answering machine message and the speech speed conversion unit 20A performs speech speed conversion and outputs slow voice, the consumption rate of the signal memory 124 rises. At timing t1, when the consumption rate of the signal memory 124 exceeds the upper threshold Sh1 (for example, the level at which the free space is close to a predetermined lower limit set to about 20% of the maximum capacity), the memory monitoring unit 125 causes the decoder 126 to stop reading out the answering machine message stored in the answering machine voice storage unit 127. Even while readout of the answering machine message is stopped, the speech speed conversion unit 20A continues speech speed conversion and outputs slow voice.

その後、信号メモリ１２４の消費率が徐々に低下し、タイミングｔ２において、下限の閾値Ｓｈ２（例えば最大容量の８０％程度に設定された、空き容量の既定上限値に近い値）を下回ると、メモリ監視部１２５は、デコーダ１２６に対し、留守録音声蓄積部１２７に蓄積されている留守番電話メッセージの読み出しを再開させる。 Thereafter, the consumption rate of the signal memory 124 gradually decreases, and at timing t2, when it falls below the lower threshold Sh2 (for example, the level at which the free space is close to a predetermined upper limit set to about 80% of the maximum capacity), the memory monitoring unit 125 causes the decoder 126 to resume reading out the answering machine message stored in the answering machine voice storage unit 127.

これにより、スロー音声の出力が途切れることなく、話速変換部２０Ａは、小容量の信号メモリ１２４であっても、留守録音声を長時間に亘って話速変換することができる。また、上限の閾値Ｓｈ１と下限の閾値Ｓｈ２とを設定し、上限の閾値Ｓｈ１に対して下限の閾値Ｓｈ２を広く設定することで、信号メモリ１２４の空き容量が十分に回復してから話速変換を再開させることができ、話速変換の途中で空き容量が著しく少なくなってしまうことを防止できる。従って、話速変換の動作が安定する。また、デコーダ１２６の停止・再開が頻繁に繰り返されることによる処理の負荷を軽減できる。 As a result, the speech speed conversion unit 20A can convert the speech speed of recorded answering machine voice over a long time without interrupting the slow voice output, even with the small-capacity signal memory 124. In addition, by setting both an upper threshold Sh1 and a lower threshold Sh2 and leaving a wide gap between them, speech speed conversion is resumed only after the free space of the signal memory 124 has sufficiently recovered, which prevents the free space from becoming extremely small partway through the conversion. The speech speed conversion operation is therefore stable. It also reduces the processing load caused by frequently stopping and restarting the decoder 126.
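The stop/resume behaviour of the memory monitoring unit 125 amounts to a two-threshold hysteresis on the buffer usage rate. The following sketch illustrates that idea only; the class name, the method name, and the default values (usage fractions of 0.8 for Sh1 and 0.2 for Sh2) are assumptions for illustration and are not given in this form in the patent.

```python
class MemoryMonitor:
    """Two-threshold (hysteresis) control of decoder readout based on buffer usage."""

    def __init__(self, sh1=0.8, sh2=0.2):
        # sh1: upper usage threshold (stop reading); sh2: lower usage threshold (resume reading).
        # Both defaults are hypothetical usage fractions in the range 0..1.
        if not 0.0 <= sh2 < sh1 <= 1.0:
            raise ValueError("thresholds must satisfy 0 <= sh2 < sh1 <= 1")
        self.sh1 = sh1
        self.sh2 = sh2
        self.reading = True  # the decoder starts out reading the message

    def update(self, usage):
        """Feed the current usage rate; return True if the decoder should be reading."""
        if self.reading and usage > self.sh1:
            self.reading = False   # buffer nearly full: stop readout
        elif not self.reading and usage < self.sh2:
            self.reading = True    # buffer has drained enough: resume readout
        return self.reading
```

Between the two thresholds the previous state is kept, so a usage rate hovering around a single value cannot make the decoder toggle on every update; this is the load-reduction effect the text attributes to the wide gap between Sh1 and Sh2.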

FIG. 12 is a flowchart illustrating an example of the call and playback operation procedure of the second embodiment. The telephone 10C determines whether a call is in progress (S21). If a call is in progress, the telephone 10C executes the real-time playback mode for the received voice and performs speech speed conversion as in the first embodiment (S22). The speech speed conversion in step S22 was described in detail in the first embodiment, so its description is omitted here.

If no call is in progress in step S21, the telephone 10C determines whether to play back an answering machine message (S23). If no answering machine message is to be played back, the telephone 10C returns to step S21.

If an answering machine message is to be played back, the telephone 10C shifts to the non-real-time playback mode for recorded voice, that is, the slow playback mode for answering machine messages (S24). In the non-real-time playback (playback of answering machine messages) of the second embodiment, non-voice sections are also played back slowly; however, as in the first embodiment, slow playback may be omitted in non-voice sections. Omitting slow playback in non-voice sections allows the consumption rate of the signal memory to be lowered quickly. It also makes it possible to perform speech speed conversion that is easy for the user to hear.
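The effect of skipping slow playback in non-voice sections can be illustrated with simple arithmetic. The sketch below is only an illustration: the function `playback_duration`, the segment format, and the stretch factor of 1.5 are assumptions, not values from this publication.

```python
def playback_duration(segments, slow_factor=1.5, slow_non_voice=False):
    """Total playback time in seconds.

    segments: list of (is_voice, duration_seconds) pairs.
    slow_factor: assumed time-stretch factor applied to slowed sections.
    slow_non_voice: if True, non-voice sections are stretched as well.
    """
    total = 0.0
    for is_voice, duration in segments:
        if is_voice or slow_non_voice:
            total += duration * slow_factor  # slowed section
        else:
            total += duration                # non-voice section played at normal speed
    return total


# Hypothetical message: 6 s of voice and 4 s of silence in total.
message = [(True, 2.0), (False, 1.0), (True, 4.0), (False, 3.0)]
all_slow = playback_duration(message, slow_non_voice=True)
voice_only = playback_duration(message, slow_non_voice=False)
print(all_slow, voice_only)  # 15.0 13.0
```

Under these assumptions the same buffered audio is consumed in less playback time when only voice sections are slowed, which is why the signal memory drains sooner.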

The decoder 126 reads an answering machine message from the answering machine voice storage unit 127, decompresses it, and stores the decompressed answering machine message in the signal memory 124 (S25).

The speech speed conversion unit 20A performs speech speed conversion on the answering machine message in the signal memory 124 and plays it back (slow playback) (S26). The memory monitoring unit 125 determines whether the consumption rate of the signal memory 124 has exceeded the upper threshold Sh1, that is, whether the free space has become small (S27). If the upper threshold Sh1 has not been exceeded, the decoder 126 continues reading the answering machine message (S30).

If the consumption rate of the signal memory 124 has exceeded the upper threshold Sh1, the memory monitoring unit 125 causes the decoder 126 to stop reading the answering machine message (S28). The memory monitoring unit 125 then determines whether the consumption rate of the signal memory 124 has fallen below the lower threshold Sh2, that is, whether the free space has increased (S29).

If the consumption rate has not fallen below the lower threshold Sh2, the memory monitoring unit 125 returns to step S28. If the consumption rate of the signal memory 124 has fallen below the lower threshold Sh2, the decoder 126 resumes reading the answering machine message in step S30.

The telephone 10C then determines whether to end playback of the recorded messages (S31). If playback is not to end, the procedure returns to step S26, and the speech speed conversion unit 20A performs speech speed conversion and playback. If playback is to end, for example because of a button operation or because all answering machine messages in the answering machine voice storage unit 127 have been played back, the telephone 10C ends this operation.
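The loop of steps S25 to S31 can be sketched as a small simulation that counts frames instead of handling audio. This is a simplified model under stated assumptions, not the procedure of FIG. 12 itself: the function `simulate_playback`, the buffer capacity, the per-tick decode and playback amounts, and the threshold values are all hypothetical.

```python
def simulate_playback(message_frames, capacity=100, decode_per_tick=10,
                      play_per_tick=4, sh1=0.80, sh2=0.20):
    """Simulate steps S25 to S31 and return (peak buffer occupancy, number of pauses)."""
    stored = message_frames  # frames still held in the recorded-message store
    buffered = 0             # frames currently held in the signal memory
    reading = True           # whether the decoder is allowed to read
    peak = 0
    pauses = 0
    while stored > 0 or buffered > 0:             # S31: end when everything is played
        if reading and stored > 0:                # S25 / S30: decode into the signal memory
            n = min(decode_per_tick, stored, capacity - buffered)
            stored -= n
            buffered += n
        peak = max(peak, buffered)
        buffered -= min(play_per_tick, buffered)  # S26: slow playback drains the memory
        rate = buffered / capacity                # consumption rate of the signal memory
        if reading and rate > sh1:                # S27 -> S28: stop reading
            reading = False
            pauses += 1
        elif not reading and rate < sh2:          # S29 -> S30: resume reading
            reading = True
    return peak, pauses


peak, pauses = simulate_playback(500)
print(peak, pauses)
```

In this model the decoder fills the buffer faster than slow playback drains it, so a 500-frame message never needs more than the 100-frame buffer: reading pauses each time the consumption rate passes the upper threshold and resumes once it drops below the lower one.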

As described above, the telephone 10C (telephone device) in the second embodiment makes a call with the telephone 10B (other telephone device) via the telephone exchange network 7. The encoder 128 (compression unit) compresses the call voice transmitted from the telephone 10B. The answering machine voice storage unit 127 (answering machine message storage unit) stores the call voice compressed by the encoder 128 as an answering machine message. The decoder 126 (decompression unit) decompresses the call voice stored in the answering machine voice storage unit 127. The signal memory 124 (voice buffer) temporarily holds the call voice decompressed by the decoder 126. The speech speed conversion unit 20A performs speech speed conversion on the call voice read from the signal memory 124 and outputs it from the speaker 12. When the memory monitoring unit 125 (monitoring unit) determines that the free space of the signal memory 124 has approached the predetermined lower limit, it temporarily stops the decompression of the call voice in the decoder 126.

As a result, even if the signal memory 124 has a small capacity, speech speed conversion can be performed on long recorded messages.

When the memory monitoring unit 125 determines that the free space of the signal memory 124 has approached the predetermined upper limit, it causes the decoder 126 to resume decompression of the call voice. Resuming decompression of the call voice once the free space of the voice buffer has increased allows speech speed conversion to be performed without interruption.

The predetermined lower limit is about 20% of the maximum capacity of the signal memory 124. Temporarily stopping decompression of the call voice when the free space falls to about 20% of the maximum capacity leaves a margin in the voice buffer and stabilizes the speech speed conversion operation.

The telephone 10C also has a slow talk button 16 for instructing the start of speech speed conversion. When the start of speech speed conversion is instructed with the slow talk button 16, the speech speed conversion unit 20A performs speech speed conversion on the voice included in the transmission signal from the telephone 10B.

In the second embodiment, as in the first embodiment, speech speed conversion can be performed on voice containing vowels and consonants while distinguishing between the vowels and the consonants.

Various embodiments have been described above with reference to the drawings, but the present invention is of course not limited to these examples. It is clear that those skilled in the art can conceive of various changes and modifications within the scope of the claims, and it is understood that these also naturally fall within the technical scope of the present invention.

For example, although the above embodiments describe the case where the telephone device is a fixed-line telephone, it may instead be a mobile telephone that makes calls wirelessly.

The present invention is useful as a telephone device that suppresses erroneous detection of voice sections included in a received signal and efficiently performs speech speed conversion that is easy for the user to hear.

5A, 5B Exchange
10, 10A, 10B, 10C Telephone
11 Microphone
12 Speaker
15 Reception gain control unit
16 Slow talk button
17 Section detection correction unit
20, 20A Speech speed conversion unit
21 Voice section detection unit
21z Noise level estimation unit
22 Voice conversion unit
21x Signal amplification unit
22y Delay addition unit
22z Vowel/consonant determination unit
23 Non-voice conversion unit
24, 124 Signal memory
30, 30A, 30B Two-wire/four-wire conversion circuit
50 Public analog line
125 Memory monitoring unit
126 Decoder
127 Answering machine voice storage unit
128 Encoder
AvL Long-time average
AvS Short-time average
Sh1, Sh2 Threshold

Claims

A telephone device for making a call with another telephone device via a telephone exchange network, comprising:
a microphone that picks up a user's transmitted voice;
a noise level estimation unit that estimates a level of a noise signal around the user;
a section detection unit that detects a voice section included in a transmission signal from the other telephone device;
a reception gain control unit that attenuates a received signal including a line echo component of the transmitted voice caused by the telephone exchange network while the user's transmitted voice is picked up by the microphone;
a threshold correction unit that corrects a threshold used for detecting the voice section in the section detection unit, based on the amount of attenuation of the received signal by the reception gain control unit and the level of the noise signal estimated by the noise level estimation unit; and
a speech speed conversion unit that converts the speech speed of voice included in the transmission signal from the other telephone device based on the threshold corrected by the threshold correction unit and outputs the voice from a speaker.

The telephone device according to claim 1, wherein
the reception gain control unit is a center clipper,
the threshold correction unit lowers the threshold in accordance with the amount of attenuation of the received signal by the center clipper, and
the section detection unit detects, as the voice section, a period in which the received signal exceeds the threshold corrected by the threshold correction unit.

The telephone device according to claim 1, wherein
the reception gain control unit is an automatic gain controller,
the threshold correction unit lowers the threshold in accordance with the amount of attenuation of the received signal by the automatic gain controller, and
the section detection unit detects, as the voice section, a period in which the received signal exceeds the threshold corrected by the threshold correction unit.

The telephone device according to claim 1, wherein
the reception gain control unit is a voice switch, and
the noise level estimation unit estimates the level of the noise signal by amplifying the received signal in accordance with the amount of attenuation while the received signal is attenuated by the voice switch.

The telephone device according to claim 1, wherein
the reception gain control unit is a voice switch,
the noise level estimation unit stops estimating the level of the noise signal while the received signal is attenuated by the voice switch, and
the section detection unit detects, as the voice section, a period in which the received signal exceeds a predetermined threshold.

The telephone device according to any one of claims 1 to 5, further comprising
a button for instructing the start of speech speed conversion, wherein
the speech speed conversion unit, when the start of speech speed conversion is instructed with the button, converts the speech speed of voice included in the transmission signal from the other telephone device.

The telephone device according to claim 6, further comprising
an indicator that shows the status of the telephone device, wherein
the indicator lights when the start of speech speed conversion is instructed with the button.