JP2008292621A

JP2008292621A - Speech speed conversion device, speaking device and speech speed conversion method

Info

Publication number: JP2008292621A
Application number: JP2007136248A
Authority: JP
Inventors: Tadamichi Tokuda; 肇道徳田
Original assignee: Panasonic Corp
Current assignee: Panasonic Corp
Priority date: 2007-05-23
Filing date: 2007-05-23
Publication date: 2008-12-04

Abstract

PROBLEM TO BE SOLVED: To provide a speech speed conversion device, a speaking device and a speech speed conversion method that automatically convert the speech speed of only the speech of speakers designated by the listener, at the conversion rate the user finds most desirable. SOLUTION: The speech speed conversion device includes: a speech feature extraction section 101 that extracts the speech features of each speaker participating in the call; a speech feature storage section 102 that stores the speech features of the speakers designated by the listener; a speaker determination section 103 that determines whether the current talker is one of the designated speakers by comparing the current talker's speech features with those stored in the speech feature storage section 102; and a speech speed conversion section 104 that converts the speech speed of the received speech when the speaker determination section 103 determines that the talker is a designated speaker. COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a speech speed conversion device and a speech speed conversion method usable in a call device, such as a conference phone, that connects a plurality of sites for a voice call. The invention also relates to a call device having such a speech speed conversion device.

Speech speed conversion, which slows speech down or speeds it up without changing its pitch, is a well-known technique widely used in voice playback devices such as IC (Integrated Circuit) recorders, as well as in telephones, televisions, radios, and similar equipment. The speech speed conversion process is described, for example, in Patent Document 1.

A general processing configuration for converting speech to a slower speech speed is described here. FIG. 18 is a block diagram schematically showing the configuration of a conventional speech speed conversion device. In FIG. 18, 201 is a voice pitch detector that detects the pitch (fundamental frequency) of the voice signal, 202 is an insertion waveform generator that cuts out the voice waveform in units of one pitch period, and 203 is a waveform connection unit that stretches the waveform along the time axis by periodically inserting the cut-out pitch waveforms into the input voice signal. Because the waveform of a voiced speech signal, made up of the fundamental frequency and its harmonics, tends to repeat the same shape every pitch period, this configuration can slow speech down without changing its pitch.
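The waveform insertion of FIG. 18 can be sketched in a few lines. This is a minimal sketch, assuming the pitch period (in samples) has already been found by a pitch detector and stays constant over the signal, and that the signal is a plain list of floats; `stretch_by_pitch_insertion` and `insert_every` are hypothetical names. Only whole pitch periods are repeated, which is why the pitch is preserved.

```python
def stretch_by_pitch_insertion(signal, period, insert_every):
    """Slow speech down without changing pitch by duplicating whole pitch periods.

    signal: list of samples
    period: pitch period in samples (assumed to come from a pitch detector)
    insert_every: one period is duplicated after every `insert_every` original periods
    """
    if period <= 0 or insert_every <= 0:
        raise ValueError("period and insert_every must be positive")
    out = []
    n_periods = len(signal) // period
    for k in range(n_periods):
        chunk = signal[k * period:(k + 1) * period]  # cut out one pitch period
        out.extend(chunk)
        if (k + 1) % insert_every == 0:
            out.extend(chunk)  # insert a copy of the period just cut out
    out.extend(signal[n_periods * period:])  # samples left after the last full period
    return out
```

For 100 periods of 80 samples, `insert_every=2` yields 12,000 samples (1.5 times the input length) and `insert_every=1` doubles it. A practical implementation would follow a changing pitch period and smooth each join instead of splicing raw samples.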

Consider applying speech speed conversion to a device that sends and receives voice in real time (where delay is not tolerated), such as a call device. If the received voice were uniformly slowed down and played back continuously, the delay between the moment the voice was actually produced and its playback after conversion would grow over time, seriously disrupting the conversation. Special measures have therefore conventionally been taken for such devices. The concept is explained with reference to FIG. 6. In real-time speech speed conversion, speech intervals (intervals in which a voice signal is present in the received signal) are converted to a slower speech speed, and silent intervals (intervals in which no voice signal is present) are compressed, which keeps the delay relative to the actual utterance small.
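The delay bookkeeping behind this scheme can be illustrated with a small planning function. It is a sketch under stated assumptions: the input is split into fixed-length frames, a crude mean-energy threshold (a simple stand-in, not the detection method of this document) marks each frame as speech or silence, and `plan_output_lengths`, `is_speech_frame`, and `rate` are hypothetical names. It only decides how many output samples each frame should produce; the stretching itself would be done by a pitch-synchronous method such as the one of FIG. 18.

```python
def is_speech_frame(frame, threshold):
    """Crude voice activity check: mean energy of the frame above a fixed threshold."""
    return sum(x * x for x in frame) / len(frame) > threshold


def plan_output_lengths(is_speech, frame_len, rate):
    """Decide how many output samples each input frame produces.

    is_speech: sequence of booleans, one per input frame (True = speech present)
    frame_len: samples per input frame
    rate: stretch factor for speech frames (e.g. 1.5 = output 1.5x as long)
    Speech frames are stretched; silent frames are shortened so that the
    accumulated delay (output time minus input time) is paid back.
    Returns (output length per frame, delay in samples after each frame).
    """
    delay = 0
    lengths, delays = [], []
    for speech in is_speech:
        if speech:
            out_len = round(frame_len * rate)
        else:
            # Drop as much of this silent frame as the current backlog allows.
            out_len = max(0, frame_len - delay)
        delay += out_len - frame_len
        lengths.append(out_len)
        delays.append(delay)
    return lengths, delays
```

With four speech frames of 160 samples at `rate=1.5` followed by silence, the delay climbs to 320 samples and is fully paid back by the end of the second silent frame, after which silence passes through unchanged.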

In addition, when signals are joined, short overlapping sections are provided and smoothed, which prevents distortion. If a device that performs this speech speed conversion is installed in a teleconferencing device, the voice of a fast-talking, hard-to-follow speaker can be converted to a slow, easy-to-follow speech speed. In international conferences and other settings where conversations are held in a foreign language, the intelligibility of the foreign language can also be expected to improve, contributing to the smooth progress of the conference.

Patent Document 1: JP-A-5-257490
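The smoothing of overlapping sections at each join is described only in general terms. A linear cross-fade is one common way to do it and is assumed in the sketch below; `crossfade_join` and `overlap` are hypothetical names.

```python
def crossfade_join(a, b, overlap):
    """Join two sample lists, overlapping the last `overlap` samples of `a`
    with the first `overlap` samples of `b` under a linear cross-fade."""
    overlap = min(overlap, len(a), len(b))
    if overlap == 0:
        return list(a) + list(b)
    head = list(a[:len(a) - overlap])
    mixed = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of b rises toward 1, weight of a falls toward 0
        mixed.append((1.0 - w) * a[len(a) - overlap + i] + w * b[i])
    return head + mixed + list(b[overlap:])
```

Joining `[0.0]*5` and `[1.0]*5` with an overlap of 3 gives a ramp `0.25, 0.5, 0.75` between the two segments instead of an abrupt step, and the result is `overlap` samples shorter than simple concatenation.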

However, with the conventional technology described above, applying speech speed conversion to a call device used for teleconferences with multiple participants slows down the speech of every speaker. Speaking speed generally varies between individuals: slowing down a fast talker improves the intelligibility of the conversation, but speakers who already talk slowly and need no conversion are converted as well, which can in some cases impair intelligibility. There is also demand for selectively slowing down the voice of one particular speaker, for example when one of several participants is speaking a foreign language, but the conventional configuration cannot do this. Thus, applying conventional speech speed conversion to a teleconference with multiple participants does not always produce a good result.

Accordingly, a speech speed conversion device and method installed in a call device that multiple speakers can join must convert the speech of specific speakers, such as fast talkers or speakers of a foreign language, at the speech speed conversion rate best suited to each of them.

The present invention has been made in view of the above, and its object is to provide a speech speed conversion device, a call device, and a speech speed conversion method that automatically convert the speech speed of only the voices of speakers designated by the listener, at the conversion rate the user finds most desirable.

To solve the above problems, the present invention includes: voice feature extraction means for extracting the voice features of each speaker participating in a call; voice feature storage means for storing the voice features extracted for speakers designated by a listener; speaker determination means for determining whether the current talker is one of the speakers designated by the listener by comparing the current talker's voice features with the voice features stored in the voice feature storage means; and speech speed conversion means for converting the speech speed of the talker's received voice when the speaker determination means determines that the talker is a speaker designated by the listener.
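The four means can be wired together as in the sketch below. It is illustrative only: `toy_features` (RMS level plus zero-crossing rate) is a hypothetical stand-in for real speaker features such as spectra, pitch, and formants; the nearest-neighbour Euclidean comparison with a fixed threshold is an assumed decision rule; and the conversion step is passed in as a function so that any speech speed conversion method can be plugged in. `SelectiveSpeedConverter` and its methods are hypothetical names.

```python
import math


def toy_features(frame):
    """Hypothetical stand-in for voice feature extraction: (RMS level, zero-crossing rate)."""
    if len(frame) < 2:
        raise ValueError("frame must contain at least two samples")
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    crossings = sum(1 for p, q in zip(frame, frame[1:]) if (p < 0) != (q < 0))
    return (rms, crossings / (len(frame) - 1))


class SelectiveSpeedConverter:
    def __init__(self, extract, threshold, convert):
        self.extract = extract      # voice feature extraction means
        self.stored = {}            # voice feature storage means: speaker -> features
        self.threshold = threshold  # maximum distance accepted as "same speaker"
        self.convert = convert      # speech speed conversion means

    def register(self, name, frame):
        """Store the features of a speaker designated by the listener."""
        self.stored[name] = self.extract(frame)

    def identify(self, frame):
        """Speaker determination means: nearest stored speaker within the threshold, else None."""
        feats = self.extract(frame)
        best, best_d = None, math.inf
        for name, ref in self.stored.items():
            d = math.dist(feats, ref)
            if d < best_d:
                best, best_d = name, d
        return best if best_d < self.threshold else None

    def process(self, frame):
        """Convert the received voice only when the talker is a designated speaker."""
        return self.convert(frame) if self.identify(frame) is not None else frame
```

Talkers who were never registered fall through to the unconverted path, which is the selective behaviour the invention aims for.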

In the method of the present invention, the voice features of each speaker participating in a call are extracted, the voice features extracted for speakers designated by the listener are stored, and the current talker's voice features are compared with the stored voice features to determine whether the talker is one of the speakers designated by the listener. If the talker is determined to be a designated speaker, speech speed conversion processing is performed on the talker's received voice.

According to the present invention, only the voices of speakers designated by the listener are automatically converted, at the conversion rate the user finds most desirable. This yields a speech speed conversion device that improves the listener's understanding of what speakers say, even in calls with multiple participants. Furthermore, because the voices of speakers the listener has not designated are left unconverted, the device improves the listener's understanding of the conversation as a whole.

For the same reasons, the present invention also provides a speech speed conversion method that automatically converts only the voices of speakers designated by the listener, at the conversion rate the user finds most desirable, improving the listener's understanding of what speakers say in calls with multiple participants while leaving the voices of undesignated speakers unconverted.

The speech speed conversion device of the first invention includes: voice feature extraction means for extracting the voice features of each speaker participating in a call; voice feature storage means for storing the voice features extracted for speakers designated by a listener; speaker determination means for determining whether the current talker is one of the speakers designated by the listener by comparing the current talker's voice features with the voice features stored in the voice feature storage means; and speech speed conversion means for converting the speech speed of the talker's received voice when the speaker determination means determines that the talker is a designated speaker. Even when several speakers take part in the call, only the voices of speakers designated by the listener are converted to a comfortable speech speed.

The speech speed conversion device of the second invention is the device of the first invention, further including: designated speaker conversion condition storage means for storing the optimal speech speed conversion rate set for each speaker designated by the listener; and designated speaker conversion condition selection means for selecting, from the designated speaker conversion condition storage means, the conversion rate corresponding to the speaker identified by the speaker determination means. The speech speed conversion means converts the designated speaker's speech speed using the conversion rate selected by the designated speaker conversion condition selection means. This allows the listener to set a separate conversion rate for each designated speaker at the other end of the call.
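Storing a conversion rate per designated speaker amounts to a lookup table keyed by the identity returned by speaker determination. The sketch assumes that a "conversion rate" means output duration divided by input duration (the document does not define it), uses the hypothetical names `select_rate` and `insert_interval_for_rate`, and shows how such a rate could be mapped onto pitch-period insertion as in FIG. 18, where duplicating one period after every k original periods gives a rate of 1 + 1/k.

```python
def select_rate(conditions, speaker, default=1.0):
    """Designated speaker conversion condition selection (sketch).

    conditions: dict mapping speaker id -> conversion rate set by the listener
    speaker: id returned by speaker determination, or None if no match
    A rate of 1.0 means no conversion.
    """
    if speaker is None:
        return default
    return conditions.get(speaker, default)


def insert_interval_for_rate(rate):
    """Number of original pitch periods between inserted copies so that the
    output is about `rate` times as long (rate = 1 + 1/k). None means no slowing."""
    if rate <= 1.0:
        return None
    return max(1, round(1.0 / (rate - 1.0)))
```

Rates that are not of the form 1 + 1/k are rounded to the nearest achievable interval; a real device could instead vary the interval over time to hit the average rate more precisely.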

The speech speed conversion device of the third invention is the device of the second invention, in which the designated speaker conversion condition storage means additionally stores a playback volume gain set for each speaker designated by the listener; the designated speaker conversion condition selection means selects both the speech speed conversion rate and the playback volume gain corresponding to the speaker identified by the speaker determination means; and the speech speed conversion means converts both the speech speed and the playback volume of the designated speaker using the selected rate and gain. This allows the listener to set a playback volume gain, together with a speech speed conversion rate, for each designated speaker at the other end of the call.
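Extending the stored conditions with a volume gain can be sketched with a small record per speaker. `Condition`, `NEUTRAL`, `select_condition`, and `apply_gain` are hypothetical names; the gain is assumed to be a linear factor applied to samples normalized to [-1, 1], and hard clipping is assumed to keep the result in range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    """Per-speaker settings chosen by the listener (hypothetical structure)."""
    rate: float = 1.0  # speech speed conversion rate (output/input duration, assumed)
    gain: float = 1.0  # linear playback volume gain


NEUTRAL = Condition()  # used for talkers who are not designated


def select_condition(conditions, speaker):
    """Return the stored condition for `speaker`, or the neutral one."""
    if speaker is None:
        return NEUTRAL
    return conditions.get(speaker, NEUTRAL)


def apply_gain(samples, gain, limit=1.0):
    """Scale samples by `gain` and hard-clip them to [-limit, limit]."""
    return [max(-limit, min(limit, s * gain)) for s in samples]
```

Hard clipping is the simplest way to keep an amplified signal in range; a real device would more likely use a limiter or automatic gain control to avoid audible distortion.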

The call device of the fourth invention includes: communication means for connecting to and communicating with another call device over a communication line; sound collection means for collecting the voices of speakers; voice output means for reproducing and outputting the voice received from the other call device; and the speech speed conversion device according to any one of the first to third inventions. Even when several speakers take part in the call, only the voices of speakers designated by the listener are converted to a comfortable speech speed and/or playback volume.

The speech speed conversion method of the fifth invention performs: voice feature extraction processing that extracts the voice features of each speaker participating in a call; voice feature storage processing that stores the voice features extracted for speakers designated by the listener; speaker determination processing that determines whether the current talker is one of the designated speakers by comparing the current talker's voice features with the stored voice features; and, when the talker is determined to be a designated speaker, speech speed conversion processing that converts the speech speed of the talker's received voice. Even when several speakers take part in the call, only the voices of speakers designated by the listener are converted to a comfortable speech speed.

The speech speed conversion method of the sixth invention performs designated speaker conversion condition storage processing that stores the optimal speech speed conversion rate set for each speaker designated by the listener and, before the speech speed conversion processing, designated speaker conversion condition selection processing that selects, from the stored contents, the conversion rate corresponding to the speaker identified by the speaker determination processing. The speech speed conversion processing then converts the designated speaker's speech speed using the selected conversion rate. This allows the listener to set a separate conversion rate for each designated speaker at the other end of the call.

In the speech speed conversion method of the seventh invention, the designated speaker conversion condition storage processing additionally stores a playback volume gain set for each speaker designated by the listener; the designated speaker conversion condition selection processing selects, from the stored contents, both the speech speed conversion rate and the playback volume gain corresponding to the speaker identified by the speaker determination processing; and the speech speed conversion processing converts the designated speaker's speech speed and playback volume using the selected rate and gain. This allows the listener to set a playback volume gain, together with a speech speed conversion rate, for each designated speaker at the other end of the call.

Embodiments of the present invention are described below.

(Embodiment 1)
Embodiment 1 describes a speech speed conversion device, a call device, and a speech speed conversion method that, in a teleconference or similar call with multiple participants, determine who is currently speaking, match the result against the speech speed conversion rates set for each speaker to determine the optimal conversion rate, and have the speech speed conversion unit automatically convert each talker's speech to the speed the listener prefers.

FIG. 1 is a perspective view showing an example of a call device including the speech speed conversion device according to Embodiment 1 of the present invention, and FIG. 2 is a top view of the call device of FIG. 1. In FIGS. 1 and 2, 601 is the call device, 602a to 602d are microphones that collect the users' voices, 603 is a loudspeaker that reproduces the received voice, 604 is a communication cable connected to the remote line, 605 is a registration button with which the user (corresponding to the listener in the claims) designates a speaker to whom speech speed conversion is applied, 606 is a slow playback button for starting and stopping speech speed conversion, 607 are operation buttons for placing and receiving calls, and 608 is a display unit that shows the call status and other information. The loudspeaker 603 corresponds to the voice output means in the claims, and the microphones 602a to 602d correspond to the sound collection means.

FIG. 3 is a block diagram schematically showing the components of the call device of FIGS. 1 and 2 that relate to the speech speed conversion device according to Embodiment 1. In FIG. 3, 701 is a digital signal processor (hereinafter, DSP) that performs various computations and controls peripheral devices, 702 is a communication path interface, 703 is an operation interface, 704 is a memory, 705 is a digital-to-analog (hereinafter, D/A) converter, 706 is an analog-to-digital (hereinafter, A/D) converter, 707 is a loudspeaker, 708 is a microphone, 7011 is a received signal input unit, 7012 is a voice signal output unit, 7013 is a communication path, and 7014 is a received signal path.

An outline of the operation of the call device in FIG. 3 follows. The communication path interface 702 of the call device 601 is connected to one end of the communication path 7013 and exchanges call signals (voice signals) with another call device (not shown) connected to the other end. A voice signal sent from that other call device travels from the communication path 7013 through the communication path interface 702 and the received signal path 7014, and enters the DSP 701 at the received signal input unit 7011. The DSP 701 applies predetermined processing, such as the speech speed conversion of Embodiment 1, to the received voice signal.

For example, when the signals sent and received over the communication path 7013 are digitized signals such as IP (Internet Protocol) packets, all processing in the communication path interface 702 is performed on digital signals such as PCM (Pulse-Code Modulation), and the received signal path 7014 is normally a serial or parallel bus. When the communication path 7013 is an analog signal path such as the PSTN (Public Switched Telephone Network), an A/D converter 706 is required either in the communication path interface 702 or inside the DSP 701. If the A/D converter 706 is in the communication path interface 702, all subsequent processing is again performed on digital signals, and the received signal path 7014 is normally a serial or parallel bus. If instead the A/D converter 706 is built into the DSP 701, the received signal path 7014 is an analog signal line.
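When the received signal arrives as digital PCM, the DSP has to turn the sample bytes into numbers before processing and back again before the D/A converter 705. The document does not specify the PCM format; the sketch below assumes 16-bit signed little-endian linear PCM and uses only Python's standard `struct` module. `pcm16le_to_float` and `float_to_pcm16le` are hypothetical names.

```python
import struct


def pcm16le_to_float(payload):
    """Decode little-endian signed 16-bit PCM bytes into floats in [-1, 1).
    A trailing odd byte, if any, is ignored."""
    count = len(payload) // 2
    ints = struct.unpack("<%dh" % count, payload[:count * 2])
    return [v / 32768.0 for v in ints]


def float_to_pcm16le(samples):
    """Encode floats back to 16-bit PCM, clipping values outside the 16-bit range."""
    ints = [max(-32768, min(32767, int(round(s * 32768.0)))) for s in samples]
    return struct.pack("<%dh" % len(ints), *ints)
```

Working on normalized floats keeps the speech speed and gain processing independent of the sample format on the line; only these two boundary functions would change for a different PCM encoding.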

The received voice, after speech speed conversion in the DSP 701, is converted to an analog signal by the D/A converter 705 and reproduced through the loudspeaker 707. Likewise, the user's outgoing voice is picked up by the microphone 708, converted to a digital signal by the A/D converter 706, passed to the DSP 701, and sent to the communication path interface 702 as a transmission signal (voice signal).

FIG. 4 is a block diagram schematically showing the functional configuration of the speech speed conversion device according to Embodiment 1. In FIG. 4, 101 is a voice feature extraction unit that extracts the voice features of each speaker participating in the call; 102 is a voice feature storage unit that stores the extracted voice features; 103 is a speaker determination unit that determines whether the current talker is one of the speakers designated by the listener by comparing the current talker's voice with the voice features stored in the voice feature storage unit 102; and 104 is a speech speed conversion unit that converts the speech speed of the talker's received voice to a predetermined speed when the speaker determination unit 103 determines that the talker is a speaker designated by the listener.

In Embodiment 1, the voice feature storage unit 102 corresponds to the memory 704, and the voice feature extraction unit 101, the speaker determination unit 103, and the speech speed conversion unit 104 correspond to software programs stored in the memory 704 of FIG. 3 and executed on the DSP 701. That is, the speaker determination unit 103 and the speech speed conversion unit 104 are realized by the DSP 701 processing the received voice obtained from the communication path interface 702 according to a program stored in the memory 704 and outputting the result to the D/A converter 705. The voice feature extraction unit 101 is realized by the DSP 701 computing voice feature values from the received voice obtained from the communication path interface 702 according to a program stored in the memory 704, and the voice feature storage unit 102 is realized by storing the feature values obtained by the voice feature extraction unit 101 in the memory 704.

The playback of received voice in the call device of Embodiment 1 configured as above is described next. FIG. 5 is a flowchart showing an example of the procedure of the speech speed conversion method according to Embodiment 1. It is assumed that the call device is connected to another site with multiple speakers and that a call (for example, a teleconference) is in progress.

First, if the user of the call device 601 wants to designate a remote speaker whose voice should be heard with speech speed conversion (Yes in step 1), the user presses the registration button 605 while that speaker is talking, instructing the call device 601 to register the speaker's voice features. The voice feature extraction unit 101 then calculates the feature values of the received voice at that moment and stores them in the voice feature storage unit 102 (step 2).
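Because the user presses the button while the speaker is talking, the device needs the most recently received audio at that moment. One way to provide it, assumed here and not stated in the document, is to keep a short ring buffer of received frames and extract features from its contents when the button is pressed. `Registrar`, `history_frames`, and the injected `extract` function are hypothetical.

```python
from collections import deque


class Registrar:
    """Sketch of steps 1 and 2: keep the last few received frames and, when the
    registration button is pressed, extract features from them and store them."""

    def __init__(self, extract, history_frames=50):
        self.recent = deque(maxlen=history_frames)  # oldest frames drop out automatically
        self.extract = extract                      # voice feature extraction function
        self.registered = []                        # voice feature storage

    def on_received_frame(self, frame):
        self.recent.append(frame)

    def on_register_button(self):
        """Returns True if a feature set was stored, False if no audio was buffered."""
        if not self.recent:
            return False
        samples = [x for frame in self.recent for x in frame]
        self.registered.append(self.extract(samples))
        return True
```

Pressing the button repeatedly, once per talker, corresponds to looping back from step 3 to step 1 until every speaker to be converted has been registered.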

The voice features extracted in step 2 are those generally used for speaker authentication, such as the voice spectrum, the pitch frequency, and formant transition information. For example, the spectrum is obtained by an FFT (Fast Fourier Transform) of the voice signal, the pitch frequency by the cepstrum method or the correlation function method, and the time transition of the formants by computing the spectral envelope. The features are then converted into a format suited to the matching method used for speaker determination: for DP (Dynamic Programming) matching, which computes the spectral distance between word utterances, they are converted into time-series spectral data, whereas for matching based on a probabilistic model such as an HMM (Hidden Markov Model), they are converted into state transition probability information.
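Of the two pitch estimators named above, the correlation function method is the easier to sketch. The function below is an illustrative, unoptimized version: it assumes a single voiced frame given as a list of floats and a search range of 60 to 400 Hz (both assumptions), and picks the lag with the largest autocorrelation. `estimate_pitch_autocorr`, `fmin`, and `fmax` are hypothetical names.

```python
import math


def estimate_pitch_autocorr(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency of a voiced frame with the
    autocorrelation (correlation function) method.
    Returns the pitch in Hz, or None if the frame is silent or too short."""
    n = len(frame)
    if n == 0:
        return None
    mean = sum(frame) / n
    x = [s - mean for s in frame]  # remove DC offset
    energy = sum(s * s for s in x)
    if energy == 0.0:
        return None
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), n - 1)
    if lag_min < 1 or lag_min >= lag_max:
        return None
    best_lag, best_r = None, -math.inf
    for lag in range(lag_min, lag_max + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag
```

Because longer lags sum over fewer samples, this raw autocorrelation slightly favours shorter lags, which helps it avoid choosing a multiple of the true period. A production estimator would add a voicing decision and sub-sample interpolation.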

After step 2, or if the user does not register the currently talking speaker (No in step 1), the user may still have other speakers to register (No in step 3). In that case the procedure returns to step 1, and the above processing repeats until every speaker whose speech speed should be converted has been registered.

Once the user has finished registering speakers (Yes in Step 3), the call device 601 checks during normal conversation whether the user has pressed the slow playback button 606 (Step 4). If the button has been pressed (Yes in Step 4), the voice feature extraction unit 101 of the call device 601 continuously calculates the voice features of the received speech (Step 5).

Next, the speaker determination unit 103 calculates the distance between the voice features of the received speech obtained in Step 5 and the voice features stored in the voice feature storage unit 102. From the result it determines whether the current speaker is one of the speakers registered in Step 2 (Step 6). If the distance is smaller than a threshold (Yes in Step 6), the speaker of the speech currently being received is judged to be a speaker designated by the user, and a flag indicating this is passed to the speech speed conversion unit 104. If the distance is equal to or greater than the threshold, the speaker determination unit 103 judges that the speaker of the speech currently being received is not one designated by the user, and passes a flag indicating this to the speech speed conversion unit 104.
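The distance-and-threshold test of Step 6 can be sketched with DP matching (dynamic time warping) over time series of feature vectors, one of the matching styles mentioned for Step 2. This is an illustrative sketch only. The function names, the Euclidean frame distance and the normalization by combined sequence length are assumptions, and HMM-based scoring is not shown.

```python
import math


def dtw_distance(query, template):
    """DP-matching (dynamic time warping) distance between two sequences of
    feature vectors, normalized by the combined sequence length."""
    n, m = len(query), len(template)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(query[i - 1], template[j - 1])  # frame-to-frame distance
            cost[i][j] = local + min(cost[i - 1][j],      # advance query only
                                     cost[i][j - 1],      # advance template only
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m] / (n + m)


def is_designated_speaker(query, templates, threshold):
    """Step 6 analogue: True if the received features are closer than
    `threshold` to any registered template (Yes), otherwise False (No)."""
    return any(dtw_distance(query, t) < threshold for t in templates)


registered = [[[0.0], [1.0], [2.0]]]                                # one registered speaker
print(is_designated_speaker([[0.0], [0.0], [1.0], [2.0]], registered, 1.0))  # True
```

DTW lets a slower or faster repetition of the same utterance still match its template. For fixed-length feature vectors, a plain Euclidean distance would be enough.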

The speech speed conversion unit 104 processes the received speech according to the flag received from the speaker determination unit 103. If the speech is close to the voice features of a registered speaker (Yes in Step 6), the unit slows the received speech to an easy-to-follow speed and reproduces it (Step 7). The speech may not be close to any registered speaker's voice features (No in Step 6), or the slow playback button 606 may not have been pressed (No in Step 4). In either case the unit applies its predetermined processing to the received speech without converting its speed. Steps 1 to 7 are repeated until the call ends.

In the process of FIG. 5, Steps 1 to 4 require user operation: registering the speakers whose speech speed is to be converted and choosing whether to apply the conversion. Steps 5 to 7, the selective speech speed conversion during the call, are carried out automatically and continuously by the call device 601 according to those settings.
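The automatic part of FIG. 5 (Steps 4 to 7) amounts to a per-segment decision, sketched below. The segment granularity, the callable used as the speaker test and the numeric rates are hypothetical. A rate here is a time-stretch factor: 1.0 means unchanged (100%) and 1.5 means played 50% longer.

```python
from typing import Callable, Iterable, List, TypeVar

Features = TypeVar("Features")

NORMAL_RATE = 1.0  # 100%: no speech speed conversion


def playback_rates(segments: Iterable[Features],
                   slow_button_pressed: bool,
                   is_designated: Callable[[Features], bool],
                   slow_rate: float) -> List[float]:
    """Sketch of Steps 4 to 7 of FIG. 5 for a stream of received segments.

    Returns the time-stretch factor applied to each segment. All names here
    are illustrative, not taken from the patent.
    """
    rates = []
    for features in segments:
        if not slow_button_pressed:          # Step 4: No -> normal playback
            rates.append(NORMAL_RATE)
        elif is_designated(features):        # Steps 5-6: Yes -> Step 7 (slow down)
            rates.append(slow_rate)
        else:                                # Step 6: No -> normal playback
            rates.append(NORMAL_RATE)
    return rates


# Speaker "B" is registered; "C" is not.
print(playback_rates(["B", "C", "B"], True, lambda s: s == "B", 1.5))  # [1.5, 1.0, 1.5]
```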

The concept of real-time speech speed conversion in Embodiment 1 is now described. FIG. 6 illustrates how the speech speed conversion operates. It shows a case in which two modes can be selected. In the "normal speed" mode the slow playback button 606 is not pressed (no conversion, i.e. 100%). In the "slow" mode the slow playback button 606 is pressed.

As shown in FIG. 6, the "slow" mode may be registered with a conversion rate equal to or higher than that of the "normal speed" mode. In that case, voiced sections of the speech signal are stretched in time at the conversion rate set for each speaker. If the received speech were simply slowed down uniformly and played continuously, however, playback would fall further behind the moment the speech was actually produced. This lag would grow over time and seriously disrupt the conversation. Silent sections are therefore compressed to realign playback with the actual speech timing. Speech speed conversion can then be performed, even at differing conversion rates, without introducing a delay large enough to hinder conversation.
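The timing logic described for FIG. 6 can be sketched as bookkeeping over labeled segments. Voiced segments grow by the conversion rate, which adds to an accumulated delay. Each following silence is shortened to pay that delay back. This sketch assumes the signal is already segmented into voiced and silent sections, and it treats the rate as a time-stretch factor (1.2 = 20% longer). The minimum pause length `min_silence` is a hypothetical parameter. The pitch-preserving stretch itself (for example a WSOLA-style overlap-add method) is not implemented; only the durations are computed.

```python
def schedule_playback(segments, rate, min_silence=0.1):
    """Return (output_segments, remaining_delay).

    segments: list of (kind, duration_s) with kind "voiced" or "silence".
    Voiced segments are stretched by `rate`; each silence is shortened by up
    to the accumulated delay, but never below `min_silence` seconds
    (a hypothetical floor so pauses do not vanish entirely).
    """
    out, delay = [], 0.0
    for kind, dur in segments:
        if kind == "voiced":
            stretched = dur * rate
            delay += stretched - dur          # playback now lags the talker
            out.append((kind, stretched))
        else:
            # shorten the pause, but not below the floor (and never lengthen it)
            cut = min(delay, max(0.0, dur - min_silence))
            delay -= cut                      # catch up during the pause
            out.append((kind, dur - cut))
    return out, delay


segs = [("voiced", 1.0), ("silence", 0.5), ("voiced", 1.0), ("silence", 1.0)]
out, delay = schedule_playback(segs, 1.2)
print([round(d, 2) for _, d in out], round(delay, 2))  # [1.2, 0.3, 1.2, 0.8] 0.0
```

If the pauses are too short to absorb the added time, `remaining_delay` stays positive. This is the growing-lag situation the text describes.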

Next, a concrete implementation of the speech speed conversion process is described with reference to FIGS. 7 to 9. FIG. 7 shows an example configuration of the call devices according to Embodiment 1 of the present invention in use, and FIG. 8 shows another example configuration. FIG. 9 shows an actual example of the speech speed conversion process in the configurations of FIGS. 7 and 8. These examples show two call devices connected through a communication line for a voice call.

In FIG. 7, 601a is the call device used by speaker A, 601b is the call device used by speaker B, 1201a and 1201b are gateways, and 1202 is the Internet. In this configuration, the call devices 601a and 601b are connected to the Internet 1202 through the gateways 1201a and 1201b, respectively. The speech signals exchanged between the call devices 601a and 601b are digital signals carried as packets.

The configuration in Embodiment 1 is of course not limited to this. Other terminal devices, or communication equipment such as hubs and routers, may be connected to the gateways 1201a and 1201b. Such equipment may also sit between the gateway 1201a and the call device 601a, or between the gateway 1201b and the call device 601b.

In FIG. 8, 1301a and 1301b are the connection lines attached to the call devices 601a and 601b, respectively. 1302a and 1302b are modems, and 1303a and 1303b are public telephone networks. 1304a and 1304b are the Internet service providers (labeled ISP in the figure) to which the call devices 601a and 601b, respectively, subscribe.

As in the configuration of FIG. 8, the call devices 601a and 601b may instead be connected to the Internet 1202 through the modems 1302a and 1302b, the public networks 1303a and 1303b, and the Internet service providers 1304a and 1304b, respectively. In this case, analog speech signals may be exchanged over the connection lines 1301a and 1301b. Line 1301a joins the modem 1302a and the call device 601a, and line 1301b joins the modem 1302b and the call device 601b. The modems 1302a and 1302b then digitize and modulate/demodulate the signals. Alternatively, the connection lines 1301a and 1301b may be LAN (Local Area Network) cables carrying packets of speech data that the call devices 601a and 601b have already digitized. In that case, the modems 1302a and 1302b may perform modulation and demodulation only.

The configurations of FIGS. 7 and 8 are only examples. The configurations of FIG. 7 and FIG. 8 may be mixed, and the modems 1302a and 1302b of FIG. 8 may be connected to each other through a single public line only.

In Embodiment 1, each call device 601a, 601b picks up its own user's voice with its built-in microphones (corresponding to the microphones 602a to 602d of the call device 601 in FIG. 2). It does not play that voice through its built-in speaker (corresponding to the speaker 603 of the call device 601 in FIG. 2). Playing the user's own voice through the same device's speaker is likely to cause howling (acoustic feedback). However, a device that does not produce howling may be built. In that case, the user's voice picked up by the built-in microphones may also be output from that device's own speaker.

An actual example of speech speed conversion in the above configuration is described with reference to FIG. 9. Speaker A uses the call device 601a of FIG. 7, and speaker B uses the call device 601b. Before the operation of FIG. 9 begins, speaker A has pressed the registration button (corresponding to the registration button 605 of the call device 601 in FIG. 2) as described above. Following Steps 1 to 3 of the flowchart of FIG. 5, this registers speaker B's voice features in the call device 601a. Speaker A has then pressed the slow playback button (corresponding to the slow playback button 606 of the call device 601 in FIG. 2). The call device 601a is therefore calculating the voice features of speaker B's speech.

As indicated by 1411 and 1413 in FIG. 9, speaker A's speech reaches speaker B from the call device 601b at the same speed at which speaker A speaks it into the call device 601a. Speaker A's voice features may not have been registered in speaker B's call device 601b. Alternatively, they may have been registered, but the slow playback button 606 of the call device 601b has not been pressed. In either case, the call device 601b is not calculating voice features.

In the former case, no voice features have been registered at the time of the determination. The program implementing the flowchart of FIG. 5 on speaker B's call device 601b therefore proceeds Step 1 (No) → Step 3 (Yes) → Step 4. If the slow playback button 606 has been pressed, it then proceeds Step 4 (Yes) → Step 5 → Step 6. In Step 6, however, the speaker determination unit 103 of FIG. 4 returns "No" for the signal from the voice feature extraction unit 101. Because no voice features are registered in the voice feature storage unit 102, the unit cannot judge the speech to be close to those of a registered speaker. On receiving this result, the speech speed conversion unit 104 of FIG. 4 does not convert the speed of the speech signal sent from the call device 601a. The program on the call device 601b then returns from Step 6 (No) to Step 1.

In the latter case, voice features have been registered at the time of the determination. The program on speaker B's call device 601b therefore first proceeds Step 1 (Yes) → Step 2 → Step 3 (Yes) → Step 4. Because the slow playback button 606 has not been pressed, however, Step 4 returns "No" and the program returns to Step 1. The program implementing the flowchart of FIG. 5 on the call device 601b repeats these state transitions.

By contrast, the speech 1412a and 1414a that speaker A hears from the call device 601a is slower than the speech 1412b and 1414b that speaker B speaks into the call device 601b. Speaker B's voice features have been registered in speaker A's call device 601a, and the slow playback button 606 of that device has been pressed. As a result, the call device 601a is calculating voice features.

This processing can be traced through the flowchart of FIG. 5 as follows. While speaker B is not speaking to speaker A (sections 1401 and 1403), no new registration is being made at the time of the determination. The program implementing the flowchart of FIG. 5 on the call device 601a therefore proceeds Step 1 (No) → Step 3 (Yes) → Step 4. Because the slow playback button 606 has been pressed, it continues Step 4 (Yes) → Step 5 → Step 6. In Step 6, the speaker determination unit 103 of FIG. 4 compares the signal from the voice feature extraction unit 101 with the data in the voice feature storage unit 102 and returns "No". On receiving this result, the speech speed conversion unit 104 does not convert the speed of the speech signal sent from the call device 601b. The program on the call device 601a then returns from Step 6 (No) to Step 1 and repeats these state transitions.

Next, consider the period when speaker B is speaking to speaker A (sections 1402 and 1404), i.e. while the speech 1412b or 1414b in FIG. 9 is being sent from the call device 601b to the call device 601a. As before, the program on the call device 601a proceeds Step 1 (No) → Step 3 (Yes) → Step 4 (Yes) → Step 5 → Step 6. In Step 6, the speaker determination unit 103 of FIG. 4 compares the signal from the voice feature extraction unit 101 with the data in the voice feature storage unit 102. It determines that the speaker is speaker B (Yes in Step 6), and the process moves to Step 7. On receiving this result, the speech speed conversion unit 104 of FIG. 4 converts the speed of speaker B's speech signal 1412b or 1414b from the call device 601b, producing 1412a or 1414a, respectively. The program on the call device 601a then returns to Step 1.

The call device 601a slows speaker B's speech signal from the call device 601b by a set amount. The user of the call device 601a may set this conversion rate in advance, for example on a menu screen. Alternatively, the manufacturer may fix a predetermined rate when designing the call device 601a.

Embodiment 1 has described the case where only speaker A registers speaker B's voice features and then presses the slow playback button 606. Speaker B may likewise register voice features and press the slow playback button 606. The example above connects two call devices 601, but three or more call devices 601 may be used. Several speakers may also talk through one call device 601. Another call device 601 can then register the voice features of any of those speakers and convert that speaker's speech speed.

As described above, Embodiment 1 provides three units:
- the voice feature extraction unit 101, which extracts voice features from the received speech signal;
- the speaker determination unit 103, which determines whether the extracted voice features belong to a registered speaker;
- the speech speed conversion unit 104, which converts the speed of the speech signal to a predetermined speed when it belongs to a registered speaker.

In a call with multiple participants, only the speech of speakers designated by the listener is converted to a suitable speed. Such speakers might be fast talkers or people speaking a foreign language. This improves the listener's comprehension of what those speakers say. At the same time, no conversion is applied to speakers the listener has not designated, so the conversion cannot backfire. For example, an already slow speaker is not slowed even further, which would reduce comprehension of the conversation.

(Embodiment 2)
FIG. 10 is a block diagram schematically showing the parts relevant to the speech speed conversion device according to Embodiment 2 of the present invention. As in Embodiment 1, FIG. 10 contains the following units:
- 101 is the voice feature extraction unit, which extracts the voice features of the individual speakers taking part in a call;
- 102 is the voice feature storage unit, which stores the extracted voice features;
- 103 is the speaker determination unit, which compares the current speaker's speech with the voice features stored in the voice feature storage unit 102 to determine whether that speaker is one of the speakers designated by the listener;
- 104 is the speech speed conversion unit, which converts the speed of the received speech according to the result from the speaker determination unit 103.

In addition, 105 is the speaker/speech-speed correspondence storage unit, which stores the optimum conversion rate set for each speaker. 106 is the speech speed selection unit, which selects from unit 105 the conversion rate for the speaker identified by the speaker determination unit 103. The voice features in the voice feature storage unit 102 are linked to the conversion rates in unit 105, for example by speaker identification information assigned to the speaker to whom the features belong. The speech speed conversion unit 104 converts the corresponding speaker's speech using the rate selected by the speech speed selection unit 106. The designated-speaker conversion condition storage means in the claims corresponds to the speaker/speech-speed correspondence storage unit 105. The designated-speaker conversion condition selection means corresponds to the speech speed selection unit 106. The call device containing the speech speed conversion device of Embodiment 2 has the same configuration as shown in FIGS. 1 to 3 for Embodiment 1, so its description is omitted.
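The link between the voice feature storage unit 102 and the speaker/speech-speed correspondence storage unit 105 can be sketched as two maps keyed by the same speaker ID. This is a hypothetical in-memory stand-in. The class name, the integer IDs and the use of 1.0 as the "no conversion" rate are assumptions, and nothing here reflects how the memory 704 is actually organized.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

FeatureSeq = List[List[float]]  # time series of feature vectors


@dataclass
class SpeakerRegistry:
    """Hypothetical stand-in for units 102 and 105, linked by a speaker ID."""
    features: Dict[int, FeatureSeq] = field(default_factory=dict)  # unit 102
    rates: Dict[int, float] = field(default_factory=dict)          # unit 105
    next_id: int = 0

    def register(self, feature_seq: FeatureSeq, rate: float) -> int:
        """Store features and their conversion rate under one new speaker ID."""
        speaker_id = self.next_id
        self.next_id += 1
        self.features[speaker_id] = feature_seq
        self.rates[speaker_id] = rate
        return speaker_id

    def rate_for(self, speaker_id: Optional[int]) -> float:
        """Unit 106 analogue: the matched speaker's rate, else 1.0 (no conversion)."""
        if speaker_id is None:
            return 1.0
        return self.rates[speaker_id]


reg = SpeakerRegistry()
c = reg.register([[0.0], [1.0]], 1.1)   # e.g. 110% for one speaker
d = reg.register([[5.0], [6.0]], 1.3)   # e.g. 130% for another
print(reg.rate_for(d), reg.rate_for(None))  # 1.3 1.0
```

Keying both maps by the same ID keeps a speaker's features and rate in step when the user registers or deletes a speaker.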

In Embodiment 2, the voice feature storage unit 102 and the speaker/speech-speed correspondence storage unit 105 correspond to the memory 704 in FIG. 3. The speaker determination unit 103, the speech speed conversion unit 104 and the speech speed selection unit 106 are stored in the memory 704 as software programs running on the DSP 701 of FIG. 3. The DSP 701 processes the received speech from the communication path interface 702 according to these programs and outputs the result to the D/A converter 705. The voice feature extraction unit 101 is realized by the DSP 701 computing voice features from the same received speech, according to a program recorded in the memory 704. The voice feature storage unit 102 is realized by storing those features in the memory 704. The speaker/speech-speed correspondence storage unit 105 is realized by storing each conversion rate in the memory 704 in association with the registered voice features.

The operation of a call device with the speech speed conversion device of Embodiment 2, configured as above, is now described. FIG. 11 is a flowchart showing an example of the speech speed conversion method according to Embodiment 2 of the present invention. As with FIG. 5, the call device is assumed to be in a call with another site where several speakers are present.

First, if the user of the call device needs to designate a speaker on the other end whose speech he or she wants to hear at a converted speed (Yes in Step 21), the user presses the registration button 605 while that speaker is talking. This instructs the call device 601 to register the voice features.

Next, the user specifies a suitable conversion rate through an input unit such as the operation buttons 607 (Step 22). The speech speed conversion unit 104 converts the speech at that rate (Step 23). The user listens to the converted speech and judges whether its speed is suitable (Step 24). If not (No in Step 24), the process returns to Step 22, and these steps are repeated with different conversion rates until the speed is suitable.

Once the speed is suitable (Yes in Step 24), the voice feature extraction unit 101 calculates the current voice features and stores them in the voice feature storage unit 102. The chosen conversion rate is stored in the speaker/speech-speed correspondence storage unit 105 in association with those voice features (Step 25). The voice features are implemented in the same way as described in Embodiment 1.

The user may want to register more speakers. This applies both after Step 25 and when the user chose not to register the current speaker (No in Step 21). If so (No in Step 26), the process returns to Step 21. The above processing is repeated until every speaker whose speech speed is to be converted has been registered.

Once the user has finished registering speakers (Yes in Step 26), the call device 601 checks during normal conversation whether the user has pressed the slow playback button 606 (Step 27). If the button has been pressed (Yes in Step 27), the voice feature extraction unit 101 of the call device 601 continuously calculates the voice features of the received speech (Step 28).

The speaker determination unit 103 then calculates the distance between the voice features of the received speech obtained in Step 28 and those of every speaker stored in the voice feature storage unit 102. From the results it determines whether the current speaker is one of the speakers registered in Step 25 (Step 29). If a distance is smaller than the threshold (Yes in Step 29), the speaker determination unit 103 judges that the current speaker matches the corresponding registered speaker. The speech speed selection unit 106 then loads the conversion rate registered for that speaker in the speaker/speech-speed correspondence storage unit 105 (Step 30). The speech speed conversion unit 104 converts the received speech at the loaded rate and reproduces it (Step 31).

The distances may all be equal to or greater than the threshold (No in Step 29), or the slow playback button 606 may not have been pressed (No in Step 27). In either case, the speaker determination unit 103 judges that the speaker is not one designated by the user. The speech speed selection unit 106 then instructs the speech speed conversion unit 104 to reproduce the received speech at normal speed. The speech speed conversion unit 104 applies its predetermined processing to the received speech without converting its speed. Steps 21 to 31 are repeated until the call ends. With this processing, whichever speaker on the other end is talking, the speech is always converted at the rate the user set for that speaker.
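Steps 27 to 31 can be sketched as a nearest-match lookup that returns the matched speaker's registered rate, or 1.0 (normal speed) otherwise. As a simplification, this sketch assumes fixed-length feature vectors compared by Euclidean distance, rather than the DP or HMM matching described in Embodiment 1. The function names and the speaker IDs "C" and "D" are illustrative.

```python
import math
from typing import Dict, List, Optional


def match_speaker(features: List[float],
                  registered: Dict[str, List[float]],
                  threshold: float) -> Optional[str]:
    """Step 29 analogue: ID of the closest registered speaker whose distance
    is strictly below `threshold`, or None if no one is close enough."""
    best_id, best_d = None, threshold
    for speaker_id, ref in registered.items():
        d = math.dist(features, ref)
        if d < best_d:
            best_id, best_d = speaker_id, d
    return best_id


def select_rate(features: List[float],
                registered: Dict[str, List[float]],
                rates: Dict[str, float],
                threshold: float,
                slow_button_pressed: bool) -> float:
    """Steps 27 to 31: the matched speaker's rate, else 1.0 (normal speed)."""
    if not slow_button_pressed:                 # Step 27: No
        return 1.0
    speaker_id = match_speaker(features, registered, threshold)
    if speaker_id is None:                      # Step 29: No
        return 1.0
    return rates[speaker_id]                    # Step 30: load registered rate


registered = {"C": [0.0, 0.0], "D": [3.0, 4.0]}
rates = {"C": 1.1, "D": 1.3}
print(select_rate([3.0, 4.5], registered, rates, 1.0, True))  # 1.3
```

Picking the closest match, rather than the first one under the threshold, avoids an arbitrary choice when two registered voices are both near the incoming speech.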

FIG. 12 is a conceptual diagram of real-time speech speed conversion when a plurality of speakers each set their own speech speeds, according to Embodiment 2 of the present invention. FIG. 12 shows a case in which different listeners register different speech speeds for the same speaker: 110% by speaker 1, 120% by speaker 2, 130% by speaker 3, and so on. As the figure shows, voiced sections are stretched in the time direction at the conversion rate set by each listener. Silent sections are compressed to match the actual utterance timing. Even when the conversion rates differ, speech speed conversion can therefore be performed without introducing a delay large enough to disrupt the conversation.
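FIG. 12 describes the timing principle only. The sketch below is an illustration under stated assumptions, not the method of the description. It computes output durations for a sequence of segments already classified as voiced or silent. Voiced segments are lengthened by the listener's rate, and the resulting lag is absorbed by shortening later silent segments. The segment representation, the `plan_output` name and the `min_silence` floor are hypothetical. Producing the stretched audio itself would also require a pitch-preserving time-scale modification method; the description refers to Patent Document 1 for that processing.

```python
from typing import List, Tuple

Segment = Tuple[str, float]  # ("voiced" or "silent", duration in seconds)

def plan_output(segments: List[Segment], rate: float,
                min_silence: float = 0.0) -> Tuple[List[Segment], float]:
    """Return output segment durations and the delay still outstanding at the end.

    rate: expansion factor for voiced segments, e.g. 1.10 for a 110 % setting.
    min_silence: hypothetical floor on how short a pause may become.
    """
    lag = 0.0  # how far the output currently trails the real-time input
    out: List[Segment] = []
    for kind, dur in segments:
        if kind == "voiced":
            new_dur = dur * rate
            lag += new_dur - dur          # stretching speech adds delay
        else:                             # anything else is treated as silence
            keep = min(dur, min_silence)  # never cut below the floor
            cut = min(lag, dur - keep)    # remove at most the accumulated delay
            new_dur = dur - cut
            lag -= cut
        out.append((kind, new_dur))
    return out, lag
```

For instance, at a rate of 1.1, a 2.0 s voiced segment becomes 2.2 s. The following 1.0 s pause then shrinks to 0.8 s, which brings the output back into step with the input before the next utterance.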

Next, a specific way of implementing the speech speed conversion processing of Embodiment 2 is described with reference to FIGS. 13 and 14. FIG. 13 shows an example configuration in which the call devices of Embodiment 2 are used. FIG. 14 shows an actual operation example of the speech speed conversion processing in the configuration of FIG. 13. In FIG. 13, 601c to 601f are the call devices used by speakers C to F, respectively. 1201c to 1201f are the gateways that connect call devices 601c to 601f, respectively, to the Internet 1202. Components identical to those used in FIGS. 7 and 8 of Embodiment 1 are given the same reference numerals, and their description is omitted.

The example of FIG. 13 assumes a call among the four call devices 601c to 601f. In this configuration, the audio signals transmitted and received between call devices 601c to 601f are packetized digital signals.

The configuration in use in Embodiment 2 is not limited to this. As described in Embodiment 1, call devices 601c to 601f can be connected by any method; those methods are not repeated here. As in Embodiment 1, the microphone 708 and the speaker 707 of each of call devices 601c to 601f are arranged so that howling does not occur.

An actual operation example of the speech speed conversion processing in this configuration is described with reference to FIG. 14. Speaker C uses call device 601c of FIG. 13, speaker D uses call device 601d, speaker E uses call device 601e, and speaker F uses call device 601f.

Before the operation of FIG. 14 starts, only speaker F has pressed the registration button (corresponding to the registration button 605 of call device 601 in FIG. 2), as described above. Following steps 21 to 26 of the flowchart of FIG. 11, speaker F has registered the speech speed conversion rates and voice features of speakers C to E in call device 601f. Speaker F has also pressed the slow playback button (corresponding to the slow playback button 606 of call device 601 in FIG. 2). Call device 601f is therefore calculating voice feature amounts, identifying the speaker, and selecting the speech speed.

Although not shown in FIG. 14, speaker F's voice reaches speakers C to E at the same speed at which speaker F speaks into call device 601f. The reason is the same as for 1411 and 1413 in FIG. 9 of Embodiment 1. Either speaker F's speech speed conversion rate and voice features have not been registered in call devices 601c to 601e, or they have been registered but the slow playback button 606 of call devices 601c to 601e has not been pressed. In either case, call devices 601c to 601e are not calculating voice feature amounts, identifying the speaker, or selecting the speech speed.

In the former case, voice feature registration is not being performed at the time of determination. The processing state of each program implementing the flowchart of FIG. 11 on call devices 601c to 601e therefore moves from step 21 (Yes) to step 26 (Yes) to step 27. If the slow playback button 606 has been pressed, processing proceeds from step 27 (Yes) to step 28 and then to step 29. In step 29, each speaker determination unit 103 of call devices 601c to 601e determines "No" from the signal supplied by its voice feature extraction unit 101. On receiving this result via its speech speed selection unit 106, each speech speed conversion unit 104 leaves the speech speed of the audio signal from call device 601f unconverted. The processing state of each program then returns from step 29 (No) to step 21.

In the latter case, voice features have already been registered at the time of determination. The processing state of each program implementing the flowchart of FIG. 11 on call devices 601c to 601e therefore first moves from step 21 (No) to step 26 (Yes) to step 27. Because the slow playback button 606 has not been pressed, step 27 yields "No" and processing returns to step 21. Each program on call devices 601c to 601e repeats these state transitions. The steps of the programs on call devices 601c to 601e do not need to run in synchronization with one another.

In contrast, speaker F hears voices 1621 to 1623 of speakers C to E from call device 601f more slowly than speakers C to E speak them (voices 1611 to 1613) into call devices 601c to 601e. Each of these speeds is the one set by speaker (listener) F at the registration stage (steps 21 to 26). This is because speaker F's call device 601f holds the registered speech speeds and voice features of speakers C to E, and its slow playback button 606 has been pressed. Call device 601f is therefore calculating voice feature amounts, identifying the speaker, and selecting the speech speed.

This processing state is described below with reference to the flowchart of FIG. 11. When none of speakers C to E is speaking to speaker F, voice feature registration is not being performed at the time of determination. The processing state of the program implementing the flowchart of FIG. 11 on call device 601f therefore moves from step 21 (Yes) to step 26 (Yes) to step 27. Because the slow playback button 606 has been pressed, processing then moves from step 27 (Yes) to step 29. In step 29, the speaker determination unit 103 of FIG. 4 compares the signal from the voice feature extraction unit 101 with the data in the voice feature storage unit 102 and determines "No". On receiving this result, the speech speed conversion unit 104 leaves the speech speed of the audio signals from call devices 601c to 601e unconverted.

The processing state of the program on call device 601f then returns from step 29 (No) to step 21, and the program repeats these state transitions.

Now suppose speaker C is speaking to speaker F (the state shown by section 1601 in FIG. 14), so voice 1611 in FIG. 14 is being transmitted from call device 601c to call device 601f. Voice feature registration is not being performed at the time of determination, so the processing state of the program implementing the flowchart of FIG. 11 on call device 601f moves from step 21 (Yes) to step 26 (Yes) to step 27.

Because the slow playback button 606 has been pressed, the program on call device 601f determines "Yes" in step 27 and proceeds to step 28 and then step 29. In step 29, the speaker determination unit 103 of call device 601f compares the signal from the voice feature extraction unit 101 with the data in the voice feature storage unit 102 and determines that the talker is speaker C, that is, "Yes" in step 29. The state then moves to step 30 and step 31.

In step 30, the speech speed conversion unit 104 of call device 601f, having received the "Yes" result of step 29, loads the speech speed conversion rate for speaker C (110%) from the speaker/speech speed correspondence storage unit 105 via the speech speed selection unit 106. In step 31, it converts the speech speed of speaker C's audio signal 1611 sent from call device 601c to produce 1621. The processing state of the program on call device 601f then moves from step 31 to step 21. The same processing is applied to voices 1612 and 1613 of speakers D and E, so speaker F hears them as reproduced voice 1622 converted to 120% and reproduced voice 1623 converted to 130%, respectively.
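As a worked example, assume that a rate of N% means a voiced segment is played back over N% of its original duration. This is an interpretation consistent with the time-direction stretching of FIG. 12, not a definition given in the description. For a hypothetical 3-second utterance, the rates registered on call device 601f then give the playback lengths below. The table and function names are illustrative only.

```python
from typing import Mapping

# Rates (percent) registered on call device 601f in the FIG. 14 example.
F_RATES: Mapping[str, int] = {"C": 110, "D": 120, "E": 130}

def played_duration(speaker: str, seconds: float,
                    table: Mapping[str, int] = F_RATES) -> float:
    """Playback length of a voiced segment; unregistered speakers play at 100 %."""
    return seconds * table.get(speaker, 100) / 100
```

Under this assumption, a 3-second utterance plays for about 3.3 seconds from speaker C, 3.6 seconds from speaker D, and 3.9 seconds from speaker E. An unregistered speaker plays for the original 3 seconds.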

Embodiment 2 has described the case in which only speaker F registers the speech speed conversion rates and voice features of speakers C to E and then presses the slow playback button 606. This registration and button press are not limited to speaker F: any of speakers C to E may perform them, and several speakers may do so at the same time. Two or three call devices 601 may be used, or five or more. When several speakers talk through one call device 601, the other call devices 601 can also register the voice features and speech speed conversion rates of each of those speakers and apply a conversion suited to each one.

As described above, in Embodiment 2 a speech speed conversion rate is set for each speaker whose voice features are stored, and it is stored in the speaker/speech speed correspondence storage unit 105. The speech speed selection unit 106 retrieves the rate for the speaker identified by the speaker determination unit 103, and the speech speed conversion unit 104 converts the received speech accordingly. In a call with several participants, the listener can therefore hear each other speaker at a speed that suits the listener, which improves intelligibility. Speech speed conversion is not applied to speakers the listener has not designated. This avoids cases where conversion would be counterproductive and lower the intelligibility of the conversation.

(Embodiment 3)
FIG. 15 is a block diagram schematically showing the parts related to the speech speed conversion device according to Embodiment 3 of the present invention. As in Embodiment 1, FIG. 15 contains the following units:
101: a voice feature extraction unit that extracts the voice features of the individual speakers participating in a call.
102: a voice feature storage unit that stores the extracted voice features.
103: a speaker determination unit that compares the current talker's voice with the voice features stored in the voice feature storage unit 102 to determine whether the talker is one of the speakers designated by the listener.
104: a speech speed conversion unit that converts the speech speed of the received speech according to the result of the speaker determination unit 103.

Further, 107 is a speaker/speech speed/volume correspondence storage unit. It stores the optimum speech speed conversion rate and playback volume amplification factor set for each speaker. 108 is a speech speed/volume selection unit. It selects from storage unit 107 the conversion rate and amplification factor for the speaker identified by the speaker determination unit 103.

The voice features in the voice feature storage unit 102 are associated with the rates and amplification factors in storage unit 107 by, for example, speaker identification information assigned to the speaker concerned. The speech speed conversion unit 104 converts the speech speed and playback volume of that speaker using the conversion rate and amplification factor selected by the speech speed/volume selection unit 108. The designated-speaker conversion condition storage means in the claims corresponds to the speaker/speech speed/volume correspondence storage unit 107. The designated-speaker conversion condition selection means corresponds to the speech speed/volume selection unit 108.
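One way to picture the association described above is a record keyed by speaker identification information. Each record holds the voice feature together with the conversion rate and amplification factor. The following is a minimal sketch under that assumption. The class and field names are hypothetical, and a real device would keep this data in the memory 704 rather than in Python objects.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence, Tuple

@dataclass
class SpeakerSetting:
    feature: List[float]  # voice feature (voice feature storage unit 102)
    rate: float           # speech speed conversion rate (storage unit 107)
    gain: float           # playback volume amplification factor (storage unit 107)

@dataclass
class SettingStore:
    by_id: Dict[str, SpeakerSetting] = field(default_factory=dict)

    def register(self, speaker_id: str, feature: Sequence[float],
                 rate: float, gain: float) -> None:
        """Store feature, rate and gain under one speaker ID (cf. step 47)."""
        self.by_id[speaker_id] = SpeakerSetting(list(feature), rate, gain)

    def select(self, speaker_id: Optional[str]) -> Tuple[float, float]:
        """(rate, gain) for a matched speaker (cf. step 52); otherwise normal speed and volume."""
        setting = self.by_id.get(speaker_id) if speaker_id is not None else None
        if setting is None:
            return (1.0, 1.0)
        return (setting.rate, setting.gain)
```

Keeping all three values in one record means the speaker ID produced by speaker determination is the only key that the speed and volume selection needs.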

With this configuration, the speech speed conversion unit 104 converts the speech speed at the rate received from the speech speed/volume selection unit 108 and converts the volume at the same time. Each speaker's voice is thus automatically converted to the listener's preferred volume and speed, regardless of how far the speaker is from the microphone or how loudly the speaker talks. The call device that includes the speech speed conversion device of Embodiment 3 has the same configuration as that shown in FIGS. 1 to 3 of Embodiment 1, so its description is omitted.
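The description does not state the sample format, or whether the amplification factor is linear or in decibels. Assuming 16-bit linear PCM samples and a linear factor, applying the selected factor could look like the sketch below. The function name is hypothetical. Clipping to the int16 range is an added safeguard that the description does not mention.

```python
from typing import List

INT16_MIN, INT16_MAX = -32768, 32767

def apply_gain(samples: List[int], gain: float) -> List[int]:
    """Scale 16-bit PCM samples by a linear gain and clip the result to the int16 range."""
    out = []
    for s in samples:
        v = int(round(s * gain))
        out.append(max(INT16_MIN, min(INT16_MAX, v)))
    return out
```

For example, `apply_gain([1000, -1000, 20000], 2.0)` returns `[2000, -2000, 32767]`, because the last sample would otherwise overflow.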

In Embodiment 3, the voice feature storage unit 102 and the speaker/speech speed/volume correspondence storage unit 107 correspond to the memory 704 in FIG. 3. The speaker determination unit 103, the speech speed conversion unit 104, and the speech speed/volume selection unit 108 are stored in the memory 704 as software programs that run on the DSP 701 in FIG. 3. That is, the DSP 701 processes the received speech from the communication path interface 702 according to these programs and outputs the result to the D/A converter 705. The voice feature extraction unit 101 is implemented by the DSP 701 computing voice feature amounts from the received speech according to a program recorded in the memory 704. The voice feature storage unit 102 is implemented by storing those voice feature amounts in the memory 704. The speaker/speech speed/volume correspondence storage unit 107 is implemented by storing in the memory 704 the speech speed conversion rate and playback volume amplification factor used by the speech speed conversion unit 104, associated with the registered voice features.

The operation of the call device having the speech speed conversion device of Embodiment 3 is described next. FIG. 16 is a flowchart showing an example of the procedure of the speech speed conversion method according to Embodiment 3 of the present invention. As with FIG. 5 of Embodiment 1 and FIG. 11 of Embodiment 2, it is assumed that the call device is connected to another site where several speakers are present and a call is in progress.

As in steps 21 to 24 of FIG. 11 of Embodiment 2, the user of call device 601 designates a speech speed conversion rate for a speaker at the other end whom the user wants to hear with speech speed conversion (steps 41 to 44). The user then specifies a suitable volume (step 45), and the speech speed conversion unit 104 amplifies the speech to that volume. The user listens to the amplified speech and judges whether the volume is suitable (step 46). If it is not (No in step 46), processing returns to step 45. This is repeated with different reception volumes until the volume is suitable.

If the volume is suitable (Yes in step 46), the voice feature extraction unit 101 calculates the current voice feature amount and stores it in the voice feature storage unit 102. The chosen speech speed conversion rate and volume are stored in the speaker/speech speed/volume correspondence storage unit 107 in association with that voice feature amount (step 47). The voice feature amount is implemented as described in Embodiment 1.

Processing then reaches step 48, either after step 47 or directly from step 41 if the user chose not to register the current talker for speech speed and/or volume conversion (No in step 41). If the user wants to register further speakers for speech speed and/or volume conversion (No in step 48), processing returns to step 41. This continues until all such speakers have been registered.

When the user has finished registering the speakers for speech speed and/or volume conversion (Yes in step 48), the device checks during normal conversation whether the user has pressed the slow playback button 606 (step 49). If the slow playback button 606 has been pressed (Yes in step 49), the voice feature extraction unit 101 of call device 601 continuously calculates the voice feature amount of the received speech (step 50).

Next, the speaker determination unit 103 calculates the distances to the voice feature amounts of all speakers stored in the voice feature storage unit 102. From the results it determines whether the current talker is one of the speakers registered in step 47 (step 51). That is, if the distance to a stored voice feature amount is smaller than the threshold (Yes in step 51), the speech speed/volume selection unit 108 loads the speech speed conversion rate and playback volume amplification factor associated with the matched speaker from the speaker/speech speed/volume correspondence storage unit 107 (step 52). The speech speed conversion unit 104 then converts the received speech using the rate and amplification factor loaded by the speech speed/volume selection unit 108 and reproduces it (step 53).

Conversely, if the distance to the stored voice feature amounts is equal to or greater than the threshold (No in step 51), or if the slow playback button 606 has not been pressed (No in step 49), the speaker determination unit 103 determines that the talker is not a speaker designated by the user. The speech speed/volume selection unit 108 then instructs the speech speed conversion unit 104 to reproduce the received speech at normal speed and volume. The speech speed conversion unit 104 applies its predetermined processing to the received speech without converting its speed or volume. The processing of steps 41 to 53 is repeated until the call ends. As a result, whichever speaker at the other end is talking, the speech speed conversion rate and volume amplification factor the user set for that speaker are applied, so good speech speed conversion and playback are always obtained.

Next, a specific way of implementing the speech speed conversion processing of Embodiment 3 is described with reference to FIGS. 13 and 17. FIG. 17 shows an actual operation example of the speech speed conversion processing in the configuration of FIG. 13. The configuration is assumed to be the same as in FIG. 13 of Embodiment 2, that is, a call among the four call devices 601c to 601f. Call devices 601c to 601f are connected to the Internet 1202 via gateways 1201c to 1201f, respectively. The audio signals exchanged between them are packetized digital signals. The configuration in use in Embodiment 3 is not limited to this, and the same variations described in Embodiments 1 and 2 apply.

An actual operation example of the speech speed conversion processing in this configuration is described with reference to FIG. 17. As in FIG. 14 of Embodiment 2, speaker C uses call device 601c of FIG. 13, speaker D uses call device 601d, speaker E uses call device 601e, and speaker F uses call device 601f.

Before the operation of FIG. 17 starts, only speaker F has pressed the registration button (corresponding to the registration button 605 of call device 601 in FIG. 2), as described above. Following steps 41 to 48 of the flowchart of FIG. 16, speaker F has registered the speech speed conversion rates, playback volume amplification factors, and voice features of speakers C to E in call device 601f. Speaker F has also pressed the slow playback button (corresponding to the slow playback button 606 of call device 601 in FIG. 2). Call device 601f is therefore calculating voice feature amounts, identifying the speaker, and selecting the speech speed and reception volume.

Although not shown in FIG. 17, speaker F's voice reaches speakers C to E at the same speed and volume at which speaker F speaks into call device 601f. The reason is the same as for 1411 and 1413 in FIG. 9 of Embodiment 1. Either speaker F's speech speed conversion rate, volume, and voice features have not been registered in call devices 601c to 601e, or they have been registered but the slow playback button 606 of call devices 601c to 601e has not been pressed. In either case, call devices 601c to 601e are not calculating voice feature amounts, identifying the speaker, or selecting the volume and speech speed.

In the former case, voice feature registration is not being performed at the time of determination. The processing state of each program implementing the flowchart of FIG. 16 on call devices 601c to 601e therefore moves from step 41 (Yes) to step 48 (Yes) to step 49. If the slow playback button 606 has been pressed, processing proceeds from step 49 (Yes) to step 50 and then to step 51. In step 51, each speaker determination unit 103 of call devices 601c to 601e determines "No" from the signal supplied by its voice feature extraction unit 101. On receiving this result via its speech speed/volume selection unit 108, each speech speed conversion unit 104 leaves the speech speed of the audio signal from call device 601f unconverted. The processing state of each program then returns from step 51 (No) to step 41.

In the latter case, because voice features have been registered at the time of determination, the program shown in the flowchart of FIG. 16 on each of call devices 601c to 601e first proceeds step 41 (No) → step 48 (Yes) → step 49. However, since the slow playback button 606 has not been pressed, step 49 determines "No" and the program returns to step 41. The programs shown in the flowchart of FIG. 16 on call devices 601c to 601e each repeat the state transitions described above. The steps of these programs need not run in synchronization across call devices 601c to 601e.

In contrast, the voices 1721 to 1723 of speakers C to E that speaker F hears from call device 601f are slower than the corresponding voices 1711 to 1713 that speakers C to E speak into call devices 601c to 601e, and each is played at the volume best suited to speaker F. These speeds and volumes are the ones that speaker (listener) F set for each of speakers C to E in the registration stage (steps 41 to 48). This is because the speech speeds, volumes and voice features of speakers C to E have been registered in call device 601f on speaker F's side, the slow playback button 606 of call device 601f has been pressed, and call device 601f therefore calculates voice feature quantities, identifies the speaker, and selects the speech speed and volume. In FIG. 17, the horizontal (left-right) size of each roughly rectangular voice signal represents speech speed: the longer it is, the slower the speech. The vertical (up-down) size represents volume: the taller it is, the louder the voice.
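The FIG. 17 convention (width for duration, height for volume) can be made concrete with a small model of what each listener hears. This is an illustrative sketch only: `Segment`, `heard_by` and the numeric rates and gains are hypothetical, and it assumes that a conversion rate below 1.0 slows speech so that output duration equals input duration divided by the rate, and that the volume amplification factor multiplies a linear level. The text does not define either quantity this precisely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One voice segment drawn as a rectangle in a FIG. 17-style diagram."""
    speaker: str
    duration_s: float  # horizontal size: longer means slower speech
    level: float       # vertical size: taller means louder (linear scale, assumed)


# Hypothetical registration table held by one listener's call device:
# speaker -> (speech speed conversion rate, volume amplification factor).
Table = dict[str, tuple[float, float]]


def heard_by(table: Table, slow_button_pressed: bool, seg: Segment) -> Segment:
    """Return the rectangle the listener hears for a segment sent by seg.speaker."""
    if not slow_button_pressed or seg.speaker not in table:
        return seg  # no conversion: same speed and volume as spoken
    rate, gain = table[seg.speaker]
    # Assumed convention: rate < 1.0 slows speech, so duration grows by 1 / rate.
    return Segment(seg.speaker, seg.duration_s / rate, seg.level * gain)
```

For instance, with an entry `"C": (0.5, 1.5)` in speaker F's table, a 2-second segment from speaker C at level 1.0 would be drawn 4 seconds wide and 1.5 high for F, while a segment from a speaker with no entry (such as F's own voice at call devices 601c to 601e) keeps its original rectangle.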

This processing is described below with reference to the flowchart of FIG. 16. When none of speakers C to E is speaking to speaker F, the program shown in the flowchart of FIG. 16 on call device 601f is not performing voice feature registration at the time of determination, so it proceeds step 41 (Yes) → step 48 (Yes) → step 49. Because the slow playback button 606 has been pressed, step 49 determines "Yes" and the program proceeds step 50 → step 51. In step 51, the speaker determination unit 103 of call device 601f compares the signal from the voice feature extraction unit 101 with the data in the voice feature storage unit 102 and determines "No". On receiving this result, the speech speed conversion unit 104 performs neither speech speed conversion nor volume conversion on the voice signals sent from call devices 601c to 601e.
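Step 51 compares the current voice features with those held in the voice feature storage unit 102, but the text does not say what the features are or how they are compared. The following is a minimal sketch assuming fixed-length feature vectors and cosine similarity with a hypothetical threshold of 0.9; the names `cosine_similarity` and `determine_speaker` are illustrative, and a `None` result plays the role of the "No" branch.

```python
import math
from typing import Mapping, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def determine_speaker(
    current: Sequence[float],
    stored: Mapping[str, Sequence[float]],   # stand-in for voice feature storage unit 102
    threshold: float = 0.9,                  # hypothetical acceptance threshold
) -> Optional[str]:
    """Return the registered speaker whose template best matches, or None ("No")."""
    best_name: Optional[str] = None
    best_score = threshold
    for name, template in stored.items():
        score = cosine_similarity(current, template)
        # Must reach the threshold and be at least as good as any earlier match.
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

Because the best match is chosen among all stored templates, the same routine also covers several registered speakers sharing one remote call device.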

The program then returns from step 51 (No) to step 41, and the program shown in the flowchart of FIG. 16 on call device 601f repeats the state transitions described above.

By contrast, when, for example, speaker C is speaking to speaker F (the state shown by section 1601 in FIG. 17), that is, while voice 1711 in FIG. 17 is being transmitted from call device 601c to call device 601f, the program shown in the flowchart of FIG. 16 on call device 601f is not performing voice feature registration at the time of determination, so it proceeds step 41 (Yes) → step 48 (Yes) → step 49.

Because the slow playback button 606 has been pressed, step 49 determines "Yes" and the program proceeds step 50 → step 51. In step 51, the speaker determination unit 103 of call device 601f compares the signal from the voice feature extraction unit 101 with the data in the voice feature storage unit 102 and determines that the current speaker is speaker C (step 51: "Yes"), and the state moves on to step 52 and then step 53. That is, having received the "Yes" result of step 51, the speech speed conversion unit 104 of call device 601f loads, in step 52, speaker C's speech speed conversion rate and playback volume amplification factor from the speaker/speech speed/volume correspondence storage unit 107 via the speech speed/volume selection unit 108. In step 53, it converts the speech speed and playback volume of speaker C's voice signal 1711 sent from call device 601c, producing 1721. The same processing is applied to voices 1712 and 1713 of speakers D and E, which speaker F hears as reproduced voices 1722 and 1723, respectively.
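The path through steps 49 to 53 described above can be sketched as a single function that records the steps it visits. This is a hedged reading of the text, not the patent's program: it assumes step 50 computes the voice feature quantity and step 51 identifies the speaker, and the feature extractor, identifier and converter are passed in as hypothetical callables standing in for units 101, 103 and 104. Returning early corresponds to the flowchart going back to step 41.

```python
from typing import Callable, Optional, Sequence

Samples = list[float]


def slow_playback_pass(
    samples: Samples,
    slow_button_pressed: bool,
    extract_features: Callable[[Samples], Sequence[float]],
    identify: Callable[[Sequence[float]], Optional[str]],
    table: dict[str, tuple[float, float]],
    convert: Callable[[Samples, float, float], Samples],
) -> tuple[list[int], Samples]:
    """Run steps 49 to 53 once for one received voice segment.

    Returns the step numbers visited and the audio to play.
    """
    steps = [49]
    if not slow_button_pressed:            # step 49 "No": play as received
        return steps, samples
    steps.append(50)
    features = extract_features(samples)   # step 50: voice feature quantity (unit 101)
    steps.append(51)
    speaker = identify(features)           # step 51: registered speaker? (unit 103)
    if speaker is None or speaker not in table:
        return steps, samples              # step 51 "No": no conversion
    steps.append(52)
    rate, gain = table[speaker]            # step 52: load rate and gain (units 107, 108)
    steps.append(53)
    return steps, convert(samples, rate, gain)  # step 53: convert speed and volume (unit 104)
```

In the FIG. 17 scenario, call device 601f would run this with the button pressed and a table holding speakers C to E, so it reaches step 53, while call devices 601c to 601e leave at step 49 or step 51 and play speaker F unchanged.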

Embodiment 3 has described the case in which only speaker F registers the speech speed conversion rates and voice features of the other speakers C to E and then presses the slow playback button 606. However, registering speech speed conversion rates, playback volume amplification factors and voice features, and pressing the slow playback button 606, are not limited to speaker F: any of speakers C to E may do so, and several speakers may do so at the same time. Two or three call devices 601 may be used, or five or more. When several speakers talk through a single call device 601, another call device 601 can register the voice features, speech speed conversion rates and playback volume amplification factors of each of those speakers and apply the speech speed and playback volume conversion suited to each one.

As described above, in Embodiment 3 the speaker/speech speed/volume correspondence storage unit 107 stores the speech speed conversion rate and voice amplification factor set for each speaker whose voice features have been stored; the speech speed/volume selection unit 108 retrieves the conversion rate and amplification factor corresponding to the speaker identified by the speaker determination unit 103; and the speech speed conversion unit 104 converts the received voice accordingly. In a call with several participants, a listener can therefore hear each other speaker's utterances at the speed and volume that suit the listener, which improves intelligibility. Conversely, because speech speed conversion is not applied to speakers the listener has not designated, the device also avoids cases where conversion would be counterproductive, such as pushing the volume further up or down or lowering the intelligibility of the conversation.
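This section leaves the conversion performed by the speech speed conversion unit 104 to known speech speed conversion techniques. As one assumed stand-in, naive overlap-add (OLA) time stretching changes duration while copying short windowed stretches of the original waveform, which keeps the local pitch period; overlapping copies can interfere in phase, and refinements such as WSOLA exist to reduce that. The function names, the 256-sample frame and the rate convention (output length equals input length divided by rate) are assumptions for illustration, and the gain step simply scales the result.

```python
import math


def hann(n: int) -> list[float]:
    """Periodic Hann window; copies spaced n/2 apart sum to a constant."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def ola_time_stretch(x: list[float], rate: float, gain: float = 1.0,
                     frame: int = 256) -> list[float]:
    """Change duration by 1/rate with naive overlap-add, then apply gain.

    rate < 1 slows speech down (longer output); rate > 1 speeds it up.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if frame < 2 or frame % 2:
        raise ValueError("frame must be an even number >= 2")
    hop_out = frame // 2                  # synthesis hop: fixed 50% overlap
    hop_in = hop_out * rate               # analysis hop in the input
    target_len = round(len(x) / rate)     # desired output length
    n_frames = max(1, math.ceil((target_len - frame) / hop_out) + 1)
    out_len = (n_frames - 1) * hop_out + frame
    win = hann(frame)
    acc = [0.0] * out_len                 # windowed sum of copied samples
    wsum = [0.0] * out_len                # sum of window weights, for normalization
    for k in range(n_frames):
        start_in = round(k * hop_in)
        start_out = k * hop_out
        for i in range(frame):
            src = start_in + i
            sample = x[src] if src < len(x) else 0.0   # zero-pad past the end
            acc[start_out + i] += win[i] * sample
            wsum[start_out + i] += win[i]
    y = [gain * a / w if w > 1e-8 else 0.0 for a, w in zip(acc, wsum)]
    return y[:target_len]
```

With `rate=1.0` the frames are copied back to their original positions, so the output reproduces the input scaled by `gain`; with `rate=0.5` each input region is reused by roughly twice as many frames, doubling the duration.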

As described above, the speech speed conversion device, call device and speech speed conversion method according to the present invention are useful for teleconferencing systems, such as conference calls, in which a user talks with multiple speakers.

FIG. 1 is a perspective view showing an example of a call device provided with the speech speed conversion device according to Embodiment 1 of the present invention.
FIG. 2 is a top view of the call device according to Embodiment 1 of the present invention.
FIG. 3 is a block diagram schematically showing the configuration of the components of the call device according to Embodiment 1 that relate to the speech speed conversion device according to Embodiment 1.
FIG. 4 is a block diagram schematically showing the functional configuration of the speech speed conversion device according to Embodiment 1.
FIG. 5 is a flowchart showing an example of the procedure of the speech speed conversion method according to Embodiment 1.
FIG. 6 is a diagram showing the operating concept of the speech speed conversion processing according to Embodiment 1.
FIG. 7 is a diagram showing an example of the configuration of the call device according to Embodiment 1 in use.
FIG. 8 is a diagram showing another example of the configuration of the call device according to Embodiment 1 in use.
FIG. 9 is a diagram showing an actual operation example of the speech speed conversion processing according to Embodiment 1.
FIG. 10 is a block diagram schematically showing the configuration of the parts related to the speech speed conversion device according to Embodiment 2.
FIG. 11 is a flowchart showing an example of the procedure of the speech speed conversion method according to Embodiment 2.
FIG. 12 is a conceptual diagram of real-time speech speed conversion using speech speed settings for a plurality of speakers according to Embodiment 2.
FIG. 13 is a diagram showing an example of the configuration of the call device according to Embodiment 2 in use.
FIG. 14 is a diagram showing an actual operation example of the speech speed conversion processing according to Embodiment 2.
FIG. 15 is a block diagram schematically showing the configuration of the parts related to the speech speed conversion device according to Embodiment 3.
FIG. 16 is a flowchart showing an example of the procedure of the speech speed conversion method according to Embodiment 3.
FIG. 17 is a diagram showing an actual operation example of the speech speed conversion processing according to Embodiment 3.
FIG. 18 is a block diagram schematically showing the configuration of a conventional speech speed conversion device.

Explanation of symbols

101 Voice feature extraction unit
102 Voice feature storage unit
103 Speaker determination unit
104 Speech speed conversion unit
105 Speaker/speech speed correspondence storage unit
106 Speech speed selection unit
107 Speaker/speech speed/volume correspondence storage unit
108 Speech speed/volume selection unit
601, 601a to 601f Call device
602a to 602d, 708 Microphone
603, 707 Loudspeaker
605 Registration button
606 Slow playback button
607 Operation button
702 Communication path interface
704 Memory
705 D/A converter
706 A/D converter
1201a to 1201f Gateway
1202 Internet
1301a, 1301b Connection line
1302a, 1302b Modem
1303a, 1303b Public line network
1304a, 1304b Internet service provider

Claims

A speech speed conversion device comprising:
voice feature extraction means for extracting voice features of individual speakers participating in a call;
voice feature storage means for storing the voice features extracted for speakers designated by a listener;
speaker determination means for determining whether the current speaker is one of the speakers designated by the listener by comparing the voice features of the current speaker with the voice features stored in the voice feature storage means; and
speech speed conversion means for converting the speech speed of the received voice of the speaker when the speaker determination means determines that the speaker is one designated by the listener.

The speech speed conversion device according to claim 1, further comprising:
designated speaker conversion condition storage means for storing an optimum speech speed conversion rate set for each of the speakers designated by the listener; and
designated speaker conversion condition selection means for selecting, from the designated speaker conversion condition storage means, the speech speed conversion rate corresponding to the speaker determined by the speaker determination means,
wherein the speech speed conversion means converts the speech speed of the speaker designated by the listener using the speech speed conversion rate selected by the designated speaker conversion condition selection means.

The speech speed conversion device according to claim 2, wherein
the designated speaker conversion condition storage means further stores a playback volume amplification factor set for each of the speakers designated by the listener,
the designated speaker conversion condition selection means selects, from the designated speaker conversion condition storage means, the speech speed conversion rate and the playback volume amplification factor corresponding to the speaker determined by the speaker determination means, and
the speech speed conversion means converts the speech speed and the playback volume of the speaker designated by the listener using the speech speed conversion rate and the playback volume amplification factor selected by the designated speaker conversion condition selection means.

A call device comprising:
communication means for communicating with another call device via a communication line;
sound collection means for collecting the voice of a speaker;
audio output means for reproducing and outputting audio from the other call device; and
the speech speed conversion device according to any one of claims 1 to 3.

A speech speed conversion method comprising: performing voice feature extraction processing for extracting the voice features of individual speakers participating in a call; performing voice feature storage processing for storing the voice features extracted for speakers designated by a listener; performing speaker determination processing for determining whether the current speaker is one of the speakers designated by the listener by comparing the voice features of the current speaker with the stored voice features; and performing speech speed conversion processing for converting the speech speed of the received voice of the speaker when the speaker is determined to be one designated by the listener.

The speech speed conversion method according to claim 5, further comprising:
performing designated speaker conversion condition storage processing for storing an optimum speech speed conversion rate set for each of the speakers designated by the listener; and
performing, before the speech speed conversion processing, designated speaker conversion condition selection processing for selecting, from the contents stored in the designated speaker conversion condition storage processing, the speech speed conversion rate corresponding to the speaker determined in the speaker determination processing,
wherein the speech speed conversion processing converts the speech speed of the speaker designated by the listener using the speech speed conversion rate selected in the designated speaker conversion condition selection processing.

The speech speed conversion method according to claim 6, wherein
in the designated speaker conversion condition storage processing, a playback volume amplification factor set for each of the speakers designated by the listener is further stored,
in the designated speaker conversion condition selection processing, the speech speed conversion rate and the playback volume amplification factor corresponding to the speaker determined in the speaker determination processing are selected from the contents stored in the designated speaker conversion condition storage processing, and
in the speech speed conversion processing, the speech speed and the playback volume of the speaker designated by the listener are converted using the speech speed conversion rate and the playback volume amplification factor selected in the designated speaker conversion condition selection processing.