JP2003345375A

JP2003345375A - Device and system for reproducing voice

Info

Publication number: JP2003345375A
Application number: JP2002149932A
Authority: JP
Inventors: Masayuki Misaki; 正之三▲崎▼; Takeo Kanamori; 丈郎金森; Junichi Tagawa; 潤一田川; Tomomi Matsuoka; 智美松岡
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2002-05-24
Filing date: 2002-05-24
Publication date: 2003-12-03

Abstract

PROBLEM TO BE SOLVED: To compensate for masking caused by ambient noise while applying compensation adapted to the individual user's hearing loss. SOLUTION: A voice listening condition setting unit 5 reads a profile describing the user's voice listening conditions from a profile server 20 over a communication network. A masking amount estimation unit 2 estimates the masking of received voice data caused by ambient noise so that it can be compensated. Under the control of a reproduction control unit 4, voice enhancement processing that takes the user's hearing loss into account is performed, yielding clear voice compensated for both the effect of masking and the hearing loss. COPYRIGHT: (C)2004,JPO

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention: The present invention relates to a voice reproducing device that reproduces voice information obtained through communication or a network in a mobile environment such as outdoors. More specifically, it relates to a voice reproducing device, and a voice reproduction system including it, having a function that makes voice that is hard to hear in a noisy environment easier to hear at the terminal, in a way suited to users with various hearing characteristics.

[0002]

2. Description of the Related Art: Portable voice recording and reproducing devices, voice communication services such as mobile phones, and services that transmit voice and audio information over networks are now widely available, so there are opportunities to listen to voice information in various forms even outdoors. However, these devices and services generally do not reproduce voice in a way that takes the user's hearing characteristics and listening environment into account, so users with marked hearing loss cope by, for example, wearing a hearing aid to improve intelligibility. The hearing compensation characteristic of a hearing aid is generally set to the parameters best suited to the wearer through a characteristic-determination process called fitting. These parameter values are effective when the listening environment is equivalent to the one in which the fitting was performed, but under other conditions the adjustment is not optimal.

[0003] A conventional hearing aid that addresses this problem is the history-holding hearing aid of Japanese Patent No. 2638563. FIG. 6 shows this history-holding hearing aid, that is, a voice reproducing device. The prior art is described below with reference to the drawing. In FIG. 6, 211 is a microphone, 212 is a hearing compensation unit, 213 is an amplification unit, 214 is an earphone, 221 is a selection unit, 310 is a current recording unit, 320 is a history recording unit, 261 is a storage medium, 262 is a connector, and 263 is a storage medium detection unit.

[0004] First, the current recording unit 310 consists of a current parameter recording unit 231, which records the current parameters that determine the hearing aid characteristic for the current hearing characteristic, and a current fitting recording unit 241, which stores the corresponding fitting. The history recording unit 320 consists of a plurality of parameter history recording units 232-1 to 232-n, which hold the parameters recorded at past updates of the hearing aid characteristic, and a plurality of fitting history recording units 242-1 to 242-n, which hold the fitting records corresponding one-to-one to them. The selection unit 221 selects one parameter set from those recorded in the current recording unit 310 and the history recording unit 320. The history recording unit 320 is built on a removable storage medium 261, and a connector 262 electrically connects the storage medium 261 to the selection unit 221 so that the parameter history and fitting history can be transferred from the storage medium 261 to the selection unit 221. The storage medium detection unit 263 determines whether a storage medium is attached to the connector 262 and, if not, instructs the selection unit 221 to prohibit selection of data on the history recording unit 320. The selection unit 221 passes the selected parameter values to the hearing compensation unit 212. The hearing compensation unit 212 applies hearing compensation processing to the voice signal input from the microphone 211, using the parameters selected by the selection unit 221 as the hearing compensation characteristic. The amplification unit 213 amplifies the hearing-compensated signal from the hearing compensation unit 212 until sufficient volume is obtained and outputs it to the earphone 214. The earphone 214 outputs the final voice signal processed by the hearing compensation unit 212 and the amplification unit 213.

[0005] With this configuration, the hearing aid holds multiple sets of hearing compensation parameters from past fittings, and the user selects a parameter set according to the situation in which it was fitted. The hearing compensation characteristic can thus be changed, which may improve hearing compensation compared with a device that offers only a fixed characteristic.

[0006]

Problems to Be Solved by the Invention: However, the above configuration requires that hearing compensation parameters be stored in advance, after fitting in each noise environment in which the device will be used. Fitting in advance under many conditions takes enormous effort and is often practically impossible. Moreover, a given listening environment is not guaranteed to always have the same noise characteristics; there are many variable factors such as the time of day and external causes. The above configuration therefore has the fundamental problem that optimal hearing compensation parameters cannot be set immediately, on the spot, for both the terminal user's voice listening conditions and the environmental noise.

[0007] In view of the above problem, the present invention estimates the amount of masking of the voice signal from the user's voice listening conditions and an estimate of the environmental noise, and lets the user listen to voice that has undergone voice enhancement processing. It provides a voice reproducing device, and a voice reproduction system including it, that immediately sets optimal hearing compensation parameters under various environmental noise conditions to produce easily audible voice.

[0008]

Means for Solving the Problems: To solve this problem, the present invention obtains, over a communication network, profile information recording the user's voice listening conditions, and uses the user's hearing characteristics and sound quality preferences. At the same time it estimates the environmental noise, estimates from that the amount of masking of the voice signal to be reproduced, and applies voice enhancement processing that takes both into account. This makes it possible to provide a general-purpose voice reproducing device and voice reproduction system that immediately reproduce clear, easily audible voice under various conditions.

[0009] To achieve this object, the voice reproducing device of the present invention comprises: voice listening condition setting means for obtaining the user's voice listening conditions from the user's profile information acquired via a communication network; reproduction control means for controlling voice enhancement processing means based on a masking estimate and the voice listening conditions; and voice enhancement processing means for applying predetermined enhancement processing to the input voice by signal processing.

[0010] To achieve this object, the voice reproduction system of the present invention comprises: a profile server that sends a registered user's profile information in response to an access request via a communication network; a transmission device that transmits voice information and AV information including voice; and a voice reproducing device that estimates the masking amount for the received voice signal from an estimate of the environmental noise and reproduces voice to which predetermined voice enhancement processing has been applied using the user's profile information acquired via the communication network.

[0011]

BEST MODE FOR CARRYING OUT THE INVENTION (Embodiment 1) Embodiment 1 of the present invention is described below with reference to the drawings.

[0012] FIG. 1 is a block diagram of the voice reproduction system according to Embodiment 1 of the present invention. In FIG. 1, 10 is a voice reproduction terminal, 20 is a profile server, 30 is a voice transmission device, 1 is a noise estimation unit, 2 is a masking amount estimation unit, 3 is a voice enhancement processing unit, 4 is a reproduction control unit, and 5 is a voice listening condition setting unit. The operation is described below.

[0013] First, the voice listening condition setting unit 5 reads the profile recording the user's voice listening conditions from the profile server 20 over the communication network and obtains the type and degree of hearing loss. The noise estimation unit 1 periodically computes a quasi-stationary noise estimate from the ambient noise signal picked up on the receiving side by a microphone. Next, the masking amount estimation unit 2 estimates the amount by which the voice signal received from the voice transmission device 30 is masked by the estimated noise. The reproduction control unit 4 then adaptively controls the voice enhancement processing unit 3, compensating for the type and degree of hearing loss while periodically updating the masking estimate that the masking amount estimation unit 2 computes from moment to moment. The voice enhancement processing unit 3 applies voice enhancement processing to the received voice signal and outputs voice that the user can hear clearly. Thus, once the voice listening conditions have been registered, they can be retrieved from the profile server over the communication network, so compensation accurately matched to an individual's hearing loss is easily available even on a public terminal.

[0014] This embodiment assumes a voice communication device used outdoors, a portable information terminal that can access voice information, or the like. The intended users are elderly or disabled people with hearing loss, but because the degree of voice enhancement is adjusted to the degree of hearing loss, people with normal hearing can also use the device; it can be regarded as a so-called universal design. For a user listening to received voice, intelligibility can become insufficient depending on the degree of the user's hearing loss and the level of the surrounding environmental noise, which hinders communication. There are two possible causes: hearing loss, that is, reduced sensitivity in particular frequency bands or the recruitment phenomenon, and masking of the received voice by a noise source. A user may simply try to solve this by manually raising the volume of the voice reproducing device. However, sensitivity then remains insufficient in the particular frequency bands, and when the recruitment phenomenon distorts loudness perception nonlinearly, the result is often harsher to the ear. In addition, the volume would have to be changed automatically from moment to moment to follow temporal changes in the noise source. Either way there are many drawbacks.

[0015] Therefore, the signal processing used to make hard-to-hear voice clearly audible is controlled adaptively according to the listener's hearing loss and the surrounding environmental noise.

[0016] First, as the user's voice listening conditions, consider the typical case of an elderly hearing-impaired person with a steeply sloping high-frequency loss. For this type, data such as the air-conduction hearing characteristic measured with an audiometer and shown in Table 1 could be used.

[0017]

[Table 1]

To compensate for such hearing loss, the correction is usually limited to the amount given by the half-gain rule, which corrects about one half of the loss (Naoki Onuma, "Hearing Aid Utilization Guide", p. 111, among others). In addition, fitting must be based on subjective evaluation that reflects the user's sound quality preferences. A configuration such as that of FIG. 2 can serve as a mechanism for reflecting these preferences. The configuration of FIG. 2 adds a sound quality adjustment setting terminal to the configuration of FIG. 1. The user's sound quality preferences are already registered in the voice listening conditions of the user profile. The reproduction control unit 4 reads this information and, based on the sound quality adjustment setting terminal, instructs that the processing parameters of the reproduction control unit be corrected. As typical examples, for a user who prefers a soft sound that favors naturalness over clarity, the degree of enhancement is reduced slightly, and conversely, for a user who prefers crisp, clear voice, it is set stronger. In this way, voice enhancement processing that reflects subjective preference in addition to physical hearing compensation can be realized.
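
The half-gain rule with a preference correction can be sketched as follows. This is only an illustrative sketch: the audiogram values are hypothetical (they are not the values of Table 1, which are not reproduced here), and the single multiplicative `preference_scale` is an assumed way of expressing "slightly weaker" or "stronger" enhancement, not a method stated in this document.

```python
def half_gain(audiogram_db_hl, preference_scale=1.0):
    """Return a per-frequency gain in dB: half of the hearing loss,
    scaled by a (hypothetical) sound quality preference factor."""
    return {freq: 0.5 * loss * preference_scale
            for freq, loss in audiogram_db_hl.items()}


# Hypothetical audiogram with a steep high-frequency drop (Hz -> dB HL).
audiogram = {250: 20, 500: 25, 1000: 30, 2000: 50, 4000: 70}

gains = half_gain(audiogram)             # plain half-gain rule
softer = half_gain(audiogram, 0.5)       # user prefers a softer, more natural sound
```

A factor below 1 corresponds to the user who prefers a softer sound, and a factor above 1 to the user who prefers crisp, clear voice.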

[0018] When estimating the frequency components of the noise, compensation that tracks sudden noises would make the sound unnatural. The noise estimate is therefore derived from a frequency envelope integrated with a reasonably long time constant, so that the enhanced voice does not change abruptly and natural reproduced sound is obtained.
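
One common way to integrate a frequency envelope with a long time constant is a first-order leaky integrator applied to each band. The sketch below assumes that approach; the class name, the 20 ms frame period, and the 2 s time constant are illustrative assumptions, not values given in this document.

```python
import math


class NoiseEnvelopeEstimator:
    """Smooth per-band noise power with a long time constant (assumed values)."""

    def __init__(self, num_bands, frame_period_s=0.02, time_constant_s=2.0):
        # Smoothing coefficient of a first-order leaky integrator.
        self.alpha = math.exp(-frame_period_s / time_constant_s)
        self.estimate = [0.0] * num_bands

    def update(self, band_powers):
        """Blend the newest frame into the running estimate and return it."""
        a = self.alpha
        self.estimate = [a * e + (1.0 - a) * p
                         for e, p in zip(self.estimate, band_powers)]
        return self.estimate


estimator = NoiseEnvelopeEstimator(num_bands=2)
for _ in range(2000):                     # steady background noise
    estimator.update([1.0, 4.0])
steady = list(estimator.estimate)
after_burst = estimator.update([100.0, 4.0])   # one sudden loud frame
```

Because the coefficient is close to 1, a single loud frame moves the estimate only slightly, which is the behavior the paragraph asks for.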

[0019] Next, the method of estimating the masking amount is described. First, the voice signal and the noise estimate are each frequency-analyzed frame by frame, and the noise power in each specified bandwidth is obtained. Then the voice and noise powers are compared in each predetermined frequency band, the amount by which the received voice is simultaneously masked by the ambient environmental noise is estimated, and the result is output to the reproduction control unit 4. The reproduction control unit 4 controls the enhancement parameters of the voice enhancement processing unit for each frequency band. The voice enhancement processing unit 3 adjusts its enhancement processing parameters to change the degree of enhancement of the reproduced voice.
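
The band-wise comparison can be sketched as below, assuming the frequency analysis has already produced a power spectrum for one frame. The function names and the representation of bands as index ranges over spectrum bins are assumptions made for illustration.

```python
import math


def band_levels_db(power_spectrum, band_edges):
    """Average the power of the bins in each band and convert it to dB."""
    levels = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        bins = power_spectrum[lo:hi]
        mean_power = sum(bins) / len(bins)
        # Floor the power to avoid log10(0) on silent bands.
        levels.append(10.0 * math.log10(max(mean_power, 1e-12)))
    return levels


def speech_to_noise_db(speech_spectrum, noise_spectrum, band_edges):
    """Per-band difference between speech level and noise level, in dB."""
    speech = band_levels_db(speech_spectrum, band_edges)
    noise = band_levels_db(noise_spectrum, band_edges)
    return [s - n for s, n in zip(speech, noise)]


# Toy frame with 8 bins split into two bands (bins 0-3 and 4-7).
speech_frame = [1.0] * 4 + [100.0] * 4
noise_frame = [10.0] * 8
ratios = speech_to_noise_db(speech_frame, noise_frame, [0, 4, 8])
```

In this toy frame the first band has speech below the noise and the second band has speech above it, so only the first band would need masking compensation.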

[0020] Next, the operation of the masking amount estimation unit 2 is described. The voice signal and the noise estimate are frequency-analyzed, and the average energy in each critical band is obtained for each. The masking amount estimation unit then estimates the masking amount in the corresponding critical band. As shown, for example, in Murase, Nakamura, and Iida, "Sound Quality Control Method Considering Masking by Ambient Noise" (Proceedings of the Acoustical Society of Japan, March 1997, 2-3-10), this value is expressed as a function whose parameters are the values of both the signal source and the noise source. The simultaneous masking effect is described in detail in, for example, Chapter 3 of B. C. J. Moore, "Introduction to Auditory Psychology", translated under the supervision of Kengo Ogushi (Seishin Shobo), so an explanation is omitted here. The masking estimate for each critical band obtained in this way is used as a parameter that determines the degree of enhancement applied by the voice enhancement processing unit.
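
The document does not reproduce the masking function from the cited paper. Purely as a placeholder with the same inputs (a speech level and a noise level per critical band), the sketch below treats the masking amount as the number of decibels by which the noise, plus an assumed margin, exceeds the speech. The rule and the `margin_db` parameter are hypothetical and should be replaced by the actual function.

```python
def masking_amount_db(speech_db, noise_db, margin_db=0.0):
    """Placeholder masking estimate per band: how far the noise level
    (plus an assumed margin) exceeds the speech level, never negative."""
    return [max(0.0, n + margin_db - s) for s, n in zip(speech_db, noise_db)]


# Two bands: speech well above noise, then speech below noise.
masking = masking_amount_db([60, 50], [40, 65])
```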

[0021] The reproduction control unit 4 determines the processing parameters of the voice enhancement processing from the above masking estimate and the user's hearing loss.
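
How the two inputs are combined is not specified here. One simple possibility, shown only as an assumption, is to add the hearing-loss gain and the masking compensation in each band and cap the total; `max_gain_db` is a hypothetical safety limit.

```python
def combine_band_gains(hearing_gain_db, masking_db, max_gain_db=40.0):
    """Assumed combination rule: per-band sum of hearing-loss gain and
    masking compensation, limited to a hypothetical maximum gain."""
    return [min(h + m, max_gain_db)
            for h, m in zip(hearing_gain_db, masking_db)]


band_gains = combine_band_gains([10.0, 35.0], [0.0, 15.0])
```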

[0022] Next, the voice signal processing performed by the voice enhancement processing unit 3 is described. The received voice signal is masked by the environmental noise around the listener, producing components that cannot be heard, so processing is performed to compensate the masked frequency bands. First, given the masking amount obtained for each critical band, the effect of masking can be compensated by applying a constant gain adjustment in that band. However, when the number of points in the analysis frame is increased to raise the frequency resolution, the value is valid as an average gain adjustment over that interval, but in transient portions where the amplitude is not stationary within the frame, large-amplitude portions of the voice may be over-amplified and become harsh to the ear. Dynamic range compression, which is often used in hearing aids and the like, is therefore applied.
FIG. 3 shows the configuration of the voice enhancement processing unit 3. The signal is first divided into frequency bands of critical bandwidth, and masking compensation is performed by applying dynamic range compression to each band. The band division unit 131 divides the signal into critical bands, and the dynamic range compression processing unit 132 in the next stage determines the minimum audible level (HTL) from the masking amount given for each band and compresses the dynamic range so that the voice signal fits between that level and the uncomfortable level (UCL). FIG. 4 shows an example of the input/output characteristic of this dynamic range compression. In the figure, a piecewise-linear input/output characteristic is given that provides a gain of 20 dB at an input of 40 dB (HL) for masking compensation. In this characteristic, an input of 90 dB (HL) is assumed to be the UCL, and the output signal is not amplified above this value. Applying such nonlinear gain adjustment compresses the dynamic range into a predetermined range, so masking compensation can be performed for each band. By also taking the user's hearing into account in this input/output characteristic, hearing loss can be compensated at the same time.
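
A piecewise-linear characteristic consistent with the two points stated above (20 dB of gain at a 40 dB input, no amplification beyond the 90 dB UCL) can be sketched as follows. The behavior below the 40 dB knee (constant 20 dB gain) and the straight segment between the knee and the UCL are assumptions, since FIG. 4 is not reproduced here.

```python
def compress_level_db(input_db, knee_db=40.0, gain_at_knee_db=20.0, ucl_db=90.0):
    """Map an input level to an output level (both in dB) for one band."""
    if input_db <= knee_db:
        # Assumed: constant gain below the knee.
        return input_db + gain_at_knee_db
    if input_db >= ucl_db:
        # Output is held at the uncomfortable level.
        return ucl_db
    # Compressive segment from (knee, knee + gain) to (UCL, UCL).
    knee_out = knee_db + gain_at_knee_db
    slope = (ucl_db - knee_out) / (ucl_db - knee_db)
    return knee_out + slope * (input_db - knee_db)


levels_in = [30.0, 40.0, 65.0, 90.0, 100.0]
levels_out = [compress_level_db(x) for x in levels_in]
```

With these assumed defaults the gain shrinks from 20 dB at the knee to 0 dB at the UCL, which compresses a 50 dB input range into a 30 dB output range.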

[0023] If the recording medium is removable media, processing matched to the voice listening conditions of each user becomes possible, which is convenient for public telephones and public facilities. Even with such a general-purpose voice reproducing device, because the user profile is read in, the user can obtain enhanced voice that reflects his or her preferred sound quality according to the type of voice information being listened to (conversation, music, news, and so on).

[0024] Means other than dynamic range compression can also be used to make the received voice clear. For example, a graphic equalizer with a limiter that restricts the upper limit can operate equivalently.

[0025] (Embodiment 2) Embodiment 2 of the present invention is described below with reference to the drawings. FIG. 5 is a block diagram of the voice reproduction system according to Embodiment 2 of the present invention. In FIG. 5, 10 is a voice reproduction terminal, 20 is a profile server, 30 is a voice transmission device, 1 is a noise estimation unit, 2 is a masking amount estimation unit, 3 is a voice enhancement processing unit, 4 is a reproduction control unit, 5 is a voice listening condition setting unit, and 6 is a terminal listening condition transmission unit. The operation is described below.

[0026] First, the voice listening condition setting unit 5 reads the profile recording the user's voice listening conditions from the profile server 20 over the communication network and obtains the type and degree of hearing loss. The noise estimation unit 1 periodically computes a quasi-stationary noise estimate from the ambient noise signal picked up on the receiving side by a microphone. Next, the terminal listening condition transmission unit 6 sends the noise estimate and the acoustic reproduction conditions of the terminal to the voice transmitting side. Next, the masking amount estimation unit 2 estimates the amount by which the voice signal received from the voice transmission device 30 is masked by the estimated noise. The reproduction control unit 4 then adaptively controls the voice enhancement processing unit 3 based on the masking estimate sent from the voice transmitting side, while compensating for the type and degree of hearing loss. The voice enhancement processing unit 3 applies voice enhancement processing to the received voice signal and outputs voice that the user can hear clearly. Thus, as long as the voice listening conditions have been registered, they can be retrieved from the profile server over the communication network, so compensation accurately matched to an individual's hearing loss is easily available even on a public terminal.

[0027] Whereas Embodiment 1 has all components on the receiving side and performs all processing there, this embodiment distributes part of the configuration to the voice transmitting side. The voice output finally obtained has the same effect, but distributing part of the processing to the transmitting side reduces the amount of computation in the reproducing device on the receiving side. On the other hand, the terminal listening conditions must be sent from the receiving side to the transmitting side, but the amount of data is small, so the communication load is light. For battery-powered equipment such as portable information terminals, this configuration, which reduces computation, is therefore considered useful for achieving smaller size and lighter weight.
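
To give a feel for why the terminal listening conditions are a small payload, the sketch below packs a per-band noise estimate into one byte per band. The message layout (a device identifier byte, a band count byte, then one quantized level per band) is entirely hypothetical; this document does not define a wire format.

```python
import struct


def pack_listening_conditions(noise_levels_db, output_device_id):
    """Quantize band noise levels to whole dB (0..255) and pack them."""
    quantized = [max(0, min(255, round(level))) for level in noise_levels_db]
    return struct.pack(f"<BB{len(quantized)}B",
                       output_device_id, len(quantized), *quantized)


def unpack_listening_conditions(payload):
    """Inverse of pack_listening_conditions (hypothetical layout)."""
    device_id, count = struct.unpack_from("<BB", payload, 0)
    levels = struct.unpack_from(f"<{count}B", payload, 2)
    return device_id, list(levels)


payload = pack_listening_conditions([62.4, 55.0, 48.6], output_device_id=1)
device_id, levels = unpack_listening_conditions(payload)
```

With this layout the payload grows by only one byte per band, so even a few tens of bands sent periodically would add little traffic compared with the voice stream itself.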

[0028] Means other than dynamic range compression can also be used to make the received voice clear. For example, a graphic equalizer with a limiter that restricts the upper limit can operate equivalently.

[0029]

Effects of the Invention: As described above, the present invention performs masking compensation for ambient noise on the received voice data and, at the same time, voice enhancement processing that takes the user's hearing loss into account. It thereby realizes a voice reproducing device that provides clear voice compensated for both the effect of masking and the hearing loss.

[Brief description of drawings]

FIG. 1 is a block diagram of the voice reproduction system according to Embodiment 1 of the present invention.

FIG. 2 is a block diagram of the voice reproduction system according to Embodiment 1 of the present invention, with a sound quality adjustment setting terminal added.

FIG. 3 is an internal block diagram of the voice enhancement processing unit.

FIG. 4 is an input/output characteristic diagram of the dynamic range compression processing.

FIG. 5 is a block diagram of the voice reproduction system according to Embodiment 2 of the present invention.

FIG. 6 is a block diagram of a conventional voice reproducing device.

[Explanation of symbols]

10 Voice reproduction terminal
20 Profile server
30 Voice transmitting device
1 Noise estimation unit
2 Masking amount estimation unit
3 Voice enhancement processing unit
4 Reproduction control unit
5 Voice listening condition setting unit
6 Terminal listening condition transmitting unit

Continued from front page
(72) Inventor: Junichi Tagawa, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma-shi, Osaka
(72) Inventor: Tomomi Matsuoka, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma-shi, Osaka

Claims

[Claims]

1. A voice reproduction system comprising: a profile server that transmits profile information of a registered user in response to an access request via a communication network; a transmitting device that transmits voice information and AV information including voice; and a voice reproduction device that estimates a masking amount for the received voice signal based on an estimated value of environmental noise and reproduces voice subjected to predetermined voice enhancement processing using the profile information of the user acquired via the communication network.

2. A voice reproduction system comprising: a profile server that transmits profile information of a registered user in response to an access request via a communication network; a transmitting device that estimates a masking amount from an estimated value of environmental noise obtained from the receiving terminal side and from the voice information and AV information including voice to be transmitted, and transmits the masking amount together with that information; and a voice reproduction device that reproduces voice subjected to enhancement processing based on the received voice information and AV information including voice, the estimated masking amount, and the profile information of the user acquired via the communication network.

3. A voice reproduction device comprising: noise estimation means for estimating environmental noise; voice enhancement processing means for performing predetermined enhancement processing on input voice by signal processing; masking amount estimation means for estimating a masking amount for a received voice signal based on the estimated value of environmental noise; voice listening condition setting means for obtaining the user's voice listening condition from profile information of the user acquired via a communication network; and reproduction control means for controlling the voice enhancement processing means based on the estimated masking amount and the voice listening condition.

4. A voice reproduction device comprising: noise estimation means for estimating environmental noise; voice enhancement processing means for performing predetermined enhancement processing on input voice by signal processing; terminal listening condition transmitting means for transmitting, to the transmitting side, a terminal-side listening condition based on the estimated value of environmental noise; voice listening condition setting means for obtaining the user's voice listening condition from profile information of the user acquired via a communication network; and reproduction control means for controlling the voice enhancement processing means based on the voice listening condition and on an estimated masking amount that the transmitting side obtains from the terminal listening condition and the voice to be transmitted.

5. The voice reproduction device according to claim 3 or 4, wherein the reproduction control means controls the voice enhancement processing means based on a sound quality adjustment value reflecting the user's sound quality preference, in addition to both the estimated masking amount and the user's voice listening condition.