JP2011205389A

JP2011205389A - Acoustic reproducing device having sound quality automatic adjusting function and hands-free telephone incorporated with the same

Info

Publication number: JP2011205389A
Application number: JP2010070498A
Authority: JP
Inventors: Takeshi Honma; 健本間; Kenji Nagamatsu; 健司永松; Ryota Kamoshita; 亮太鴨志田; Yusuke Fujita; 雄介藤田; Yasunari Obuchi; 康成大淵; Koichi Fujimoto; 幸一藤本
Original assignee: Clarion Co Ltd
Current assignee: Faurecia Clarion Electronics Co Ltd
Priority date: 2010-03-25
Filing date: 2010-03-25
Publication date: 2011-10-13
Anticipated expiration: 2030-03-25
Also published as: JP5292345B2

Abstract

PROBLEM TO BE SOLVED: To enable a sound reproduction device such as a hands-free telephone to automatically reproduce sound with the quality best suited to the listener, even when the surrounding environment changes from one use to the next. SOLUTION: Acoustic characteristics of transmitted voices that the listener finds easy to hear are stored in advance in a transmitted voice feature storage unit 212. When voice is actually reproduced, a filter creation unit 250 creates a filter based on the acoustic characteristics of the selected transmitted voice and on the acoustic transfer function from the speaker to the listener, which is selected from a vehicle transfer function storage unit 222, and the sound quality of the transmitted voice is adjusted so that the listener can hear it easily. Furthermore, the acoustic characteristics of transmitted voices that the user finds easy to hear are learned automatically from the transmitted voice during the user's ordinary telephone conversations, and this information is used to adjust the sound quality of the sound reproduction device.

Description

The present invention relates to a sound reproduction device such as a hands-free telephone device, and more particularly to a sound reproduction device having an automatic sound quality adjustment function.

Hands-free telephone devices that allow a call to be made in an automobile without holding a mobile phone are commercially available. In most cases, the sound quality of a hands-free telephone device is designed by the manufacturer before shipment. However, the sound quality that is easy to hear differs from user to user, so it is desirable that the sound quality can be adjusted to what each user finds optimal. This can be achieved by designing the device so that users can themselves configure an equalizer or similar settings for the output sound of the hands-free telephone device.

However, even when the same user uses the same hands-free telephone device, the sound quality inside the vehicle changes with the situation and environment. To keep the sound quality optimal at all times, the user must change the settings each time, which is laborious. In addition, while driving, the acoustic environment changes frequently because of road noise, engine noise, air conditioner noise, and the like, so it is difficult for the user to make the adjustment by hand each time.
Furthermore, opportunities for the same user to ride in a variety of vehicles through services such as car sharing and car rental, without owning a car, are expected to increase. In that case, setting the sound quality every time the vehicle model or hands-free telephone device changes is laborious.

Patent Document 1 discloses a technique relating to a hearing aid used with a mobile phone. The document discloses a method that identifies the user's mobile phone device, determines the noise environment in which the user is conversing, and from these determines the hearing aid filter that is optimal for the user.

Patent Document 1: JP 2002-223500 A

The technique disclosed in Patent Document 1 assumes a hand-held telephone such as a mobile phone, and does not consider cases in which there is a distance between the user and the speaker and microphone, as in a hands-free telephone. Nor does it disclose a method for determining the acoustic environment that the user prefers.

An object of the present invention is to enable a sound reproduction device such as a hands-free telephone device to automatically reproduce sound with the quality that is optimal for the user (listener) on every use, even when the surrounding environment changes.

In the present invention, a sound reproduction device such as a hands-free telephone device stores the acoustic characteristics (transmitted voice features) of transmitted voices from talkers that the user (listener) finds easy to hear. When voice is actually reproduced, the sound quality is adjusted, based on the acoustic characteristics of the selected transmitted voice and the acoustic transfer function from the speaker to the listener, so that the voice is easy for the user to hear. Furthermore, the acoustic characteristics of transmitted voices that are easy for the user to hear are learned automatically from the transmitted voices of talkers during the user's ordinary telephone conversations, and this information is used to adjust the sound quality of the sound reproduction device.

The sound reproduction device of the present invention comprises: a transmitted voice feature storage unit that stores the acoustic characteristics of one or more transmitted voices; a talker identification unit that identifies a talker or an attribute of the talker; a transmitted voice feature selection unit that selects and acquires, from the acoustic characteristics stored in the transmitted voice feature storage unit, the acoustic characteristics of a transmitted voice based on the talker information identified by the talker identification unit; a transfer function storage unit that stores one or more transfer functions from the speaker that reproduces the transmitted voice to the position of the listener; a transfer function selection unit that selects and acquires a transfer function from those stored in the transfer function storage unit, based on information that includes at least the position of the listener; a filter creation unit that, based on the acoustic characteristics selected by the transmitted voice feature selection unit and the transfer function selected by the transfer function selection unit, creates a filter that brings the acoustic characteristics of the reproduced sound at the listener's position close to the selected acoustic characteristics of the transmitted voice; and a filter processing unit that applies filter processing based on the filter created by the filter creation unit to the reproduced sound.

The sound reproduction device of the present invention further comprises learning means. The learning means comprises: a transmitted voice adjustment unit that adjusts the sound quality of the transmitted voice; a transmitted voice analysis unit that analyzes the acoustic characteristics of the transmitted voice whose sound quality has been adjusted by the transmitted voice adjustment unit; a voice feature storage unit that stores the acoustic characteristics of the transmitted voice output by the transmitted voice analysis unit; and a voice feature learning unit that learns the acoustic characteristics of the transmitted voice based on the output of the transmitted voice analysis unit and stores the learning result in the voice feature storage unit. The acoustic characteristics of the transmitted voice held in the voice feature storage unit of the learning means are used as the acoustic characteristics of the transmitted voice in the transmitted voice feature storage unit.

According to the present invention, a user of a sound reproduction device such as a hands-free telephone can obtain sound reproduction with optimal sound quality with little effort.

A diagram showing the overall configuration of the hands-free telephone device of Embodiment 1 of the present invention.
A diagram showing the configuration of the filter design unit of Embodiment 1 of the present invention.
A flowchart of the processing of the transmitted voice feature selection unit of Embodiment 1 of the present invention.
A flowchart of the processing of the vehicle transfer function selection unit of Embodiment 1 of the present invention.
A flowchart of the processing of the in-vehicle acoustic environment estimation unit of Embodiment 1 of the present invention.
A flowchart of the processing of the filter creation unit of Embodiment 1 of the present invention.
A diagram explaining the data of Embodiment 1 of the present invention.
A diagram showing the overall configuration of the learning means of Embodiment 2 of the present invention.
A diagram showing the configuration of the transmitted voice feature learning unit of Embodiment 2 of the present invention.
A flowchart of the processing of the determination unit of Embodiment 2 of the present invention.
A flowchart of the processing of the voice feature learning unit of Embodiment 2 of the present invention.

Hereinafter, an embodiment of the present invention will be described with reference to the accompanying drawings, taking a hands-free telephone device in a vehicle as an example. The terms "listener", "talker", "transmitted voice", and "received voice" in the present invention are used as follows, taking the hands-free telephone device as an example. The "listener" is the user of the hands-free telephone device, and the "talker" is the other party to the call, who speaks to the listener. "Transmitted voice" is the voice uttered by the talker and heard by the listener (the user), and "received voice" is the voice uttered by the listener (the user) and sent to the talker.

As Embodiment 1, a hands-free telephone device having an automatic sound quality adjustment function will be described.

[Overall configuration]
FIG. 1 is a diagram showing the overall configuration of a hands-free telephone device having the sound quality adjustment function according to the present invention. The telephone network 110 is an ordinary network to which landline phones and mobile phones connect in order to call one another. The mobile phone 120 is the mobile phone used for hands-free calling.

The hands-free telephone device 125 is a system for making hands-free calls with the sound quality adjustment function disclosed in the present invention.
The control unit 130 performs the control functions generally provided in a hands-free telephone device: specifically, handling communication with the mobile phone 120, controlling the transmitted and received voice, and controlling the various modules. It may also have a means of connecting to the Internet via the mobile phone so that information can be exchanged with a server. For the communication between the mobile phone 120 and the hands-free telephone device 125 handled by the control unit 130, the means provided in currently available commercial devices can be used. That is, the connection may be wired, or it may be wireless using a method defined by the Bluetooth standard.
The speaker 190 reproduces sound so that the user can hear the transmitted sound.
The microphone 180 picks up the voice uttered by the user.

The transmitted voice adjustment unit 134 adjusts the sound quality of the sound (transmitted sound) played to the user (listener) of the hands-free telephone. Specifically, it performs processing such as equalization, which changes the sound pressure level of each frequency band.
The received voice adjustment unit 132 adjusts the sound quality of the voice (received sound) uttered by the user of the hands-free telephone. In addition to equalization for each frequency band, this includes processing such as noise cancellation to reduce driving noise.
Known equalization techniques include methods using digital filters such as FIR and IIR filters, and methods based on FFT analysis. Known noise cancellation techniques include the spectral subtraction method and the MMSE STSA method. Since these methods are well known, their description is omitted.

The echo cancellation unit 140 removes the echo that arises when the microphone 180 picks up the sound reproduced by the speaker 190. The method described in the following known document can be used for this purpose.
F.K. Soong, A.M. Peterson: "Fast least-squares (LS) in the voice echo cancellation application," Proc. ICASSP, pp. 1398-1403, 1982
The filter design unit 160 determines the amplification factor for each frequency band used to adjust the sound quality of the voice played to the user. This processing is described later.
The filter processing unit 170 performs signal processing on the transmitted sound output by the transmitted voice adjustment unit 134, based on the amplification factor of each frequency band designed by the filter design unit 160. This processing is described later.
The vehicle information acquisition unit 150 acquires information about the vehicle. This is described later.

[Filter design unit]
FIG. 2 is a diagram showing the details of the filter design unit 160.

The transmitted voice analysis unit 240 calculates the frequency-power characteristic of the voice output by the transmitted voice adjustment unit 134.
Several methods are known for calculating power in the frequency domain, and any such known method can be used to calculate the frequency-power characteristic.
This embodiment uses an FFT-based method. First, a segment of fixed duration is cut out of the input speech waveform. This duration is called the frame length, and a value of about 10 ms is commonly used. Second, the extracted segment is multiplied by a time window that suppresses side lobes, such as a Hanning or Hamming window. Third, an FFT is applied to the windowed waveform, which converts the time-domain waveform into real and imaginary values in the frequency domain. Finally, the squared magnitude of each value obtained by the FFT is computed. This yields the frequency-power characteristic at each point in time. The process is repeated successively at a prescribed frame interval (here, 1/4 of the frame length).
As an alternative to the FFT-based method, the spectral envelope can be obtained by performing LPC analysis on the speech waveform of each frame.

The spacing of the frequency axis in the frequency-power characteristic is described next. A plain FFT yields the frequency-power characteristic at a uniform resolution determined by the reciprocal of the sampling period. In this embodiment, however, the aim is to convert the voice into one that is perceived optimally by human hearing, so power is obtained for each of a set of frequency bands that take hearing into account. Known definitions of such frequency bands include critical band filters and Bark band filters. Following one of these known definitions, the powers obtained by the FFT are weighted as prescribed and summed, and the result is used as the power of each frequency band.
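The weighted summation into auditory bands can be sketched as below. The sketch makes several assumptions that are not in the source: the band edges are the commonly cited critical-band (Bark) edge frequencies, truncated at 4.4 kHz; the weighting is rectangular, so each FFT bin contributes with weight 1 to exactly one band; and the function name `band_powers` is hypothetical.

```python
import bisect

# Commonly cited critical-band (Bark) edges in Hz, truncated at 4.4 kHz.
# The choice of this table is an assumption for illustration.
BAND_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270,
                 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400]

def band_powers(bin_powers, bin_hz, edges=BAND_EDGES_HZ):
    """Sum per-bin power into bands using rectangular (0 or 1) weights."""
    bands = [0.0] * (len(edges) - 1)
    for k, power in enumerate(bin_powers):
        freq = k * bin_hz
        # Index of the band whose range [edges[i], edges[i+1]) contains freq.
        idx = bisect.bisect_right(edges, freq) - 1
        if 0 <= idx < len(bands):
            bands[idx] += power
    return bands

# Assumed example: 41 bins spaced 100 Hz apart, unit power in every bin.
flat = [1.0] * 41
bands = band_powers(flat, 100.0)
```

Because the bands widen with frequency, a flat spectrum produces larger band powers at higher frequencies; a real implementation might use overlapping or tapered weights instead.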

[Acquisition of transmitted voice features]
The talker identification unit 210 and the transmitted voice feature selection unit 211 identify who the talker is and then acquire the voice features corresponding to that talker. This flow is described with reference to FIG. 2 and the flowchart of FIG. 3.
The flowchart of FIG. 3 starts when an incoming call arrives. The processing is performed once per incoming call and then ends.

In step 310, the talker identification unit 210 identifies the talker who is placing the call. One method is to look up the phone book on the mobile phone, using the caller ID (caller number notification) function commonly available on telephones. In addition, the transmitted voice feature storage unit 212, described later, stores the features of various voices, so speaker recognition using that information may be performed instead. Known speaker recognition methods include recognition based on the long-term spectrum of speech and recognition based on the dynamic features of the cepstrum; these are described in detail in the following document.
Sadaoki Furui: "Speech Information Processing" (音声情報処理), Morikita Publishing, 1998
These methods can be implemented by having the transmitted voice analysis unit 240 analyze, in the frequency domain, the transmitted sound output by the transmitted voice adjustment unit 134 to calculate feature values, and then calculating the distance between these values and the voice feature values of each talker stored in the transmitted voice feature storage unit 212.
The talker identification unit 210 may also output only information about the talker's attributes, for example whether the talker is male or female, or roughly how old the talker is. Because the case in which only such attributes are identified can be handled in the same way as the case in which the talker is identified, only the latter case is described below.
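The distance-based matching mentioned above can be sketched as a nearest-neighbor search over stored feature vectors. This is a minimal illustration, not the method of the cited reference: the Euclidean distance, the rejection threshold `max_distance`, the function names, and the sample data are all assumptions.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify_talker(features, stored, max_distance):
    """Return the ID of the closest stored talker, or None if none is close enough."""
    best_id, best_dist = None, float("inf")
    for talker_id, reference in stored.items():
        dist = euclidean(features, reference)
        if dist < best_dist:
            best_id, best_dist = talker_id, dist
    return best_id if best_dist <= max_distance else None

# Hypothetical stored features: per-band levels in dB for two talkers.
stored_features = {
    "talker_a": [60.0, 55.0, 40.0],
    "talker_b": [50.0, 58.0, 52.0],
}
match = identify_talker([59.0, 56.0, 41.0], stored_features, max_distance=5.0)
no_match = identify_talker([20.0, 20.0, 20.0], stored_features, max_distance=5.0)
```

Returning `None` when no stored talker is within the threshold corresponds to the case in which the talker cannot be identified.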

In step 320, the process branches according to whether the talker (or the talker's attribute) was identified in step 310. If so, the process proceeds to step 330.
In step 330, the transmitted voice feature selection unit 211 searches for the talker's voice features according to the talker (or talker attribute) identified in step 320. The data searched is the data stored in the transmitted voice feature storage unit 212.

The transmitted voice feature storage unit 212 is described next. It stores, for each talker, the features of the transmitted voice that the user finds easy to hear. These features are determined separately, and any method may be used to determine them. Later in this embodiment, a method is described for learning them, for each talker, from the user's ordinary telephone conversations.
The storage format used by the transmitted voice feature storage unit 212 is described with reference to FIG. 7(A).

As the features of the transmitted voice, one or more of the lower limit sound pressure level, the upper limit sound pressure level, and the average sound pressure level of the identified talker's conversational speech are stored for each frequency band. These levels are used later by the filter creation unit 250. The sound pressure level referred to here may take any form that can be converted into sound pressure, such as intensity or power. The values may be held as logarithmic or linear values; for convenience, they are described here as logarithmic values in decibels.
The lower limit sound pressure level is the minimum sound pressure level that can occur when the user hears the talker's conversational speech over the telephone. It is referred to as the lowest sound pressure level that must be guaranteed for the sound to be easy for the user to hear.
The upper limit sound pressure level is the maximum sound pressure level that can occur when the user hears the talker's conversational speech over the telephone. It is referred to as the upper limit of the sound pressure level that the user can listen to without discomfort.
The average sound pressure level is the average sound pressure level that can occur when the user hears the talker's conversational speech over the telephone. It is referred to as the average sound pressure level of speech that is easy for the user to hear.

The transmitted voice features need not be held as upper limit, lower limit, and average sound pressure levels; another format may be used. For example, the variation of the sound pressure level in each frequency band may be fitted to an arbitrary probability distribution function (for example, a normal distribution or a beta distribution) and recorded as the parameters of that distribution. If a normal distribution is assumed, the mean and variance are stored as the parameters. Then, in place of the directly stored upper and lower limit sound pressure levels, the values a fixed number of standard deviations from the mean (for example, +3σ and -3σ) are taken from the distribution and used as the upper limit and lower limit sound pressure levels, respectively. The mean or expected value of the distribution can be used directly as the average sound pressure level.
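For the normal-distribution case, deriving the three levels from stored parameters can be sketched as below. The function name, the dictionary keys, and the observed levels are hypothetical; the multiplier k = 3 follows the ±3σ example in the text, and the population standard deviation is used as an assumption.

```python
import statistics

def limits_from_levels(levels_db, k=3.0):
    """Fit a normal distribution to one band's levels (dB) and derive the limits."""
    mean = statistics.mean(levels_db)
    sigma = statistics.pstdev(levels_db)   # population standard deviation
    return {
        "lower": mean - k * sigma,     # lower limit sound pressure level
        "average": mean,               # average sound pressure level
        "upper": mean + k * sigma,     # upper limit sound pressure level
    }

# Hypothetical levels observed in one frequency band, in dB.
observed = [58.0, 60.0, 62.0, 60.0]
limits = limits_from_levels(observed)
```

Only the mean and variance need to be stored per band; the limits can be recomputed whenever the filter is created.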

In the processing of the transmitted voice feature selection unit 211 in step 330, suitable transmitted voice features are searched for in the transmitted voice feature storage unit 212, based on the talker information identified by the talker identification unit 210.
Instead of using the information in the transmitted voice feature storage unit 212, information in the same format may be obtained through the control unit 130, either from transmitted voice features stored on the mobile phone or from transmitted voice features downloaded from a server. In this case, in step 330, the transmitted voice feature selection unit 211 requests the talker's information through the control unit 130.

In step 340, it is determined whether the transmitted voice features being searched for exist. One case in which they are judged not to exist is when no transmitted voice features for the talker who placed the call are stored in the transmitted voice feature storage unit 212. Another, when the device is used in combination with the learning means described later, is when the talker's voice features are stored in the transmitted voice feature storage unit 212 but the amount of data is small. That is, when only a small amount of data has been accumulated, the transmitted voice features calculated from it are judged to be unreliable and are treated as not existing. If the features exist, the process proceeds to step 350. If they are judged not to exist, the process proceeds to step 355.

In step 350, the transmitted voice feature selection unit 211 acquires the transmitted voice features that were searched for. If the transmitted voice feature storage unit 212 was searched in step 330, the features found there are acquired. If a search request was sent to a server or the mobile phone through the control unit 130, the transmitted voice features transferred from the server or mobile phone are acquired. Data transferred from the server or mobile phone may also be accumulated in the transmitted voice feature storage unit 212 as it arrives, and obtained from that unit when it is needed again.
The transmitted voice features acquired in step 350 are sent to the filter creation unit 250 in step 357.

また、ステップ320において送話者が特定できなかった場合、ステップ340において検索対象の送話音声が検索できなかった場合には、ステップ355へ進む。ステップ355では、どのような場合においても選ぶことができるデフォルトの送話音声特徴を取得する。このデフォルトの送話音声特徴も、送話音声特徴保存部212、または、サーバや携帯電話に保存されているものとする。
このデフォルトの送話音声特徴は、どのような人が発話したとしても、ユーザが聞きやすいと感じる周波数と音圧レベルの関係であることが望ましい。そのようなものとして、聴力検査で測定されるオージオグラムに基づく値を使う方法が考えられる。すなわち、聴力検査における各周波数音に対する聴覚閾値を下限音圧レベルに対応させ、不快閾値を上限音圧レベルに対応させる。なお、このデフォルトの特性は、送話者の声質が考慮されていないため、真の聞きやすい音声とは若干異なるが、ユーザの聞きやすい音の周波数特性を考慮した特性を持っておくことによって、最低限の音質に補償するものである。
ステップ355で取得された送話音声特徴は、ステップ357において、フィルタ作成部250へ送られる。
以上により、送話者特定部210、送話音声特徴選択部211の処理は終了する。 If the sender could not be identified in step 320, or if the transmitted voice feature being searched for could not be found in step 340, the process proceeds to step 355. In step 355, a default transmitted voice feature that can be selected in any case is acquired. It is assumed that this default transmitted voice feature is also stored in the transmission voice feature storage unit 212, or in a server or mobile phone.
It is desirable that the default transmitted voice feature be a relationship between frequency and sound pressure level that the user finds easy to hear regardless of who is speaking. One conceivable approach is to use values based on an audiogram measured in a hearing test. That is, the hearing threshold for each frequency in the hearing test is used as the lower limit sound pressure level, and the discomfort threshold is used as the upper limit sound pressure level. Because this default characteristic does not take the sender's voice quality into account, it differs slightly from a truly easy-to-hear voice; however, by holding a characteristic that reflects the frequency characteristics of sounds the user finds easy to hear, a minimum level of sound quality is ensured.
The transmitted voice feature acquired in step 355 is sent to the filter creation unit 250 in step 357.
Thus, the processing of the transmitter identification unit 210 and the transmitted voice feature selection unit 211 ends.
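The selection with fallback described for steps 350 and 355 can be sketched in Python as follows. This is only a minimal sketch under stated assumptions: the names `default_feature_from_audiogram` and `select_feature`, the dictionary-based store, and all numeric values are hypothetical and do not appear in the embodiment.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: one (lower_limit_db, upper_limit_db) pair per frequency band.
Feature = List[Tuple[float, float]]


def default_feature_from_audiogram(hearing_threshold_db: List[float],
                                   discomfort_threshold_db: List[float]) -> Feature:
    """Use the hearing threshold as the lower limit and the discomfort threshold as the upper limit."""
    if len(hearing_threshold_db) != len(discomfort_threshold_db):
        raise ValueError("both thresholds must cover the same frequency bands")
    return list(zip(hearing_threshold_db, discomfort_threshold_db))


def select_feature(sender_id: Optional[str],
                   store: Dict[str, Feature],
                   default: Feature) -> Feature:
    """Return the stored feature for the sender, or the default when the sender
    is unknown (step 320 failed) or has no stored feature (step 340 failed)."""
    if sender_id is None:
        return default
    return store.get(sender_id, default)


# Illustrative values only (three bands, in dB).
default = default_feature_from_audiogram([20.0, 15.0, 25.0], [100.0, 95.0, 90.0])
store = {"sender-A": [(45.0, 70.0), (40.0, 68.0), (35.0, 60.0)]}
```

In this sketch the default is built once and reused, so a call from an unidentified sender still yields a usable target range for the filter creation unit.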

［車両情報の取得］
車両情報特定部220、車両伝達関数選定部221では、ユーザが車両内においてハンズフリー電話の音声を聞く環境における、スピーカからユーザ(受話者）の頭部までの伝達関数を得る。この処理を、図４のフローチャートによって説明する。
図４のフローチャートは、電話の着信があった際に開始される。また、原則、着信のたびに1度行うことにより終了する処理であるが、通話中に過度な車両情報の変更があった場合には、その都度更新してもよい。 [Obtain vehicle information]
The vehicle information identification unit 220 and the vehicle transfer function selection unit 221 obtain the transfer function from the speaker to the head of the user (listener) in the environment where the user listens to the hands-free telephone voice in the vehicle. This process will be described with reference to the flowchart of FIG. 4.
The flowchart of FIG. 4 is started when an incoming call is received. In principle, the process is terminated by performing it once for each incoming call. However, if there is an excessive change in vehicle information during a call, it may be updated each time.

ステップ460では、車両情報特定部220の処理によって、車両の情報を特定する。この処理の説明のために、車両情報取得部150、車両伝達関数保存部222の説明を行う。
車両情報取得部150では、車内におけるさまざまな情報を感知する。この情報の種類は、特定のものに限定されないが、以下の情報が挙げられる。まず、温度、湿度を感知する。また、同乗者が乗っている位置を特定するため、各座席のシートの圧力の情報や、車内に設置したカメラの情報を取得する。また、ユーザの頭部位置を特定するために、シートのヘッドレストの高さ情報、シートの前後位置情報、角度情報を取得する。 In step 460, vehicle information is specified by the processing of the vehicle information specifying unit 220. In order to describe this process, the vehicle information acquisition unit 150 and the vehicle transfer function storage unit 222 will be described.
The vehicle information acquisition unit 150 senses various information in the vehicle. The type of this information is not limited to a specific type, but includes the following information. First, temperature and humidity are sensed. In addition, in order to specify the position where the passenger is riding, information on the pressure of the seat of each seat and information on the camera installed in the vehicle are acquired. In addition, in order to specify the user's head position, the height information of the headrest of the seat, the front / rear position information of the seat, and the angle information are acquired.

車両伝達関数保存部222には、車両におけるさまざまな位置の、さまざまな条件下における、スピーカからの音響伝達関数を保存する（これを、車両伝達関数と呼ぶ）。車両伝達関数は、公知のインパルス応答測定法を使用して測定することができる。また、車両伝達関数を求める対象となる位置としては、ユーザがハンズフリー電話を使用する際の頭部の位置に関して、想定される複数の位置において測定をしておく。その他の条件としては、温度、湿度、同乗者の位置といった条件が変化した場合において、車両伝達関数を測定しておく。
図７の(B)に、車両伝達関数保存部222に保存されている車両伝達関数の1つの例を示す。車両伝達関数は、周波数と各周波数帯域における音エネルギの伝達率（ゲイン）との関係で保存しておく。車両伝達関数は、同一の条件であっても、さまざまな外部の要因により変動するため、最大、最小、平均のそれぞれのゲインのいずれか1つ以上を保存しておくものとする。あるいは、確率分布関数のパラメタという形で、各周波数帯域のゲインの変動範囲を保存しておくという形態でも良い。 The vehicle transfer function storage unit 222 stores an acoustic transfer function from the speaker at various positions in the vehicle under various conditions (this is called a vehicle transfer function). The vehicle transfer function can be measured using known impulse response measurement methods. In addition, as positions for which the vehicle transfer function is obtained, measurements are made at a plurality of assumed positions with respect to the position of the head when the user uses the hands-free telephone. As other conditions, the vehicle transfer function is measured when conditions such as temperature, humidity, and passenger position change.
FIG. 7B shows an example of the vehicle transfer function stored in the vehicle transfer function storage unit 222. The vehicle transfer function is stored in relation to the frequency and the transmission rate (gain) of sound energy in each frequency band. Since the vehicle transfer function fluctuates due to various external factors even under the same conditions, one or more of the maximum, minimum, and average gains are stored. Alternatively, the gain fluctuation range of each frequency band may be stored in the form of a probability distribution function parameter.

ステップ460における車両情報特定部220の処理では、車両情報取得部150から取得された車両取得情報に従って、車両伝達関数を選定するための条件を特定する。具体的には、温度、湿度、同乗者位置、ユーザの頭部位置などである。ユーザの頭部位置を推定するためには、前述のシートの情報だけではなく、あらかじめ記憶しておいたユーザの身体形状の情報を利用し、シート位置とユーザの身体寸法情報から計算する方法でも良い。ユーザの身体形状を保存する方法としては、運転免許証にある外部記憶装置に格納しておく方法、または、サーバや携帯電話に保存しておき、運転しているユーザが特定されしだい、身体形状の情報をこれらの機器より得る方法などを利用できる。 In the processing of the vehicle information specifying unit 220 in step 460, the conditions for selecting a vehicle transfer function are specified according to the information acquired from the vehicle information acquisition unit 150. Specifically, these are temperature, humidity, passenger position, the user's head position, and the like. To estimate the user's head position, it is also possible to use not only the seat information described above but also previously stored information on the user's body shape, and to calculate the head position from the seat position and the user's body dimensions. The user's body shape may be stored in an external storage device in the driver's license, or stored in a server or mobile phone and obtained from these devices as soon as the user who is driving has been identified.

ステップ470では、車両伝達関数選定部221の処理により、ステップ460で特定された条件に基づき、車両伝達関数保存部222から、条件にもっとも適合する車両伝達関数を検索する。
なお、検索対象となる車両伝達関数は、車両伝達関数保存部222に保存されていなくとも、その都度、制御部130を介してサーバや携帯電話に検索要求を出す方法でもよい。または、車両伝達関数保存部222に存在しないときにだけ制御部130を介してサーバや携帯電話に検索要求を出す様式でもよい。サーバに車両伝達関数を蓄積する形態をとれば、自動車の出荷前に、あらかじめすべてのパタンの伝達特性を取らずとも、ユーザの需要や販売台数に応じて、随時サーバに追加しておくことも可能となる。 In step 470, the vehicle transfer function selection unit 221 searches the vehicle transfer function storage unit 222 for a vehicle transfer function that best matches the conditions based on the conditions specified in step 460.
Even when the vehicle transfer function to be searched for is not stored in the vehicle transfer function storage unit 222, a search request may be issued to the server or mobile phone via the control unit 130 each time. Alternatively, a search request may be issued to the server or mobile phone via the control unit 130 only when the vehicle transfer function does not exist in the vehicle transfer function storage unit 222. If vehicle transfer functions are accumulated on the server, it is not necessary to measure the transfer characteristics of every pattern before the automobile is shipped; they can be added to the server at any time according to user demand and the number of units sold.

ステップ475では、検索対象となった車両伝達特性が検索できたか否かを判定する。検索できないケースとしては、特定された車両情報に一致する条件での車両伝達関数が存在しなかった場合や、一部の条件が一致しているが、所定の方法により計算される条件間の距離が大きかった場合が挙げられる。もし検索ができた場合には、ステップ480へ進む。検索できなかった場合には、ステップ485へ進む。 In step 475, it is determined whether or not the vehicle transfer function that was the search target could be found. Cases in which it cannot be found include the case where no vehicle transfer function exists for conditions matching the specified vehicle information, and the case where some conditions match but the distance between the conditions, calculated by a predetermined method, is large. If the search succeeded, the process proceeds to step 480. If the search failed, the process proceeds to step 485.

ステップ480では、車両伝達関数選定部221の処理により、ステップ470で検索された車両伝達関数を取得する。この取得においては、ステップ470で車両伝達関数保存部222を検索対象とした場合には、車両伝達関数保存部222より取得する。また、サーバや携帯電話に検索要求を出した場合には、サーバや携帯電話から転送された車両伝達関数を使用する。また、サーバや携帯電話から転送を行った際には、この情報を車両伝達関数保存部222に蓄積していき、再度同一の条件の要求があった際には、車両伝達関数保存部222から読み込めるようにしてもよい。
さらに、ステップ480で取得された車両伝達関数は、ステップ490の処理により、フィルタ作成部へ送られる。 In step 480, the vehicle transfer function found in step 470 is acquired by the processing of the vehicle transfer function selection unit 221. If the vehicle transfer function storage unit 222 was the search target in step 470, the function is acquired from the vehicle transfer function storage unit 222. If a search request was issued to the server or mobile phone, the vehicle transfer function transferred from the server or mobile phone is used. When a transfer from the server or mobile phone takes place, this information may be accumulated in the vehicle transfer function storage unit 222 so that it can be read from there when the same conditions are requested again.
Further, the vehicle transfer function acquired in step 480 is sent to the filter creation unit by the processing in step 490.

ステップ485では、ステップ475において車両伝達関数が検索できなかった場合に、デフォルトとなる車両伝達関数を取得する。このデフォルトの車両伝達関数も、車両伝達関数保存部222、または、サーバや携帯電話に保存されているものとする。このデフォルトの車両伝達関数は、ユーザの平均的な頭部位置に対する、スピーカからの音響伝達関数である。よって、1箇所で測定した伝達特性や、単純にボリュームの減衰量だけを記述した伝達関数などを使用することができる。
さらに、ステップ485で取得された車両伝達関数は、ステップ490の処理により、フィルタ作成部へ送られる。
以上により、車両情報特定部220、車両伝達関数選定部221の処理は終了する。 In step 485, which is reached when the vehicle transfer function could not be found in step 475, a default vehicle transfer function is acquired. It is assumed that this default vehicle transfer function is also stored in the vehicle transfer function storage unit 222, or in a server or mobile phone. This default vehicle transfer function is the acoustic transfer function from the speaker to the user's average head position. Accordingly, a transfer characteristic measured at a single location, or a transfer function that simply describes only the volume attenuation, can be used.
Further, the vehicle transfer function acquired in step 485 is sent to the filter creation unit by the processing in step 490.
Thus, the processes of the vehicle information specifying unit 220 and the vehicle transfer function selecting unit 221 are finished.
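The search of steps 470 to 485 can be illustrated with the following Python sketch. It is written under assumptions that are not in the embodiment: conditions are reduced to a numeric tuple, the "predetermined method" for the distance between conditions is taken to be a weighted Euclidean distance, and the names `TransferFunctionEntry`, `condition_distance`, and `select_transfer_function` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class TransferFunctionEntry:
    # Hypothetical condition vector, e.g. (temperature_c, humidity_pct, head_offset_cm).
    condition: Sequence[float]
    # Per-band gains in dB; the text allows storing one or more of these.
    min_gain_db: List[float]
    avg_gain_db: List[float]
    max_gain_db: List[float]


def condition_distance(a: Sequence[float], b: Sequence[float],
                       weights: Sequence[float]) -> float:
    """Weighted Euclidean distance between two condition vectors (an assumed metric)."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def select_transfer_function(query: Sequence[float],
                             entries: List[TransferFunctionEntry],
                             default: TransferFunctionEntry,
                             weights: Sequence[float],
                             max_distance: float) -> TransferFunctionEntry:
    """Return the closest stored entry, or the default when nothing is close enough."""
    best, best_d = None, math.inf
    for entry in entries:
        d = condition_distance(query, entry.condition, weights)
        if d < best_d:
            best, best_d = entry, d
    if best is None or best_d > max_distance:
        return default  # corresponds to step 485
    return best         # corresponds to step 480


# Illustrative values only (two frequency bands).
near = TransferFunctionEntry((20.0, 50.0, 0.0), [-12.0, -9.0], [-10.0, -7.0], [-8.0, -5.0])
far = TransferFunctionEntry((30.0, 60.0, 10.0), [-14.0, -9.0], [-11.0, -8.0], [-9.0, -6.0])
fallback = TransferFunctionEntry((25.0, 55.0, 0.0), [-10.0, -10.0], [-10.0, -10.0], [-10.0, -10.0])
```

The `fallback` entry here uses the same attenuation in every band, mirroring the remark that the default may simply describe a volume attenuation.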

［車内音響環境の取得］
車内音響環境推定部230では、ユーザが運転中に聞いている電話の会話音とは異なる騒音（エンジンノイズ、ロードノイズ、音楽音、同乗者の会話音など）を推定する。この処理を、図５のフローチャートによって説明する。
図５のフローチャートは、電話の着信があった際に開始される。また、電話の通話中に繰り返し行う。これにより、騒音を逐次推定するものである。なお、この繰り返しの時間間隔も、先に説明したとおり、FFT演算に使われるフレーム長の1/4を仮定する。 [Acquisition of interior acoustic environment]
The in-vehicle acoustic environment estimation unit 230 estimates noise (engine noise, road noise, music, passengers' conversation, etc.) other than the telephone conversation sound that the user is listening to while driving. This process will be described with reference to the flowchart of FIG. 5.
The flowchart of FIG. 5 is started when an incoming call is received. It is repeated during a telephone call. Thereby, the noise is sequentially estimated. It is assumed that the repetition time interval is 1/4 of the frame length used in the FFT operation as described above.

ステップ510では、電話の通話が継続しているか否かを判定する。この処理は、携帯電話120、制御部130によって行われる。通話が継続していない場合には、処理を終了する。 In step 510, it is determined whether the telephone call is continued. This processing is performed by the mobile phone 120 and the control unit 130. If the call is not continued, the process is terminated.

ステップ515では、ユーザが何か音声を発しているか否かを判定する。これは、マイク180から取り込んだ音に対して、信号処理を行うことにより判定を行う。この処理を行う理由は、車内の騒音を推定する際において、会話中の音声をもとに騒音を推定すると、その推定精度に悪影響が出るためである。この音声発話の検出としては、公知の音声レベルによる方法、GMM(Gaussian mixture model)による音声と非音声との判別手法を取ることができる。GMMに基づく方法では、通常の電話をしていないときの騒音を常に録音しておき、このときの騒音を学習することでもよい。
この判定により、ユーザが発話していないと判定された場合には、ステップ510へ戻る。ユーザが発話していると判定された場合には、ステップ520へ移る。 In step 515, it is determined whether or not the user is making any sound. This is determined by performing signal processing on the sound captured from the microphone 180. The reason for performing this process is that, when estimating the noise in the vehicle, if the noise is estimated based on the voice during the conversation, the estimation accuracy is adversely affected. As the detection of the voice utterance, a method based on a publicly known voice level or a method for discriminating between voice and non-voice based on GMM (Gaussian mixture model) can be used. In the method based on GMM, it is possible to always record the noise when not making a normal telephone call and learn the noise at this time.
If it is determined by this determination that the user is not speaking, the process returns to step 510. If it is determined that the user is speaking, the process proceeds to step 520.

ステップ520では、マイクからの音声に対して、周波数解析を行い、周波数-パワー特性を算出する。この方法は、さきに説明したFFTによる方法を行うことができる。また、ステップ515における発話の判定においてGMMを用いた方法を使う場合には、すでに周波数-パワー特性や、それに類する特性が計算されているので、その情報を用いても良い。 In step 520, frequency analysis is performed on the sound from the microphone to calculate a frequency-power characteristic. This method can be performed by the FFT method described above. Further, when the method using GMM is used in the speech determination in step 515, the frequency-power characteristics and similar characteristics have already been calculated, so that information may be used.

ステップ525では、ステップ520で求められた周波数-パワー特性に所定の計算を行い、周波数-パワー特性の更新を行う。この所定の計算の目的は、現時点における運転中の騒音の状況を推定することである。そのための計算方法にはいくつかの方法が考えられる。第1に、運転中に過去に採取された周波数-パワー特性と、現時点で採取された周波数-パワー特性の平均値を算出することが考えられる。こうすることにより、過去の騒音状況も加味した信頼できる値を採取することができる。また、急に路面状況が変化した場合に関しては、現時点で採取された周波数-パワー特性の瞬時値を使用することでもよい。 In step 525, a predetermined calculation is performed on the frequency-power characteristic obtained in step 520, and the frequency-power characteristic is updated. The purpose of this predetermined calculation is to estimate the current noise situation during operation. There are several possible calculation methods for this purpose. First, it is conceivable to calculate an average value of the frequency-power characteristics collected in the past during operation and the frequency-power characteristics collected at the present time. By doing so, it is possible to collect a reliable value in consideration of the past noise situation. Further, when the road surface condition suddenly changes, the instantaneous value of the frequency-power characteristic collected at the present time may be used.
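The update in step 525 can be sketched as a cumulative average over frames, with an option to fall back to the instantaneous value. This is a minimal Python sketch under assumptions: the class name `NoiseSpectrumEstimator`, the cumulative-mean formula, and the choice to restart the history when the instantaneous value is used are all illustrative and are not specified in the text.

```python
from typing import List


class NoiseSpectrumEstimator:
    """Tracks a per-band average of the frequency-power characteristic."""

    def __init__(self, num_bands: int) -> None:
        self.mean: List[float] = [0.0] * num_bands
        self.count = 0

    def update(self, frame_power: List[float], use_instantaneous: bool = False) -> List[float]:
        if len(frame_power) != len(self.mean):
            raise ValueError("frame must have one value per band")
        if use_instantaneous:
            # E.g. after a sudden change of road surface: discard the history (assumed policy).
            self.mean = list(frame_power)
            self.count = 1
        else:
            # Cumulative mean of past and current frames.
            self.count += 1
            self.mean = [m + (x - m) / self.count for m, x in zip(self.mean, frame_power)]
        return list(self.mean)


est = NoiseSpectrumEstimator(num_bands=2)
first = est.update([2.0, 4.0])
second = est.update([4.0, 8.0])
reset = est.update([10.0, 10.0], use_instantaneous=True)
```

How a sudden change in road conditions is detected is left open here, as it is in the text.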

ステップ530では、ステップ525で得られたマイク音の周波数特性、車両情報取得部150から出力された1つないし複数の情報に基づいて、騒音データ保存部235に保存されているユーザの頭部位置における騒音の周波数特性を検索する。 In step 530, the head position of the user stored in the noise data storage unit 235 based on the frequency characteristics of the microphone sound obtained in step 525 and one or more information output from the vehicle information acquisition unit 150 Search the frequency characteristics of noise at.

騒音データ保存部235について説明する。これは、さまざまな走行環境下、さまざまな車両条件下における、ユーザの頭部位置における騒音の周波数特性を保存したものである。条件としては、車両伝達関数保存部222と同様に、車内の温度、湿度、同乗者の位置などが使用できる。また、自動車の速度情報、エンジンの回転数情報を用いることができる。
さらに、この騒音データの検索においては、ステップ525でもとめられた騒音の周波数-パワー特性を検索条件に入れることを想定する。ステップ525でもとめられた騒音の周波数-パワー特性も、すでに騒音の特性を表している。しかし、マイクで収集される騒音の周波数特性と、ユーザが実際に聞く周波数特性は異なると考えられる。よって、ユーザが聞く音に近い騒音特性を推定できることが望ましい。よって、マイクから取られる騒音と、ユーザの頭部位置における騒音との関連をあらかじめとっておき、検索することによって、高精度なユーザ頭部位置における騒音特性を取ることができる。 The noise data storage unit 235 will be described. This preserves the frequency characteristics of noise at the user's head position under various driving conditions and various vehicle conditions. As conditions, as in the vehicle transfer function storage unit 222, the temperature, humidity, passenger position, etc. in the vehicle can be used. Also, vehicle speed information and engine speed information can be used.
Further, in the search of this noise data, it is assumed that the frequency-power characteristic of the noise obtained in step 525 is included in the search conditions. The frequency-power characteristic of the noise obtained in step 525 already represents the characteristics of the noise. However, the frequency characteristic of the noise collected by the microphone is considered to differ from the frequency characteristic that the user actually hears. It is therefore desirable to be able to estimate a noise characteristic close to the sound the user hears. Accordingly, by recording in advance the relationship between the noise picked up by the microphone and the noise at the user's head position, and searching that data, a highly accurate noise characteristic at the user's head position can be obtained.

ステップ540では、ステップ530の検索の結果、騒音特性を検索できたかを判断する。ここで検索できないケースとして考えられるのは、特定された車両情報やマイクの騒音特性に一致する条件での騒音特性が、騒音データ保存部235に存在しなかった場合が挙げられる。
検索結果が存在した場合には、ステップ550へ進む。存在しなかった場合には、ステップ560へ進む。 In step 540, it is determined whether the noise characteristics have been searched as a result of the search in step 530. A case that cannot be searched here is a case in which the noise data storage unit 235 does not have a noise characteristic under a condition that matches the specified vehicle information and the noise characteristic of the microphone.
If a search result exists, the process proceeds to step 550. If not, the process proceeds to step 560.

ステップ550では、ステップ530で検索された騒音特性を取得する。さらに、ステップ570において、取得した騒音特性をフィルタ作成部へ送る。 In step 550, the noise characteristic retrieved in step 530 is acquired. In step 570, the acquired noise characteristics are sent to the filter creation unit.

ステップ560では、ステップ530で騒音特性が検索できなかった場合における、デフォルトの騒音特性を取得する。デフォルトの騒音特性としては、ステップ525でもとめられたマイク180で採取された騒音の周波数-パワー特性をそのまま用いてもよい。または、定数を掛けることにより、騒音の大きさを補正したのち、使用することでもよい。そして、ステップ570において、得られた騒音特性をフィルタ作成部へ送る。 In step 560, a default noise characteristic is acquired for the case where the noise characteristic could not be found in step 530. As the default noise characteristic, the frequency-power characteristic of the noise collected by the microphone 180, as obtained in step 525, may be used as it is. Alternatively, it may be used after its magnitude has been corrected by multiplying it by a constant. Then, in step 570, the obtained noise characteristic is sent to the filter creation unit.

以上の車内音響環境推定部230の動作を、通話中繰り返すものとする。よって、フィルタ作成部に出力される騒音特性も、その都度更新される。 The operation of the in-vehicle acoustic environment estimation unit 230 is repeated during a call. Therefore, the noise characteristics output to the filter creation unit are also updated each time.

［フィルタの作成］
フィルタ作成部250は、以上の各部が算出した情報に基づき、ユーザにとって送話音を聞きやすい音に加工するフィルタを作成する。フィルタ作成部250における処理の流れを、図６のフローチャートに従って説明する。
なお、フィルタ作成部250の処理は、通話の継続中は繰り返し行われるものである。なお、この繰り返しの時間間隔も、先に説明したとおり、FFT演算に使われるフレーム長の1/4を仮定する。 [Filter creation]
Based on the information calculated by each of the above units, the filter creation unit 250 creates a filter that processes the transmitted sound into a sound that is easy for the user to hear. The flow of processing in the filter creation unit 250 will be described with reference to the flowchart of FIG. 6.
Note that the processing of the filter creation unit 250 is repeatedly performed while a call is continued. It is assumed that the repetition time interval is 1/4 of the frame length used in the FFT operation as described above.

ステップ610においては、電話による通話が継続中であるか否かを判断する。通話が終了した場合には、処理を終了する。通話が継続している場合には、ステップ620へ移る。 In step 610, it is determined whether a telephone call is ongoing. When the call is finished, the process is finished. If the call continues, the process moves to step 620.

ステップ620では、送話者が発話しているか否かを判断する。これは、送話者が会話しているときにのみ音質調整のフィルタを処理するためにこの処理が行われる。この発話か否かの判定を行う方法としては、先ほど説明した、公知の音声レベルによる方法や、GMMに基づく方法を取ることができる。
もし、発話でないと判定された場合には、ステップ610へ戻る。発話であると判定された場合には、ステップ630へ移る。 In step 620, it is determined whether or not the sender is speaking. This determination is made so that the sound quality adjustment filter is processed only while the sender is speaking. To determine whether speech is present, the known voice-level method described earlier or the GMM-based method can be used.
If it is determined that it is not an utterance, the process returns to step 610. If it is determined that it is an utterance, the process proceeds to step 630.

ステップ630では、送話音声特徴選択部211が出力した送話音声特徴を読み込む。これは、フィルタ作成部250において、ユーザが聞く送話音の周波数特性として目標値として使用される。 In step 630, the transmission voice feature output by the transmission voice feature selection unit 211 is read. This is used as a target value by the filter creation unit 250 as the frequency characteristic of the transmitted sound heard by the user.

ステップ635では、送話音声特徴である周波数特性に対して、車両伝達関数選定部221が出力した車両伝達関数の逆関数との積を計算する。
車両伝達特性の逆関数の例を図７の(C)に示す。この逆関数は、図７の(B)に示す車両伝達関数のゲインに対して、ゲインの逆数を計算することによって計算される。なお、ゲインをdB単位で表す場合には、元の対数ゲインに対する負の値として計算される。また、最大ゲイン、最小ゲイン、平均ゲインといった複数の値が定義されている場合には、それらの値のすべてに対して、逆関数をもとめる。また、車両伝達関数が確率分布として記録されている場合には、その分布が保存される形で、ゲインを逆数を取った分布のパラメタを求める。この逆関数は、スピーカから音を再生する場合において、ユーザの頭部において車室内の音響特性の影響をキャンセルした、周波数特性が平坦な音声を再生するための周波数特性として使用される。 In step 635, the product of the frequency characteristic that is the transmitted voice characteristic and the inverse function of the vehicle transfer function output by the vehicle transfer function selection unit 221 is calculated.
An example of the inverse function of the vehicle transfer function is shown in FIG. 7(C). This inverse function is obtained by taking the reciprocal of the gain of the vehicle transfer function shown in FIG. 7(B). When the gain is expressed in dB, it is obtained by negating the original logarithmic gain. When a plurality of values such as maximum gain, minimum gain, and average gain are defined, the inverse function is obtained for each of them. When the vehicle transfer function is recorded as a probability distribution, the parameters of the distribution of the reciprocal gain are obtained in such a way that the distribution is preserved. When sound is reproduced from the speaker, this inverse function serves as the frequency characteristic for reproducing, at the user's head, sound with a flat frequency characteristic in which the influence of the acoustic characteristics of the passenger compartment is canceled.
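The relationship between the reciprocal of a linear gain and the negated gain in dB can be checked with a short Python sketch. The function names are hypothetical, and the conversion assumes that the gain is a ratio of sound energy, so that the dB value is 10·log10 of the ratio.

```python
import math
from typing import List


def to_db(energy_gain: float) -> float:
    """Convert an energy (power) ratio to decibels."""
    return 10.0 * math.log10(energy_gain)


def inverse_gain_linear(gain: List[float]) -> List[float]:
    """Inverse function on a linear scale: the reciprocal of each band gain."""
    return [1.0 / g for g in gain]


def inverse_gain_db(gain_db: List[float]) -> List[float]:
    """Inverse function on a dB scale: the negated band gain."""
    return [-g for g in gain_db]


# Illustrative band gains (energy ratios).
gains = [0.5, 0.1, 2.0]
via_linear = [to_db(g) for g in inverse_gain_linear(gains)]
via_db = inverse_gain_db([to_db(g) for g in gains])
```

Both routes give the same per-band values up to floating-point rounding, which is why the inverse can be computed directly on stored dB gains.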

図７の(D)には、送話音声特徴である周波数特性の例図７ (A)と、車両伝達関数の逆関数図７（C)とを掛けた周波数特性を示す。なお、周波数特性の積とは、デシベル単位の軸上では、これらの値の和として計算できる。図７（D)では、送話音声特徴図７（A)の下限音圧レベルに対して、図７（C)の最大ゲインに対応する特性、最小ゲインに対する特性のそれぞれを掛けた特性を求めている（それぞれ図７（D)では、a、 bで示す）。また、送話音声特徴図７（A)の上限音圧レベルに対して、図７（C)の最大ゲインに対応する特性、最小ゲインに対する特性のそれぞれを掛けた特性も求めている（それぞれ図７（D)では、c、 dで示す）。 FIG. 7(D) shows the frequency characteristics obtained by multiplying the example frequency characteristic of the transmitted voice feature in FIG. 7(A) by the inverse function of the vehicle transfer function in FIG. 7(C). On a decibel axis, the product of frequency characteristics can be calculated as the sum of the values. In FIG. 7(D), the lower limit sound pressure level of the transmitted voice feature in FIG. 7(A) is multiplied by the characteristic corresponding to the maximum gain and by the characteristic corresponding to the minimum gain in FIG. 7(C) (indicated by a and b, respectively, in FIG. 7(D)). Likewise, the upper limit sound pressure level of the transmitted voice feature in FIG. 7(A) is multiplied by the characteristic corresponding to the maximum gain and by the characteristic corresponding to the minimum gain in FIG. 7(C) (indicated by c and d, respectively, in FIG. 7(D)).
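The four curves a to d can be computed per band as sums in decibels, as in the following Python sketch. The function name `speaker_side_curves` and the numeric values are hypothetical; the sketch assumes that the inverse characteristic corresponding to a gain G (in dB) is −G, so that each product becomes a subtraction.

```python
from typing import Dict, List


def speaker_side_curves(lower_db: List[float], upper_db: List[float],
                        min_gain_db: List[float], max_gain_db: List[float]) -> Dict[str, List[float]]:
    """Per-band curves of FIG. 7(D): target level plus the inverse (negated) gain, in dB."""
    return {
        "a": [lo - g for lo, g in zip(lower_db, max_gain_db)],  # lower limit x inverse of max gain
        "b": [lo - g for lo, g in zip(lower_db, min_gain_db)],  # lower limit x inverse of min gain
        "c": [up - g for up, g in zip(upper_db, max_gain_db)],  # upper limit x inverse of max gain
        "d": [up - g for up, g in zip(upper_db, min_gain_db)],  # upper limit x inverse of min gain
    }


# Illustrative values only (two bands, dB).
curves = speaker_side_curves(
    lower_db=[40.0, 45.0], upper_db=[80.0, 85.0],
    min_gain_db=[-12.0, -6.0], max_gain_db=[-8.0, -2.0],
)
```

Because the maximum gain is never smaller than the minimum gain, a lies at or below b and c lies at or below d in every band.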

ステップ640では、車内音響環境推定部230から、ユーザの頭部位置における騒音の周波数特性を取得する。 In step 640, the frequency characteristic of noise at the user's head position is acquired from the in-vehicle acoustic environment estimation unit 230.

ステップ645では、ステップ640で取得された騒音の周波数特性に対して、車両伝達関数の逆関数を掛ける。この処理の例を図７の(E)に示す。この図の例では、ステップ640で取得された騒音の周波数特性に対して、車両伝達関数の逆関数図７ (C)のうち、平均ゲインに対応する逆関数との積を計算した場合を示している。図7(E)において、ｅは、車内音響環境推定部が出力した騒音特性に対して車両伝達特性の逆関数との積を取った騒音特性を表し、ｆは、車内音響環境推定部が出力した騒音特性を表す。
この逆関数との積の計算によって、ユーザが聞く騒音特性に相当する騒音をスピーカで再生するには、どのような周波数特性としたらよいかが分かる。このデータは、ユーザが実際に聞く送話音において、騒音の影響がどの程度あるかを評価するために使われる。 In step 645, the frequency characteristic of the noise acquired in step 640 is multiplied by the inverse function of the vehicle transfer function. An example of this processing is shown in FIG. 7(E). The example in this figure shows the case where the frequency characteristic of the noise acquired in step 640 is multiplied by the inverse function in FIG. 7(C) that corresponds to the average gain. In FIG. 7(E), e represents the noise characteristic obtained by multiplying the noise characteristic output from the in-vehicle acoustic environment estimation unit by the inverse function of the vehicle transfer function, and f represents the noise characteristic output from the in-vehicle acoustic environment estimation unit.
By calculating the product with the inverse function, one can determine what frequency characteristic the speaker would need to reproduce in order to produce noise equivalent to the noise characteristic the user hears. This data is used to evaluate how strongly noise affects the transmitted sound that the user actually hears.

ステップ650では、ステップ635で算出した音声特徴とステップ645で算出した騒音特性から、目標とする周波数帯域ごとの音圧レベルの範囲を定める。
この処理の具体例を図７の(D)で示す。図７(D)では、ステップ635で算出した音声特徴(a、b、c、d)、ステップ645で算出した騒音特性(e)が記載されている。
まず、ユーザが聞きやすい音圧レベルは、送話音声特徴図７ (A)の周波数特性が、ユーザの頭部位置における音において再現されることである。これを達成するためには、スピーカからは、図７ (D)の下限音圧レベルから上限音圧レベルまでの音圧の範囲において再生されれば良い。なぜならば、図７ (D)の下限音圧レベル、上限音圧レベルは、送話音声特徴に対して車室内の周波数伝達特性の逆関数が掛けられているため、この周波数特性においてスピーカから再生されれば、車室の音響伝達特性を経てユーザの頭部において音声が再生されるため、送話音声特徴図７ (A)の周波数特性において再生されるためである。
しかしながら、車両伝達関数にはさまざまな理由により変動が発生する。そのため、たとえば図７ (D)のdの曲線に沿って音を再生した場合では、もし車両伝達関数のゲインが変動範囲内のうち大きなゲインを持っていた場合には、ユーザが聞いた音声は、普段聞いている音圧よりも大きくなる。このため、ユーザにとって不快となる恐れがある。一方、図７ (D)のaの曲線に沿って音を再生した場合では、車両伝達関数のゲインが変動範囲内のうち小さなゲインであった場合には、ユーザが聞く音は過度に小さくなる恐れがある。
こういった副作用を抑えるため、ここでは、車両伝達関数の予測される変動範囲内において、ユーザにとって聞きやすい送話音声特徴に収まる音圧の範囲を考える。すなわち、図７ (D)における、bからcまでの範囲に収めることにする。
なお、車両伝達関数や、送話音声特徴の周波数-パワー特性が確率分布関数の形式で定義されている場合には、図７ (D)の分布も確率分布関数によって計算することができる。この計算には、確率変数がデシベル単位で定義される場合には、2つの確率分布の和を取ることに相当するため、図７ (D)の各周波数帯域における音圧の変動範囲を表す確率分布関数のパラメタを得ることは公知の計算方法によって可能である。このようにして得た確率分布より、ユーザが聞きづらいほど音圧が小さくなる場合、不快となるほど音圧が大きくなる場合の確率を所定の閾値以下に収まる音圧の変動範囲が計算できる。 In step 650, the range of the sound pressure level for each target frequency band is determined from the voice characteristics calculated in step 635 and the noise characteristics calculated in step 645.
A specific example of this processing is shown in FIG. 7(D). FIG. 7(D) shows the voice features (a, b, c, d) calculated in step 635 and the noise characteristic (e) calculated in step 645.
First, the sound pressure level that is easy for the user to hear is achieved when the frequency characteristic of the transmitted voice feature in FIG. 7(A) is reproduced in the sound at the user's head position. To achieve this, the speaker should reproduce sound within the range from the lower limit sound pressure level to the upper limit sound pressure level in FIG. 7(D). This is because the lower and upper limit sound pressure levels in FIG. 7(D) are the transmitted voice feature multiplied by the inverse function of the frequency transfer characteristic of the passenger compartment; sound reproduced from the speaker with this frequency characteristic passes through the acoustic transfer characteristic of the passenger compartment and arrives at the user's head with the frequency characteristic of the transmitted voice feature in FIG. 7(A).
However, the vehicle transfer function fluctuates for various reasons. Therefore, if sound is reproduced along curve d in FIG. 7(D), for example, and the gain of the vehicle transfer function happens to be large within its fluctuation range, the voice heard by the user becomes louder than the sound pressure the user normally hears. This may be uncomfortable for the user. Conversely, if sound is reproduced along curve a in FIG. 7(D) and the gain of the vehicle transfer function happens to be small within its fluctuation range, the sound heard by the user may become excessively quiet.
To suppress these side effects, the range of sound pressures that stays within the transmitted voice feature that is easy for the user to hear, across the predicted fluctuation range of the vehicle transfer function, is considered here. That is, the sound pressure is kept within the range from b to c in FIG. 7(D).
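The reason the range from b to c is safe can be checked with a small Python sketch: for any actual gain between the minimum and the maximum, a level played between b and c arrives at the head within the target limits. The names are hypothetical, and returning `None` for a band whose gain spread is wider than the target range is an assumption of this sketch; the text does not discuss that case.

```python
from typing import List, Optional, Tuple


def safe_playback_range(lower_db: List[float], upper_db: List[float],
                        min_gain_db: List[float], max_gain_db: List[float]
                        ) -> List[Optional[Tuple[float, float]]]:
    """Per band, return (b, c) in dB, or None when b exceeds c (assumed handling)."""
    ranges: List[Optional[Tuple[float, float]]] = []
    for lo, up, gmin, gmax in zip(lower_db, upper_db, min_gain_db, max_gain_db):
        b = lo - gmin  # still reaches the lower limit even at the smallest gain
        c = up - gmax  # still stays under the upper limit even at the largest gain
        ranges.append((b, c) if b <= c else None)
    return ranges


def heard_level(played_db: float, actual_gain_db: float) -> float:
    """Level at the user's head: played level plus the cabin gain, in dB."""
    return played_db + actual_gain_db


# Illustrative values only: the second band has a gain spread wider than the target range.
ranges = safe_playback_range([40.0, 50.0], [80.0, 55.0], [-12.0, -20.0], [-8.0, -5.0])
b, c = ranges[0]
```

With these values, playing at b under the smallest gain yields exactly the lower limit, and playing at c under the largest gain yields exactly the upper limit.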
When the vehicle transfer function and the frequency-power characteristic of the transmitted voice feature are defined in the form of probability distribution functions, the distribution in FIG. 7(D) can also be calculated as a probability distribution function. When the random variables are defined in decibels, this calculation corresponds to taking the sum of two probability distributions, so the parameters of the probability distribution function representing the fluctuation range of the sound pressure in each frequency band of FIG. 7(D) can be obtained by a known calculation method. From the probability distribution obtained in this way, it is possible to calculate a sound pressure range within which the probability that the sound pressure becomes so low that it is hard for the user to hear, and the probability that it becomes so high that it is uncomfortable, both stay at or below a predetermined threshold.
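One possible form of this calculation is sketched below in Python for a single frequency band. The sketch rests on assumptions that the text does not state: the cabin gain in dB is modeled as a normal distribution, the lower and upper target limits are treated as fixed values, and the function name `playback_range_db` and all numbers are hypothetical. It uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist
from typing import Tuple


def playback_range_db(lower_db: float, upper_db: float,
                      gain_db: NormalDist, p: float) -> Tuple[float, float]:
    """Speaker-side range such that P(heard < lower) <= p and P(heard > upper) <= p.

    The heard level is modeled as played level + gain, with gain ~ gain_db (in dB).
    """
    if not 0.0 < p < 0.5:
        raise ValueError("p must lie strictly between 0 and 0.5")
    # P(y + G < lower) = cdf(lower - y) <= p  <=>  y >= lower - inv_cdf(p)
    lowest = lower_db - gain_db.inv_cdf(p)
    # P(y + G > upper) = 1 - cdf(upper - y) <= p  <=>  y <= upper - inv_cdf(1 - p)
    highest = upper_db - gain_db.inv_cdf(1.0 - p)
    return lowest, highest


# Illustrative values only: mean cabin gain of -10 dB with a 2 dB standard deviation.
gain = NormalDist(mu=-10.0, sigma=2.0)
lo, hi = playback_range_db(40.0, 80.0, gain, p=0.05)
```

If `lowest` exceeds `highest`, no level satisfies both conditions for the chosen threshold; how to handle that case is outside what the text describes.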

つぎに、騒音による影響を考慮する。図７ (D)のeに示した騒音特性を見ると、高周波数帯域（たとえば、周波数f2）ではスピーカの会話音の再生音圧範囲より低い音圧であるが、低周波数帯域（たとえば、周波数f1）では、一部、騒音の音圧が、会話音の再生の音圧範囲に入っている帯域がある。よって、この帯域では、bからcの範囲の音圧で再生音を再生した場合、ユーザが聞く音では、会話音よりも騒音のほうが大きくなることがある。この場合、ユーザは会話音の聞き取りがしづらくなる。聴覚心理学では、この現象をマスキング現象と呼ぶ。マスキング現象は、ある音のラウドネス（主観的に感じる音の大きさ）が、同帯域のノイズが存在する場合には、ノイズが存在しない場合よりも小さく感じられる現象である。 Next, consider the effects of noise. Looking at the noise characteristics indicated by e in FIG. 7D, the sound pressure is lower in the high frequency band (for example, frequency f2) than the reproduction sound pressure range of the speaker's conversational sound, but the low frequency band (for example, frequency In f1), there is a band where the sound pressure of noise is partly within the sound pressure range of conversational sound reproduction. Therefore, in this band, when the reproduced sound is reproduced with a sound pressure in the range of b to c, the noise heard by the user may be louder than the conversation sound. In this case, it becomes difficult for the user to hear the conversation sound. In auditory psychology, this phenomenon is called a masking phenomenon. The masking phenomenon is a phenomenon in which the loudness of a certain sound (subjective feeling loudness) is felt smaller when noise in the same band is present than when no noise is present.

ノイズによるマスキング現象におけるラウドネスの低下の大きさは、以下の文献に記載されている。
J.P.A. Lochner、 J.F. Burger: ``Form of the loudness function in the presence of masking noise、" Journal of the Acoustical Society of America、 vol.33、 no.12、 pp.1705-1707、 1961
この文献によれば、ノイズが存在する環境下での目的音のラウドネスψは、以下の式で表される。
ψ= k ( Iⁿ - I₀ ⁿ)
ただし、k，nは定数である。また、Iは目的音の音インテンシティ、I₀はノイズの音インテンシティである。
この式による会話音のラウドネスを計算すると、会話音声の音圧レベルが騒音の音圧レベルよりも十分に大きければ、ラウドネスの減少は無視できるほど少ない。一方、会話音声の音圧レベルが騒音の音圧レベルと近い値である場合には、会話音のラウドネスの減少が顕著となる。よって、会話音声の音圧レベルが騒音の音圧レベルと近い場合において、会話音声をより大きく増幅する補償を行うことで、会話音の聞き取りを行うことができる。よって、この基準に基づいて再生音圧の範囲を決定させる。具体的には、図７(D)のeよりラウドネス減少が起こらない程度に大きい音圧を下限とし、cを上限とする範囲を採用する。
なお、騒音特性については、これまで平均値のみを用いる場合を説明したが、送話音声特徴の周波数-パワー特性と同様、騒音の最大値、最小値や、確率分布関数による変動範囲を利用できる場合がある。こういった場合に関しては、上記で述べた送話音声特性のさまざまな形態における対処方法と同様に、マスキングの影響を計算することができる。 The magnitude of the decrease in loudness in the masking phenomenon due to noise is described in the following document.
J.P.A. Lochner, J.F. Burger: "Form of the loudness function in the presence of masking noise," Journal of the Acoustical Society of America, vol. 33, no. 12, pp. 1705-1707, 1961
According to this document, the loudness ψ of the target sound in an environment where noise exists is expressed by the following equation.
ψ = k (Iⁿ − I₀ⁿ)
Here, k and n are constants, I is the sound intensity of the target sound, and I₀ is the sound intensity of the noise.
When calculating the loudness of the conversational sound according to this equation, if the sound pressure level of the conversational sound is sufficiently larger than the sound pressure level of the noise, the decrease in the loudness is negligibly small. On the other hand, when the sound pressure level of the conversation voice is a value close to the sound pressure level of the noise, the loudness of the conversation sound is significantly reduced. Therefore, when the sound pressure level of the conversation voice is close to the sound pressure level of the noise, the conversation sound can be heard by performing compensation that amplifies the conversation voice more greatly. Therefore, the range of the reproduction sound pressure is determined based on this criterion. Specifically, a range is adopted in which the sound pressure that is so large that the loudness does not decrease from e in FIG.
As for the noise characteristics, the case where only the average value has been used has been described so far, but the maximum and minimum noise values and the fluctuation range based on the probability distribution function can be used as with the frequency-power characteristics of the transmitted voice characteristics. There is a case. In such a case, the masking influence can be calculated in the same manner as the above-described coping methods in various forms of the transmitted voice characteristics.
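As a rough numerical illustration of the loudness equation, the following sketch evaluates the fraction of loudness lost to masking for a conversation voice well above the noise and for one close to it. The constants k = 1 and n = 0.3 are placeholder assumptions, not values taken from the cited document, and the function names are hypothetical.

```python
def masked_loudness(i_target, i_noise, k=1.0, n=0.3):
    """Loudness psi = k * (I**n - I0**n); k and n are assumed placeholder constants."""
    return k * (i_target ** n - i_noise ** n)


def loudness_loss_ratio(level_db, noise_db, n=0.3):
    """Fraction of loudness lost to masking, relative to the noise-free case.

    Levels are converted to relative intensities with 10**(dB / 10).
    """
    i_target = 10.0 ** (level_db / 10.0)
    i_noise = 10.0 ** (noise_db / 10.0)
    quiet = masked_loudness(i_target, 0.0, n=n)
    return 1.0 - masked_loudness(i_target, i_noise, n=n) / quiet


# Voice 40 dB above the noise: the loss is small.
far_above = loudness_loss_ratio(80.0, 40.0)
# Voice only 3 dB above the noise: most of the loudness is lost.
close = loudness_loss_ratio(43.0, 40.0)
```

Under these assumed constants the loss ratio reduces to (I₀/I)ⁿ, so it shrinks as the voice level rises above the noise level, which is the behavior the text relies on when setting the lower limit of the reproduction range.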

In step 655, the frequency-power characteristic of the transmitted sound, as frequency-analyzed by the transmitted voice analysis unit 240, is acquired. This frequency-power characteristic is used in the following two ways.
(1) Instantaneous characteristics of the conversation sound
(2) Statistical characteristics of the talker's conversation sound from the start of the current call to the present
For (1), the frequency-power characteristic obtained from the frame-by-frame FFT calculation described above is used directly. (2) is a statistic obtained in order to determine the upper-limit and lower-limit sound pressure levels in each frequency band of the voice in the current conversation. One method takes, from the values of the frequency-power characteristic at each time, the maximum sound pressure level as the upper limit and the minimum sound pressure level as the lower limit. Alternatively, the instantaneous sound pressure levels of the past conversation sound may be accumulated, the parameters of a probability distribution function calculated from them, and the lower and upper limits of the sound pressure obtained from these parameters.
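The two ways of obtaining per-band limits can be sketched as follows. The input layout (a list of frames, each a list of band levels in dB), the Gaussian model, and the width of two standard deviations are all assumptions made for illustration; the text does not specify which probability distribution is used.

```python
import statistics


def limits_minmax(frames_db):
    """Per-band (lower, upper) limits taken as the observed minimum and maximum."""
    bands = list(zip(*frames_db))  # transpose: one tuple of levels per band
    return [min(b) for b in bands], [max(b) for b in bands]


def limits_gaussian(frames_db, width=2.0):
    """Per-band limits from a fitted normal distribution: mean -/+ width * std.

    The normal distribution and width=2.0 are illustrative assumptions.
    """
    lower, upper = [], []
    for band in zip(*frames_db):
        mean = statistics.fmean(band)
        std = statistics.pstdev(band)
        lower.append(mean - width * std)
        upper.append(mean + width * std)
    return lower, upper


# Three frames, two frequency bands, levels in dB.
frames = [[50.0, 60.0], [54.0, 58.0], [52.0, 62.0]]
lo_mm, hi_mm = limits_minmax(frames)
lo_g, hi_g = limits_gaussian(frames)
```

The min/max variant reacts to a single outlying frame, while the distribution-based variant trades that sensitivity for a dependence on the assumed model.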

These upper and lower limits of the sound pressure are used in step 660, when the amplification factor of the filter is determined, to grasp how wide the dynamic range of the input voice is.
Note that, for the talker's voice during a call, the information accumulated from past conversations in the transmitted voice feature storage unit 212 can also be used. If the voice of the current call reproduced the features held in the transmitted voice feature storage unit 212, analysis by the transmitted voice analysis unit 240 would be unnecessary, and the stored information could be used as it is. However, the voice characteristics of the talker in the current call do not necessarily match those accumulated in the past. Reasons include differences in the talker's physical condition, differences in the talker's current environment, and changes in speaking style because the other party is driving. In addition, the voice features stored in the transmitted voice feature storage unit 212 may have been learned, by the learning device described later, from the listener's everyday conversations on a landline or mobile phone. Such features therefore include the equalization specific to the device the user normally uses and the sound quality settings the user normally applies on that device. In a hands-free call system, however, even if the user's usual mobile phone is used, the equalization set on the mobile phone may not be available. Hands-free telephone systems also commonly apply their own equalization and sound quality adjustment. Furthermore, the user may hold a hands-free conversation using a mobile phone that he or she does not normally use. For these reasons, the sound reproduced by the hands-free telephone often differs from the features held in the transmitted voice feature storage unit 212. Therefore, the transmitted voice analysis unit 240 is used to analyze the voice of the current conversation continuously.
An example of the frequency characteristic of the voice analyzed by the transmitted voice analysis unit 240 is shown in FIG. 7(F).

In step 660, the amplification factor in each frequency band is calculated so that the sound pressure of the transmitted voice falls within the target sound pressure.
As the first part of this process, a conversion formula is calculated between the fluctuation range of the sound pressure of the transmitted voice in the current conversation (referred to as the input sound pressure) and the fluctuation range of the target sound pressure of the reproduced voice calculated in step 650 (referred to as the output sound pressure).
(G) and (H) in FIG. 7 show the relationship between the input sound pressure and the output sound pressure at frequencies f2 and f1, respectively.

First, the relationship at the frequency f2 will be described. The lower-limit sound pressure i2min and the upper-limit sound pressure i2max of the voice in the current conversation at the frequency f2 are obtained from FIG. 7(F). Next, the target lower-limit sound pressure o2min and upper-limit sound pressure o2max are obtained from FIG. 7(D). Then a conversion formula that maps the input sound pressure to the output sound pressure within these ranges is obtained as a linear function of the sound pressure in dB (this is called the conversion function). The conversion function is drawn as a thick line in FIG. 7(G). In psychoacoustics, loudness is known to be approximated by a power function of sound intensity or sound energy. Therefore, by using a linear function on the decibel axis as the conversion function, amplified sound can be output while the relative ordering of loudness is preserved. Next, from the conversion function, the instantaneous value of the output sound pressure is obtained for the instantaneous value of the input sound pressure supplied by the transmitted voice analysis unit. Finally, the instantaneous output sound pressure is divided by the instantaneous input sound pressure, and the result is taken as the amplification factor in the frequency band f2.
When both the input sound pressure and the output sound pressure are expressed in decibels, the amplification factor can be calculated by subtracting the input sound pressure level from the output sound pressure level.
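The linear conversion function for a band without masking can be sketched as below. The gain in dB is taken as the output level minus the input level, which is the decibel form of dividing the output sound pressure by the input sound pressure. The function names and the example ranges are hypothetical, and the conversion from dB to a linear factor assumes the levels are sound pressure (amplitude) levels, hence the division by 20.

```python
def linear_db_map(in_db, in_min, in_max, out_min, out_max):
    """Map an input level to an output level with a straight line on the dB axis.

    (in_min, in_max) is the observed input range and (out_min, out_max) the
    target output range. Inputs outside the range are extrapolated.
    """
    slope = (out_max - out_min) / (in_max - in_min)
    return out_min + slope * (in_db - in_min)


def band_gain(in_db, in_min, in_max, out_min, out_max):
    """Return the amplification factor for one band as (gain in dB, linear factor)."""
    out_db = linear_db_map(in_db, in_min, in_max, out_min, out_max)
    gain_db = out_db - in_db            # output level minus input level
    return gain_db, 10.0 ** (gain_db / 20.0)


# Assumed example: input range 40..70 dB mapped onto target range 55..75 dB.
gain_db, gain_lin = band_gain(55.0, 40.0, 70.0, 55.0, 75.0)
```

In this assumed example the 30 dB input range is compressed into a 20 dB output range, so quiet input receives more gain (15 dB at the lower limit) than loud input (5 dB at the upper limit).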

Next, the relationship at the frequency f1 will be described. The lower-limit sound pressure i1min and the upper-limit sound pressure i1max of the voice in the current conversation at the frequency f1 are obtained from FIG. 7(F). Next, the target lower-limit sound pressure o1min and upper-limit sound pressure o1max are obtained from FIG. 7(D). Then a conversion formula that maps the input sound pressure to the output sound pressure within these ranges is obtained. At the frequency f1, masking occurs due to noise. The conversion formula is therefore not a linear function but a curved conversion function, as shown in FIG. 7(H). This conversion function yields a reasonably large output sound pressure even when the input sound pressure is small, which avoids the influence of masking. Based on this conversion formula, the instantaneous output sound pressure is obtained for the instantaneous input sound pressure. Finally, the amplification factor in the frequency band f1 is calculated from this input sound pressure and output sound pressure.
The above processing is performed for each frequency band that serves as a unit of processing, and the amplification factor in each frequency band is calculated.
The method described here calculates the amplification factor from the correspondence between the upper and lower limits of the input and output sound pressures. However, other methods can also be used. For example, the conversion function may be obtained from the correspondence between the average values and the upper limits of the input and output sound pressures, and the amplification factor calculated from it.

In step 665, smoothing is applied to the amplification factor of each frequency band obtained in step 660, and the amplification factor is recalculated. The smoothing has two purposes: (1) suppressing differences in amplification factor between bands, and (2) suppressing differences in amplification factor between times.
For (1), one way to suppress differences in amplification factor between bands is to obtain a spline function from the amplification factors of the frequency bands obtained in step 660 and to adopt an amplification factor that changes continuously with frequency. As described at the beginning, the frequency-power characteristic is assumed to be calculated by dividing the spectrum into several frequency bands, such as critical bands, and computing a value for each band. If the amplification factor obtained for each band were applied as it is, the amplification factor would change abruptly at the band boundaries. Therefore, a spline function is obtained from the amplification factors of the bands, and this spline function is used to calculate the amplification factor at frequencies other than the center frequency of each band. This smooths the change in amplification factor over frequency.
For (2), one way to suppress differences in amplification factor between times is to use the average of the amplification factors over the past several frames instead of the instantaneous value obtained in step 660. For these smoothing methods, the methods described in the following document can be used.
F. Asano, Y. Suzuki, T. Sone, S. Kakehata, M. Satake, K. Ohyama, T. Kobayashi, T. Takasaka: "A digital hearing aid that compensates loudness for sensorineural impaired listeners," Proc. of ICASSP 91, pp. 3625-3628, 1991
In step 670, the amplification factor of each frequency band calculated in step 665 is sent to the filter processing unit 170.
The above processing of the filter creation unit 250 is repeated throughout the call.
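The two smoothing operations of step 665 can be sketched as follows, assuming NumPy is available. For brevity, linear interpolation (numpy.interp) stands in for the spline function named in the text; a cubic spline such as scipy.interpolate.CubicSpline could be substituted. The band center frequencies, the four-frame averaging window, and all names are illustrative assumptions.

```python
from collections import deque

import numpy as np


def smooth_over_frequency(center_hz, band_gain_db, bin_hz):
    """Interpolate per-band gains to arbitrary frequencies.

    Linear interpolation is a stand-in for the spline described in the text.
    center_hz must be increasing.
    """
    return np.interp(bin_hz, center_hz, band_gain_db)


class GainAverager:
    """Average the per-band gains over the most recent n_frames frames."""

    def __init__(self, n_frames=4):
        self.history = deque(maxlen=n_frames)

    def update(self, gain_db):
        self.history.append(np.asarray(gain_db, dtype=float))
        return np.stack(list(self.history)).mean(axis=0)


# Frequency smoothing: gains known at three band centers, evaluated in between.
per_bin = smooth_over_frequency([250.0, 1000.0, 4000.0], [0.0, 6.0, 12.0],
                                [250.0, 625.0, 1000.0, 2500.0])

# Time smoothing with a two-frame window.
averager = GainAverager(n_frames=2)
first = averager.update([0.0, 0.0])
second = averager.update([4.0, 8.0])
third = averager.update([8.0, 0.0])
```

A longer averaging window gives steadier gains at the cost of slower reaction to changes in the talker's voice or in the noise.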

[Filter processing unit]
Next, the processing of the filter processing unit 170 will be described. The filter processing unit 170 adjusts the reproduced sound of the transmitted voice based on the amplification factor of each frequency band output from the filter creation unit 250.

In the present invention, this processing is described on the premise of a known method based on FFT calculation.
First, a segment of the transmitted sound is cut out. The duration of the segment is called the frame length, and a value of about 10 ms is commonly used. A window is applied to the extracted voice. Segments are assumed to be cut out at time intervals of 1/4 of the frame length.
Second, an FFT is performed on the voice to be analyzed. This converts the time-domain waveform into real-part and imaginary-part values in the frequency domain.
Third, each frequency-domain value output by the FFT is multiplied by the amplification factor for the corresponding band output by the filter design unit 160.
Fourth, an inverse Fourier transform is applied to the frequency-domain values obtained in the third step, returning them to the time domain.
Fifth, the waveform obtained in the fourth step is output. Because the transmitted sound is cut out at intervals of 1/4 of the frame length in the first step, different frames contain sample values for the same time; such values are added together before output. This method is known as the overlap-add method.
Through the above processing, the speaker 190 outputs voice whose amplitude has been increased or decreased according to the amplification factor in each frequency band output by the filter design unit 160 for the transmitted sound.
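The five steps above can be sketched with NumPy as follows. This is a minimal illustration, not the patented implementation: the frame length of 160 samples (10 ms at an assumed 16 kHz sampling rate), the periodic Hann window, and the division by the summed window to restore unit level are assumptions, and the gains are assumed to have already been expanded to one linear factor per FFT bin.

```python
import numpy as np


def apply_band_gains(signal, gains, frame_len=160):
    """Filter a signal by per-bin gains using FFT and overlap-add.

    gains: one linear gain per rfft bin (length frame_len // 2 + 1).
    The hop size is 1/4 of the frame length, as described in the text.
    """
    signal = np.asarray(signal, dtype=float)
    gains = np.asarray(gains, dtype=float)
    hop = frame_len // 4
    # Periodic Hann window (assumed window type).
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(frame_len) / frame_len)
    out = np.zeros(len(signal))
    norm = np.zeros(len(signal))
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window       # step 1: cut and window
        spectrum = np.fft.rfft(frame)                          # step 2: FFT
        spectrum *= gains                                      # step 3: apply gains
        filtered = np.fft.irfft(spectrum, n=frame_len)         # step 4: inverse FFT
        out[start:start + frame_len] += filtered               # step 5: overlap-add
        norm[start:start + frame_len] += window
    covered = norm > 1e-8
    out[covered] /= norm[covered]   # assumed normalization by the summed window
    return out


# A test tone passed through unity gains and through a uniform gain of 0.5.
tone = np.sin(2.0 * np.pi * np.arange(1600) / 50.0)
unity_out = apply_band_gains(tone, np.ones(81))
half_out = apply_band_gains(tone, 0.5 * np.ones(81))
```

With all gains equal to one, the interior of the output reproduces the input, which is a convenient sanity check before frequency-dependent gains are applied.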

[Summary]
As a result of the above processing, the voice reproduced from the speaker 190, when heard at the user's head position, has been converted into the frequency characteristic associated with that talker as the user normally hears it. In addition, when noise is present, the sound quality is adjusted so as to suppress the adverse effect of the noise on listening. Therefore, the hands-free telephone voice can be reproduced with a sound that is easy for the user to hear.

[Use outside the driver's seat]
This embodiment has been described on the assumption that the user makes a call from the driver's seat. However, the user may also use the device from another seat, such as a rear seat or the passenger seat. In general, hands-free telephones are developed as devices for the driver. Nevertheless, it is conceivable that several people in the vehicle take turns talking with the person on the other end of the call.
Even in such a case, the sound quality can be adjusted for the person who is currently talking. To do so, it is necessary to know where in the vehicle that person is before the processing of the in-vehicle acoustic environment estimation unit 230 and the vehicle transfer function selection unit 221 is performed. For this purpose, the following methods can be used: explicitly designating the position to the device by operating a switch or the like, identifying the position from image information from a camera installed in the vehicle, or identifying the person talking by predetermined signal processing of the signals from one or more microphones installed in the vehicle.
Moreover, different people prefer different transmitted voice features. Therefore, the transmitted voice feature storage unit stores multiple sets of transmitted voice features in separate areas, one per listener. When the sound quality is actually corrected, the device first detects who the person talking (that is, the listener) is, and then reads the transmitted sound quality characteristics corresponding to that person and performs the correction. The person can be identified by explicitly indicating to the device who is talking by operating a switch or the like, by recognizing the person's face in image information from a camera installed in the vehicle, or by speaker recognition applied to the sound of a microphone installed in the vehicle.

Next, as a second embodiment, a configuration for learning the transmitted voice will be described, as a method of constructing the transmitted voice features stored in the transmitted voice feature storage unit 212.

[Overall configuration]
FIG. 8 shows the system configuration for learning the transmitted voice. In this learning, the features of the talker's voice (the acoustic characteristics of the transmitted voice) are learned while the user holds everyday conversations on a mobile phone, landline phone, or other phone the user commonly uses. Learning can also be performed during calls on the hands-free telephone device.
The telephone line network 810 is an ordinary network to which ordinary landline and mobile phones connect in order to call one another.
The mobile phone 820 relays communication between the telephone line network 810 and the learning means 830. This mobile phone is a component required only when the learning means is provided in the hands-free telephone device. When learning is performed on a landline or mobile phone, the telephone line network 810 and the control unit 840 in the learning means 830 communicate directly, so the mobile phone 820 is unnecessary. When learning is performed in the hands-free telephone device, communication between the mobile phone 820 and the control unit 840 is carried out by wire or wirelessly (Bluetooth standard).

The learning means 830 is a device that houses the components for learning the transmitted voice. The learning means is assumed to be provided in a mobile phone, a landline phone, or a hands-free telephone device. The embodiment is described as a method that can be implemented in any of these cases.

The control unit 840 controls the learning means 830. It also controls the device in which the learning means 830 is provided, according to the type of that device (landline phone, mobile phone, or hands-free telephone device). It therefore controls the transmitted and received voice and the various modules required for the telephone function. In this embodiment, only the operations required for learning the transmitted voice are described.

The operation input unit 860 receives the user's operations. When the learning means 830 is provided in a landline or mobile phone, these operations may be presses of the phone's buttons. When the learning means 830 is provided in a hands-free telephone device, they may be operations of buttons on the hands-free telephone device, or of the buttons, dials, remote control, and the like of the car navigation system, car audio system, or vehicle cockpit in which the hands-free telephone device is installed.

The vehicle information acquisition unit 870 acquires information about the vehicle when the user is in a vehicle.

The transmitted voice adjustment unit 842 adjusts the sound quality of the sound played to the user (the transmitted sound). Specifically, it performs processing such as equalization that changes the sound pressure level of each frequency band.
The transmitted voice adjustment unit 842 can also apply any particular sound quality adjustment that the user makes on a landline phone, mobile phone, or hands-free telephone. For example, when the user prefers sound with weaker high frequencies, equalization with a reduced amplification factor in the high-frequency band can be set in the transmitted voice adjustment unit 842.
When the equalizer setting that is easy for the user to hear differs from one talker (the person uttering the transmitted sound) to another, the transmitted voice adjustment unit 842 applies a different equalization setting for each talker. Specifically, separately configured equalization settings for each talker are held in the memory of the transmitted voice adjustment unit 842. In an actual call, the talker is identified from information such as the talker's telephone number, and the equalization setting associated with the identified talker is used by the transmitted voice adjustment unit 842.
By amplifying the sound in each frequency band, the sound quality adjustment in the transmitted voice adjustment unit 842 can also serve as a hearing aid when the user has hearing loss.
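One possible way to hold per-talker equalization settings keyed by telephone number is sketched below. The class name, the three-band layout, the default setting, and the example number are hypothetical; the text only states that settings are stored per talker and selected from information such as the telephone number.

```python
# Flat (0 dB) setting used when the caller is unknown; an assumed default.
DEFAULT_EQ_DB = {"low": 0.0, "mid": 0.0, "high": 0.0}


class CallerEqualizerStore:
    """Keep one equalization setting (band gains in dB) per telephone number."""

    def __init__(self):
        self._settings = {}

    def set(self, phone_number, band_gains_db):
        self._settings[phone_number] = dict(band_gains_db)

    def lookup(self, phone_number):
        # Return a copy so callers cannot modify the stored setting by accident.
        return dict(self._settings.get(phone_number, DEFAULT_EQ_DB))


store = CallerEqualizerStore()
# Hypothetical talker whose voice the user prefers with reduced high frequencies.
store.set("0312345678", {"low": 0.0, "mid": 2.0, "high": -4.0})
known = store.lookup("0312345678")
unknown = store.lookup("0000000000")
```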

To summarize the role of the transmitted voice adjustment unit 842, it adjusts the sound quality of the original transmitted sound so that it becomes easy for the user to hear in the situations in which the user normally uses the telephone. As described above, this adjustment includes sound quality adjustment specialized to the individual user's ease of hearing (that is, the user's auditory characteristics) and adjustment specialized to the talker's transmitted sound characteristics. In other words, the sound quality adjustment performed by the transmitted voice adjustment unit 842 includes detailed adjustment customized for the user.

The output voice of the transmitted voice adjustment unit 842 is sent to the transmitted voice feature learning unit 850. The operation of the transmitted voice feature learning unit 850 is described later; in brief, it learns the voice features of each talker from the voice output by the transmitted voice adjustment unit 842. Consequently, the transmitted voice feature learning unit 850 accumulates, for each talker, the features of voice that has been adjusted by the transmitted voice adjustment unit 842 to be easy for the user to hear.
The voice features obtained by this learning are stored in the transmitted voice feature storage unit 212 of the hands-free telephone device 125 and are used as the target acoustic characteristics at the user's pinna position in the automatic sound quality adjustment of the hands-free telephone device 125 described above.
The hands-free telephone device 125 can therefore adjust the sound quality so that the transmitted sound at the user's pinna position has the easy-to-hear sound quality specified by the user information and the talker information.

The received voice adjustment unit 844 adjusts the sound quality of the voice uttered by the user (the received sound). This includes equalization for each frequency band. In the case of a hands-free telephone device, it also includes processing such as noise canceling to reduce driving noise.

The microphone 882 and the speaker 880 are the microphone and speaker used for conversation in the telephone the user is using (landline phone, mobile phone, or hands-free telephone device).

The echo cancellation unit 884 removes the echo that arises when the microphone 882 picks up the sound reproduced from the speaker 880. The methods in the known literature mentioned above can be used for this.

[Configuration of the transmitted voice feature learning unit]
The transmitted voice feature learning unit 850 learns the voice features of talkers who call the user's telephone. FIG. 9 shows the configuration of the transmitted voice feature learning unit 850 in more detail.

The transmitted voice analysis unit 910 performs frequency analysis on the transmitted sound sent from the transmitted voice adjustment unit 842. Here, the frequency-power characteristic is obtained by FFT analysis of a waveform of the frame length described above. This processing is repeated at a prescribed frame interval (here, 1/4 of the frame length).

The received voice analysis unit 930 performs frequency analysis on the received sound sent from the echo cancellation unit 884. The method is the same as that of the transmitted voice analysis unit 910.

The determination unit 920 determines, at each time, whether the transmitted sound frequency-analyzed by the transmitted voice analysis unit 910 should be learned. The flow of this processing is described with reference to the flowchart of FIG. 10.

[Processing flow of the determination unit]
In step 1010, it is determined whether the call is continuing. If the call has ended, the processing ends.
In step 1020, the processing branches according to the learning mode, based on the input of the operation input unit 860 obtained via the control unit 840. If the user has explicitly specified that the transmitted voice is not to be learned, the process proceeds to step 1070, and a learning stop is output to the voice feature learning unit 940. If the user has instructed forced learning, the process proceeds to step 1090, and an instruction to execute learning is output to the voice feature learning unit 940. In the automatic learning mode, whether to learn is decided from information on the transmitted sound and the received sound, so the process moves to step 1030.

In step 1030, the frequency-power characteristic of the transmitted sound output from the transmitted voice analysis unit is acquired.
In step 1040, the noise level of the transmitted sound is determined from the frequency-power characteristic acquired in step 1030. The noise level is the loudness of the sound contained in the transmitted sound other than the telephone conversation itself. When this noise level is high, it can be inferred that the speaker is calling from a noisy place, and transmitted sound obtained in such an environment is likely to be hard for the user to hear. The determination is therefore made in order to exclude such sound from learning. It can be implemented in various ways: for example, by averaging the frequency-power characteristic of the transmitted sound over a long period and judging the noise to be high when the overall power exceeds a fixed threshold, or by using a speech/non-speech discrimination method based on a GMM (Gaussian mixture model).
If the noise level is equal to or higher than the threshold, the process proceeds to step 1070 and the voice feature learning unit 940 is instructed to stop learning. If the noise level is below the threshold, the process proceeds to step 1050.
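The first determination method mentioned for step 1040 (long-term averaging followed by a threshold test) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation of the device: the function name `is_noisy`, the representation of each frame as a sequence of linear per-band powers, and the threshold value are all hypothetical.

```python
from typing import Sequence


def is_noisy(frames: Sequence[Sequence[float]], threshold: float) -> bool:
    """Return True when the long-term average total power reaches `threshold`.

    `frames` holds one frequency-power characteristic per analysis frame
    (assumed layout: a sequence of linear per-band powers).
    """
    if not frames:
        return False  # nothing observed yet, so nothing to reject
    # Total power of each frame, summed over all frequency bands.
    totals = [sum(frame) for frame in frames]
    # Average over the whole observation window (the "long period").
    mean_total = sum(totals) / len(totals)
    return mean_total >= threshold
```

The comparison uses "equal to or higher than" to match the branch condition described for step 1070. A GMM-based speech/non-speech classifier, the other option named in the text, would replace the body of this function.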

In step 1050, the frequency-power characteristic of the received sound output from the received voice analysis unit is acquired.
In step 1060, the noise level of the received sound is determined from the frequency-power characteristic acquired in step 1050. When the user is talking in a noisy environment, conversation is judged to be difficult for the user, so the purpose of this step is to exclude such situations from learning.
The same method as the noise determination for the transmitted voice in step 1040 can be used. If the noise level is equal to or higher than the threshold, the process proceeds to step 1070 and a learning stop is output to the voice feature learning unit 940. If the noise level is below the threshold, the process proceeds to step 1080.
In step 1080, the voice feature learning unit 940 is instructed to execute learning.

After step 1070, 1080, or 1090 is completed, the process returns to step 1010 and repeats. This processing is performed at the frame interval of the transmitted voice analysis unit and the received voice analysis unit.
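The per-frame branching of steps 1020 through 1090 can be summarized in a short sketch. The names `LearningMode` and `decide_learning` are hypothetical, the noise levels are assumed to have been computed beforehand, and a single shared threshold is assumed for both directions, which the description does not specify.

```python
from enum import Enum


class LearningMode(Enum):
    OFF = "off"        # user explicitly disabled learning
    FORCED = "forced"  # user forces learning
    AUTO = "auto"      # decide from the transmitted and received sound


def decide_learning(mode: LearningMode, tx_noise: float,
                    rx_noise: float, threshold: float) -> bool:
    """Return True to instruct learning for this frame, False to stop it."""
    if mode is LearningMode.OFF:      # step 1020 -> step 1070
        return False
    if mode is LearningMode.FORCED:   # step 1020 -> step 1090
        return True
    if tx_noise >= threshold:         # step 1040 -> step 1070
        return False
    if rx_noise >= threshold:         # step 1060 -> step 1070
        return False
    return True                       # step 1080
```

A caller would invoke this once per analysis frame while the call is in progress (step 1010) and pass the result to the voice feature learning unit.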

Although not shown in the flowchart of FIG. 10, other information can also be used to decide whether to instruct the voice feature learning unit 940 to learn. Specifically, when the user is in a vehicle and is using the learning means 830 provided in the hands-free telephone device, whether the vehicle is stopped or moving is determined from information obtained from the vehicle information acquisition unit 870, and learning is instructed only while the vehicle is stopped. In this way, the learning target is limited to conversation in an environment where the user can concentrate on the call more easily and where there is less noise.

[Processing flow of the voice feature learning unit]
Next, the operation of the voice feature learning unit 940 in the transmitted voice feature learning unit 850 is described with reference to the flowchart of FIG. 11.

In step 1102, the speaker is identified. Several methods are available. First, the incoming telephone number, obtained from the control unit 840, can be used. Alternatively, even when the speaker's identity is unknown, an attribute of the speaker can be determined and used as the speaker information; for example, whether the voice is male or female can be determined from the output of the transmitted voice analysis unit 910. Speaker recognition may also be performed using the speaker and voice feature data already collected and stored in the voice feature storage unit 950.

In step 1104, the voice feature storage unit 950 is searched for a transmitted voice feature corresponding to the speaker. As noted for step 1102, when only an attribute of the speaker such as male or female is identified, it is determined whether the voice feature storage unit 950 holds data corresponding to the speaker attribute determined in step 1102. If the data exists, the process proceeds to step 1110. If not, the process proceeds to step 1106.

In step 1106, voice feature information for the new speaker is created in the voice feature storage unit 950. At this point, no voice features have yet been accumulated for the new speaker.
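Steps 1102 to 1106 amount to deriving a key for the speaker and then looking up or creating the corresponding entry. The sketch below assumes a plain dictionary as the storage, hypothetical key prefixes, and an `"unknown"` fallback that is not part of the description.

```python
from typing import Dict, List, Optional

FeatureStore = Dict[str, List[List[float]]]


def speaker_key(phone_number: Optional[str], attribute: Optional[str]) -> str:
    """Step 1102: prefer the caller's number, else fall back to an attribute."""
    if phone_number:
        return f"number:{phone_number}"
    if attribute:
        return f"attribute:{attribute}"
    return "unknown"  # assumed fallback, not described in the text


def get_or_create_entry(store: FeatureStore, key: str) -> List[List[float]]:
    """Steps 1104 and 1106: find the speaker's entry or create an empty one."""
    if key not in store:
        store[key] = []  # new speaker: no features accumulated yet
    return store[key]
```

Speaker recognition against previously stored features, the third option in step 1102, would supply the key by a different route but use the same lookup.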

The processing from step 1110 to step 1160 is repeated during the call. The repetition interval is assumed to equal the frame interval, which is the period of the speech analysis in this device.
In step 1110, it is determined whether the telephone call is still in progress. If not, the process ends.

In step 1120, the information output by the determination unit is consulted to determine whether learning has been instructed. If it has, the process proceeds to step 1130. If not, the process returns to step 1110 and repeats.

In step 1130, the frequency-power characteristic output from the transmitted voice analysis unit is obtained.

In step 1140, it is determined whether the speaker is currently speaking. Speech can be detected with a known method based on the voice level or with a GMM-based speech/non-speech discrimination method.

In step 1150, the frequency-power characteristic output from the transmitted voice analysis unit is stored in the voice feature storage unit 950. Through this processing, the frequency characteristics of the transmitted sound at each time are added to the voice feature storage unit 950 one after another.

In step 1160, the distribution of the sound pressure level in each frequency band is obtained from the frequency-power characteristics of the transmitted voice at each time accumulated in the voice feature storage unit 950. This distribution is used to grasp the range over which the sound pressure of the transmitted voice heard by the user varies. Several forms of the distribution are possible. One is to calculate the average, maximum, and minimum values from the sound pressure at each time. Another is to fit the past time series of sound pressure to a probability distribution function (for example, a normal distribution or a beta distribution) and estimate the parameters of that function. Either method may be implemented.
The above processing is repeated until the call ends.
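Both forms of the distribution described for step 1160 can be computed from the same stored frames. The following sketch assumes each frame is a sequence of per-band sound pressure levels and that a normal distribution is the chosen model; the function name `band_statistics` and the dictionary keys are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence


def band_statistics(frames: Sequence[Sequence[float]]) -> List[Dict[str, float]]:
    """Summarize per-band sound pressure levels over all stored frames."""
    if not frames:
        return []
    stats = []
    # zip(*frames) regroups the data so each item is one band over time.
    for levels in zip(*frames):
        stats.append({
            "mean": statistics.fmean(levels),
            "min": min(levels),
            "max": max(levels),
            # Population standard deviation; together with the mean it gives
            # the maximum-likelihood parameters of a normal distribution.
            "stdev": statistics.pstdev(levels),
        })
    return stats
```

Fitting a beta distribution, the other example in the text, would need a different estimator and a rescaling of the levels to the unit interval, which is not shown here.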

In the embodiment described above, the speaker is identified at the start of the process. However, when the speaker is identified by speaker recognition, recognition may be difficult until a sufficient amount of speech is available. In that case, the voice features of the speaker may be held in temporary storage during the call and, once the speaker has been identified during the call, saved from temporary storage to the voice feature storage unit 950 either during the call or after it ends.
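The deferred variant described here, in which frames are buffered until speaker recognition succeeds, might look like the following. The class name `PendingFeatures` and the use of a dictionary as the permanent storage are assumptions made for illustration.

```python
from typing import Dict, List, Optional


class PendingFeatures:
    """Hold frames in temporary storage until the speaker is identified."""

    def __init__(self) -> None:
        self._buffer: List[List[float]] = []

    def add(self, frame: List[float]) -> None:
        """Buffer one frame of voice features."""
        self._buffer.append(frame)

    def commit(self, store: Dict[str, List[List[float]]],
               speaker: Optional[str]) -> bool:
        """Move buffered frames into `store` once `speaker` is known."""
        if speaker is None:
            return False  # still unidentified: keep buffering
        store.setdefault(speaker, []).extend(self._buffer)
        self._buffer = []
        return True
```

Calling `commit` either during the call or after it ends corresponds to the two timings allowed in the text.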

[How to use the transmitted voice features]
The transmitted voice features stored in the voice feature storage unit 950 hold the sound pressure variation range of the transmitted voice after the transmitted voice adjustment unit 842 has applied corrections that take into account ease of listening for the user and for each speaker. The sound pressure variation range stored in the voice feature storage unit 950 can therefore be treated as the characteristic of a voice that is easy for the user to hear. Here, this range is used in the transmitted voice feature storage unit 212 described in connection with the automatic sound quality adjustment of the hands-free telephone device 125.

The following describes how the transmitted voice features in the voice feature storage unit 950 of the learning means 830 are transferred to the transmitted voice feature storage unit 212 of the hands-free telephone device 125.
The first method is via a mobile phone. The information in the voice feature storage unit 950 is transferred to the memory of the mobile phone. When a call is then made on the hands-free telephone device 125 through this mobile phone, the information is transferred to the transmitted voice feature storage unit 212.
The second method is via a network such as the Internet. It is assumed that the learning means 830, or the fixed-line phone, mobile phone, or hands-free telephone device that contains it, has a data communication function. The transmitted voice features in the voice feature storage unit 950 are uploaded to a predetermined server, and the hands-free telephone device 125 requests the transmitted voice features from the server and downloads them.
The third method is transfer by an external storage device such as a flash memory or an IC chip. It is assumed that the learning means 830 can exchange data with the external storage device, so that the transmitted voice features of the voice feature storage unit 950 can be written to the flash memory or IC chip. The hands-free telephone device 125 is likewise assumed to be able to exchange data with the external storage device, and the transmitted voice features are transferred to it. Alternatively, the features may be written to an IC embedded in a driver's license or to a storage area embedded in a car key and transferred to the hands-free telephone device 125 through the vehicle's equipment.

[Selecting the voice features to use]
As noted in the description of the operation of the hands-free telephone device 125, transmitted voice features are generally considered more reliable when learned from a longer duration of speech. Accordingly, an embodiment was described in which the hands-free telephone device 125 decides whether to use a transmitted voice feature according to how long it was learned. To make this possible, the voice feature learning unit 940 of the learning means 830 records not only the voice feature but also the duration (number of frames) of the speech used to calculate it.

[Learning on a hands-free telephone]
The learning means can learn in different environments, such as on a fixed-line phone, a mobile phone, or a hands-free phone. When the hands-free telephone device 125 adjusts sound quality using learned voice features, the result is expected to be more suitable the closer the learning situation is to the situation of use. Therefore, when a sufficient amount of learning data from hands-free use exists, it is preferable to use the transmitted voice features learned on the hands-free phone rather than those learned on a fixed-line or mobile phone.
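The two selection rules described so far (require enough learned frames, and prefer features learned on a hands-free phone) can be combined as in the sketch below. The `LearnedFeature` record, the source labels, the `min_frames` parameter, and the tie-break by largest frame count are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LearnedFeature:
    source: str        # e.g. "hands_free", "mobile", "fixed" (assumed labels)
    frame_count: int   # number of frames the feature was learned from


def select_feature(candidates: List[LearnedFeature],
                   min_frames: int) -> Optional[LearnedFeature]:
    """Pick a sufficiently trained feature, preferring hands-free learning."""
    usable = [c for c in candidates if c.frame_count >= min_frames]
    if not usable:
        return None  # nothing reliable enough; the caller must fall back
    hands_free = [c for c in usable if c.source == "hands_free"]
    pool = hands_free or usable
    # Within the chosen pool, take the feature learned from the most speech.
    return max(pool, key=lambda c: c.frame_count)
```

For example, with a mobile-phone feature of 5000 frames and a hands-free feature of 1200 frames, a `min_frames` of 1000 selects the hands-free feature, while a `min_frames` of 2000 falls back to the mobile-phone feature.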

On the other hand, transmitted voice features learned on a hands-free telephone are learned under the condition that the user is hearing a voice affected by the transfer function of the vehicle. When these features are used for sound quality adjustment, the influence of the vehicle transfer characteristic at the time of learning must be taken into account. There are two ways to do this. The first is to calculate in advance the transfer function of the vehicle in which the user rides during learning and to perform learning after canceling this characteristic. The second is to skip step 635 of FIG. 6 when the hands-free telephone device 125 reproduces the voice, since the learned features already include the vehicle transfer function. This second method treats the transfer function of the vehicle used during learning and that of the vehicle in which the user makes calls with sound quality correction as identical. In practice the two differ, but the general characteristics of an automobile cabin are still reflected.
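For the first method, canceling a known vehicle transfer characteristic before learning can be illustrated in the decibel domain. The sketch assumes that the transfer function is reduced to a per-band magnitude gain in dB and that phase is ignored; neither the representation nor the function name `cancel_transfer` comes from the description.

```python
from typing import List, Sequence


def cancel_transfer(levels_db: Sequence[float],
                    gain_db: Sequence[float]) -> List[float]:
    """Remove a per-band transfer gain (in dB) from observed per-band levels."""
    if len(levels_db) != len(gain_db):
        raise ValueError("levels and gains must cover the same bands")
    # In the dB domain a multiplicative transfer gain becomes an addition,
    # so canceling it is a per-band subtraction.
    return [level - gain for level, gain in zip(levels_db, gain_db)]
```

Features learned this way are independent of the learning vehicle, so the transfer function of the vehicle actually in use can still be applied at playback time.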

[Summary]
As described above, by learning transmitted voice features while the user holds ordinary telephone conversations, the characteristics of the voice that the user normally finds easy to hear can be learned automatically. Using these in a hands-free telephone device provides good sound quality for conversations while driving as well.

An embodiment for a hands-free telephone has been described above, but the technology disclosed in the present invention is not limited to hands-free telephone devices and can be used in sound reproduction devices in general, for example for audio playback. Nor is it limited to vehicles such as automobiles; it can also be used in ordinary rooms.

If the technology disclosed in the present invention is applied to the voice adjustment unit, the sound quality heard by the listener can be made comfortable.

DESCRIPTION OF SYMBOLS
110: telephone line network, 120: mobile phone, 125: hands-free telephone device, 130: control unit, 132: received voice adjustment unit, 134: transmitted voice adjustment unit, 140: echo cancellation unit, 150: vehicle information acquisition unit, 160: filter design unit, 170: filter processing unit, 180: microphone, 190: loudspeaker,
210: speaker identification unit, 211: transmitted voice feature selection unit, 212: transmitted voice feature storage unit, 220: vehicle information identification unit, 221: vehicle transfer function selection unit, 222: vehicle transfer function storage unit, 230: in-vehicle acoustic environment estimation unit, 235: noise data storage unit, 240: transmitted voice analysis unit, 250: filter creation unit,
810: telephone line network, 820: mobile phone, 830: learning means, 840: control unit, 842: transmitted voice adjustment unit, 844: received voice adjustment unit, 850: transmitted voice feature learning unit, 860: operation input unit, 870: vehicle information acquisition unit, 880: loudspeaker, 882: microphone, 884: echo cancellation unit,
910: transmitted voice analysis unit, 920: determination unit, 930: received voice analysis unit, 940: voice feature learning unit, 950: voice feature storage unit.

Claims

A sound reproduction device comprising:
a transmitted voice feature storage unit that stores the acoustic characteristics of one or more transmitted voices;
a speaker identification unit that identifies a speaker or a speaker attribute;
a transmitted voice feature selection unit that, based on the information on the speaker identified by the speaker identification unit, selects and acquires the acoustic characteristics of a transmitted voice from those stored in the transmitted voice feature storage unit;
a transfer function storage unit that stores one or more transfer functions from the loudspeaker that reproduces the transmitted voice to the position of the listener;
a transfer function selection unit that selects and acquires a transfer function from those stored in the transfer function storage unit, based on information including at least the position information of the listener;
a filter creation unit that, based on the acoustic characteristics selected by the transmitted voice feature selection unit and the transfer function selected by the transfer function selection unit, creates a filter that brings the acoustic characteristics of the reproduced sound at the position of the listener close to the acoustic characteristics of the transmitted voice selected by the transmitted voice feature selection unit; and
a filter processing unit that applies filter processing based on the filter created by the filter creation unit to the reproduced sound.

The sound reproduction device according to claim 1, wherein
the transmitted voice feature storage unit stores the acoustic characteristics of a plurality of transmitted voices, one for each speaker attribute, and
the transmitted voice feature selection unit selects and acquires the acoustic characteristics of a transmitted voice from those stored in the transmitted voice feature storage unit, based on the output of the speaker identification unit, which identifies a speaker attribute.

The sound reproducing device according to claim 1,
The acoustic reproduction device characterized in that the acoustic characteristics of the transmitted voice stored in the transmitted voice characteristic storage unit are stored by a value based on the sound pressure of the voice for each frequency band.

The sound reproduction device according to claim 3, wherein
the value based on the sound pressure for each frequency band, in the acoustic characteristics of the transmitted voice stored in the transmitted voice feature storage unit, is a probability distribution that records the relationship between sound pressure and probability of occurrence.

The sound reproduction device according to claim 3, wherein
the value based on the sound pressure for each frequency band, in the acoustic characteristics of the transmitted voice stored in the transmitted voice feature storage unit, is one or more of the average value of the sound pressure, the lower limit value of the sound pressure, and the upper limit value of the sound pressure.

The sound reproduction device according to claim 1,
further comprising an acoustic environment estimation unit that estimates the acoustic characteristics at the position of the listener, wherein
the filter creation unit changes the filter it creates based on the acoustic characteristics output from the acoustic environment estimation unit.

The sound reproduction device according to claim 6, wherein
the acoustic environment estimation unit predicts the acoustic characteristics of noise at the position of the listener, and
the filter creation unit creates the filter based on one or more of the following criteria: that the frequency characteristic of the reproduced sound at the position of the listener falls within the distribution of the acoustic characteristics of the transmitted voice selected by the transmitted voice feature selection unit, and that loudness masking by sounds other than the reproduced sound has little influence on the reproduced sound.

The sound reproducing device according to claim 1,
Furthermore, a vehicle information acquisition unit that acquires information on the vehicle used by the listener is provided.
The transfer function storage unit stores one or more transfer functions associated with various positions and conditions of the vehicle,
The sound reproduction apparatus, wherein the transfer function selection unit selects a transfer function based on an output of the vehicle information acquisition unit.

The sound reproducing device according to claim 1,
The transfer function stored in the transfer function storage unit is stored with accompanying information on the fluctuation range thereof.

The sound reproducing device according to claim 1, further comprising learning means,
The learning means is
A transmission voice adjustment unit for adjusting the sound quality of the transmission voice;
A transmission voice analysis unit for analyzing the acoustic characteristics of the transmission voice whose sound quality has been adjusted by the transmission voice adjustment unit;
A voice feature storage unit that stores the acoustic characteristics of the transmitted voice output by the transmitted voice analysis unit;
A speech feature learning unit that learns the acoustic characteristics of the transmitted speech based on the output of the transmitted speech analysis unit, and stores a learning result in the speech feature storage unit,
An acoustic reproduction apparatus characterized in that the acoustic characteristic of the transmitted voice stored in the voice feature storage unit of the learning means is used as the acoustic characteristic of the transmitted voice of the transmitted voice feature storage unit.

The sound reproducing device according to claim 10, wherein
In the voice feature storage unit, acoustic characteristics of a plurality of transmission voices are stored for each attribute of the speaker,
The voice feature learning unit specifies the attribute of the speaker based on the output of the speaker specifying unit that specifies the attribute of the speaker,
An audio reproducing apparatus, wherein the audio feature storage unit stores the specified attribute information and a learning result in a correlated form.

The sound reproduction device according to claim 10, wherein
the learning means further includes a determination unit that determines whether the acoustic characteristics of the transmitted voice may be learned.

The sound reproducing device according to claim 12,
The determination unit determines that learning is possible when it is determined that the sound is not a non-conversation sound as a result of analyzing one or more of the transmitted sound or the received sound.

The sound reproducing device according to claim 12,
The learning means further includes an operation input unit that receives a user operation,
The determination unit determines that learning is possible according to the output of the operation input unit.

The sound reproducing device according to claim 12,
A vehicle information acquisition unit for acquiring information on the vehicle used by the listener;
The determination unit determines that learning is possible based on the output of the vehicle information acquisition unit.

The sound reproducing device according to claim 10, wherein
The transmitted voice analysis unit calculates a value based on a sound pressure for each frequency band of the transmitted voice and outputs the calculated value as an acoustic feature amount.

The sound reproduction device according to claim 16, wherein
the voice feature learning unit receives the value based on the sound pressure for each frequency band output from the transmitted voice analysis unit, calculates a probability distribution of sound pressure occurrence for each frequency band, and stores it in the voice feature storage unit.

The sound reproducing device according to claim 16, wherein
The voice feature learning unit receives a value based on the sound pressure for each frequency band output from the transmitted voice analysis unit, and from the sound pressure for each frequency band, the average value of the sound pressure, the lower limit value of the sound pressure, the sound pressure One or more of the upper limit values are calculated and stored in the transmitted voice feature storage unit.

In the sound reproduction device according to claim 1,
In the transmission voice feature storage unit, acoustic characteristics of different transmission voices are stored depending on the listener,
The transmission voice feature selection unit is configured to determine the transmission voice from the acoustic characteristics of the transmission voice stored in the transmission voice feature storage unit based on the output of the listener specification unit that specifies the attributes of the listener. An acoustic reproduction apparatus, wherein acoustic characteristics are selected and acquired.

A hands-free telephone device incorporating the sound reproducing device according to any one of claims 1 to 19.