JP2011071806A

JP2011071806A - Electronic device, and sound-volume control program for the same

Info

Publication number: JP2011071806A
Application number: JP2009221967A
Authority: JP
Inventors: Yusaku Kikukawa; 裕作菊川; Takashi Sudo; 隆須藤; Masataka Osada; 将高長田
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2009-09-28
Filing date: 2009-09-28
Publication date: 2011-04-07

Abstract

PROBLEM TO BE SOLVED: To provide an electronic device that controls the amplitude of an input signal, and a sound-volume control program for the same.

SOLUTION: An input voice signal is partitioned into frames; an amplitude gain is calculated for each frame and the amplitude of the input signal is corrected according to that gain. The amplitude gain transitions smoothly from that of the immediately preceding frame to that of the frame being processed. The transition is slowed down or sped up according to the voice characteristics of the immediately preceding frame and of the frame being processed.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to an electronic device that controls output volume, and to a volume control program for the electronic device.

In recent years, electronic devices that can receive audio signals for telephone communication, or radio or television broadcast waves, and output sound from an acoustic output device such as a speaker have become widespread. A mobile phone, one example of such a device, receives a radio signal and extracts the audio signal modulated onto it. By outputting the extracted audio signal as sound, for example from the speaker of the mobile phone, the user can hear it.

This audio signal is output from the acoustic output device with some variation in volume, caused for example by changes in the accent of the speaker who transmitted it. If the volume changes abruptly, a loud sound may suddenly be output from the acoustic output device and make the user uncomfortable. To address this, an invention has been disclosed in which the input audio signal is buffered and checked for a loud component. If a loud component is present, the entire audio signal is normalized so that the volume is reduced over the whole buffered signal. Outputting the audio signal at the reduced volume prevents loud sound from being output from the acoustic output device (see, for example, Patent Document 1).

Patent Document 1: JP 2009-21834 A

However, with the method described above, which buffers the entire audio signal and then corrects its volume, a large delay arises between the input of the audio signal and the output of sound from the acoustic output device. When a call is made using a mobile phone, for example, the delay from input of the audio signal to its output should be short.

The present invention has been made to solve the above problem, and relates to an electronic device capable of controlling the amplitude of an input signal and to a volume control program for an electronic device.

To achieve the above object, an electronic device according to the present invention comprises: voice input receiving means for receiving a voice input; volume measuring means for measuring the input volume of the voice received in a first time interval and in a second time interval preceding the first time interval; storage means for storing the voice received in the first and second time intervals together with the input volume of that voice; volume gain setting means for setting the volume gain of the first time interval according to the input volume of the voice received in the first time interval, when the maximum input volume of the voice received in the first time interval, as stored by the storage means, is equal to or higher than a predetermined volume; volume gain transition function setting means for setting a volume gain transition function that equals the volume gain of the second time interval when output of the voice received in the first time interval starts and equals the volume gain of the first time interval when that output ends; volume control means for changing the output volume of the voice according to the volume gain transition function; and voice output means for outputting the voice received in the first time interval at the output volume changed by the volume control means.

Also to achieve the above object, a volume control program for an electronic device according to the present invention comprises: storage means for storing the voice received in a first time interval and in a second time interval preceding the first time interval, together with the input volume of that voice; volume gain setting means for setting the volume gain of the first time interval according to the input volume of the voice received in the first time interval, when that input volume, as stored by the storage means, is equal to or higher than a predetermined volume; volume gain transition function setting means for setting a volume gain transition function that equals the volume gain of the second time interval when output of the voice received in the first time interval starts and equals the volume gain of the first time interval when that output ends; and volume control means for changing the output volume of the voice according to the volume gain transition function.

According to the present invention, an electronic device, or a volume control program for an electronic device, is obtained that divides the input audio signal into time units and corrects the amplitude for each time unit, and can thereby control the output volume.

FIG. 1 is a diagram showing the configuration of the mobile phone in this embodiment.
FIG. 2 is a diagram showing the configuration of the control unit in this embodiment.
FIG. 3 is a diagram showing the internal configuration of the signal characteristic determination unit in this embodiment.
FIG. 4 is a diagram showing the configuration of the sample-unit gain calculation unit in this embodiment.
FIG. 5 is a diagram showing the input signal and the structure of frames and subframes in this embodiment.
FIG. 6 is a diagram showing an example of the amplitude gain in this embodiment.
FIG. 7 is a flowchart showing the flow of volume control processing by the control unit 100 of this embodiment.
FIG. 8 is a diagram showing the internal configuration of the mobile phone in the second embodiment.
FIG. 9 is a flowchart showing the flow of processing of the volume control unit by the second control unit 200 of this embodiment.

Embodiments of the present invention are described below with reference to the drawings.

(Configuration of the electronic device)
FIG. 1 is a diagram showing the configuration of a mobile phone, which is an example of the electronic device according to the present invention. FIG. 2 is a diagram showing the configuration of the control unit 100 provided in the mobile phone shown in FIG. 1. Unless otherwise noted, the operation of each component of the present invention is described below with reference to FIGS. 1 and 2.

The control unit 100 mounted in the mobile phone is composed of electronic circuits such as a CPU (Central Processing Unit). The CPU executes processing according to programs stored in the ROM or RAM described later. The CPU also processes signals supplied from the circuit units, generates various control signals, and supplies them to the circuit units.

Through this processing, the CPU controls the mobile phone as a whole. The storage unit 4 is composed of, for example, a flash memory element (an electrically rewritable and erasable nonvolatile memory), an HDD (Hard Disc Drive), a ROM (Read Only Memory), and a RAM (Random Access Memory). The storage unit 4 stores the various application programs executed by the CPU of the control unit 100, data sets, audio data, and the like.

The antenna 1 transmits and receives radio signals to and from a radio base station (not shown). The antenna 1 passes received radio signals to the communication signal processing unit 2, and transmits radio signals supplied by the communication signal processing unit 2 to the radio base station.

The communication signal processing unit 2 converts the radio signal input from the antenna 1 into an electrical signal that the control unit 100 can process, and inputs the converted electrical signal to the control unit 100. It also converts the electrical signal output from the control unit 100 into a radio signal that can be transmitted from the antenna 1, and outputs it to the antenna 1.

The speaker 5 converts the audio output signal input from the control unit 100 into sound and outputs it.

The microphone 6 converts input sound into an audio input signal and outputs it to the control unit 100.

The output volume control processing described with reference to FIG. 2 receives an audio input signal, processes it into an audio output signal, and outputs it. The audio input signal may, for example, be input from the microphone 6, or be audio data stored in the storage unit 4 that the control unit 100 reads as its input. Alternatively, it may be received by the antenna 1 as a radio signal.

The audio input signal is digitally encoded in predetermined units of the audio signal (hereinafter simply called frames), input to the control unit 100 as the audio input signal x[n] (n = 1, 2, 3, ..., N), and processed into the audio output signal y[n]. The processed audio output signal is output as sound from the speaker 5. The output destination of the audio output signal is not limited to the speaker 5. For example, the audio output signal may be converted into a radio signal via the communication signal processing unit 2 and transmitted by the antenna 1. Alternatively, it may be output to headphones or the like separately connected to the control unit 100.

FIG. 5 shows an example of an audio signal processed by the electronic device according to the present invention. The audio input signal x[n] represents the amplitude values obtained when the sound is sampled, and holds as many amplitude values as the number N of samples that make up a frame. Through the processing of the subframe dividing unit 102, described later, a frame of N samples is divided into subframes of S samples each. FIG. 5 shows an example in which N = 12 and S = 4. In the volume control processing of the present invention, described later, signal processing is performed for each subframe. Performing volume control on subframes finely divided in the time domain allows the volume to be controlled with good time resolution even when an audio signal with sharp volume changes is input.

The audio input signal x[n] is expressed, for example, as 16-bit digital data with a range from -2^15 to +2^15. However, the representation of x[n] is not limited to this: it may be an audio signal, digital data with a bit width other than 16 bits, or a floating-point signal.

The audio signal processing described later takes as its example the case in which a frame of N samples is divided into subframes of S samples and processing is performed subframe by subframe. However, the audio signal processing of the present invention is not limited to this; the division into subframes may be omitted and the processing performed frame by frame. For convenience, the description below uses N = 12, but again the invention is not limited to this. Other values of N may be used, for example powers of 2 such as 512, 1024, or 2048.

Hereinafter, each digital signal in this specification may be expressed in any of the formats described here.

(First embodiment)
When the audio signal of a mobile phone is output from an acoustic output device such as the speaker 5, the frequency characteristics of the output sound depend on the output frequency characteristics of the acoustic output device.

When the output frequency characteristic of the acoustic output device is not flat, the output sound differs from the input sound, and the user of the mobile phone cannot experience the original sound quality of the input sound. For example, because the acoustic output devices used in mobile phones are small, their frequency characteristics are often degraded in the high-frequency band of 4 kHz and above.

To prevent this degradation of the output sound, the audio signal can be corrected in advance to match the output frequency characteristics of the acoustic output device. For the acoustic output device of the mobile phone described above, the audio signal in the band of 4 kHz and above, which the device attenuates, is amplified. As a result, the frequency characteristics of the output sound approach those of the input sound, and the user of the mobile phone can experience the original sound quality of the input sound.

However, with this correction method, some input audio signals are amplified so strongly that amplitude saturation may occur. One example, when a call is made on the mobile phone described above, is speech containing unvoiced sounds (sounds in which breath from the lungs passes through without vibrating the vocal cords). Because the frequency components of unvoiced sounds are generally concentrated at 4 kHz and above, correcting an audio signal that contains unvoiced sounds causes amplitude saturation. As a result of this saturation, the acoustic output device cannot reproduce the input sound correctly.

In the first embodiment, therefore, it is determined whether the input audio signal contains an unvoiced sound, and if it does, the amplitude of the audio signal is reduced. By reducing the amplitude at the portions of the audio signal that contain unvoiced sound, amplitude saturation can be prevented even when the band of 4 kHz and above is amplified to match the output frequency characteristics of the acoustic output device.

Next, consider the case in which volume correction is performed per unit time and the amount of correction is large. Because the volume then changes greatly in a short time, the user of the electronic device perceives the fluctuation in volume as unnatural. In the first embodiment, a limiter control gain is calculated for each subframe, and an amplitude gain that adjusts the amplitude of each sample is set based on the limiter control gain. The amplitude gain is set so that it does not change in a switch-like (stepwise) manner but varies while remaining differentiably continuous. Because the amplitude of each sample is corrected according to this amplitude gain, the user does not get an unnatural impression from a switch-like change in the amount of volume adjustment.

Furthermore, when an unvoiced sound occurs suddenly from a silent state, the amplitude gain is set to change rapidly. This allows volume control to be performed with good timing in anticipation of sudden amplitude saturation. When a voiced sound changes to an unvoiced sound, on the other hand, amplitude saturation does not occur suddenly, so the amplitude gain is changed gradually. This allows the amplitude to be controlled without giving the user an unnatural impression.
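The gain transition described above can be sketched as follows. This is only an illustrative sketch, not the transition function of the embodiment: the raised-cosine ramp, the function names `gain_transition` and `is_fast_transition`, the string labels for the signal kinds, and the choice to finish a "fast" transition within the first quarter of the subframe are all assumptions made here for illustration.

```python
import math


def is_fast_transition(prev_kind, cur_kind):
    """Return True when the gain should change rapidly.

    The text only states that silence -> unvoiced is rapid and
    voiced -> unvoiced is gradual; treating every other pair as
    gradual is an assumption of this sketch.
    """
    return prev_kind == "silence" and cur_kind == "unvoiced"


def gain_transition(prev_gain, cur_gain, num_samples, fast=False):
    """Per-sample amplitude gains moving from prev_gain to cur_gain.

    Assumption: a raised-cosine ramp is used because its slope is zero
    at both ends, which avoids a switch-like step in the gain.
    Assumption: a fast transition completes within the first quarter
    of the subframe and then holds cur_gain.
    """
    ramp_len = max(1, num_samples // 4) if fast else num_samples
    gains = []
    for i in range(num_samples):
        if ramp_len > 1:
            t = min(i / (ramp_len - 1), 1.0)
        else:
            t = 1.0
        weight = 0.5 - 0.5 * math.cos(math.pi * t)  # 0 at start, 1 at end
        gains.append(prev_gain + (cur_gain - prev_gain) * weight)
    return gains
```

For a gradual transition the first sample keeps the previous gain and the last sample reaches the new gain; for a fast transition the new gain is reached early and held for the remainder of the subframe.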

The configuration and role of each component shown in FIG. 2 are described below. The control unit 100 is composed of a DC component removal filter 101, a subframe dividing unit 102, a signal characteristic detection unit 103, a subframe-unit gain calculation unit 104, a sample-unit gain calculation unit 105, a volume control unit 106, and a frequency characteristic correction unit 107.

The DC component removal filter 101 receives the audio input signal x[n], removes the DC component it contains, and outputs the signal hp[n]. More specifically, the DC component removal filter 101 applies high-pass filtering to the audio input signal x[n]. This removes the low-frequency components of x[n], for example components below 50 Hz, and the remaining signal is output as hp[n]. The high-pass filtering can use, for example, a filter designed as an IIR Butterworth filter. However, the DC removal processing is not limited to this method; any method may be used that detects from the amplitude values of x[n] whether a DC component is present and removes it.
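As a rough illustration of the high-pass filtering described for the DC component removal filter, the sketch below implements a first-order IIR Butterworth high-pass filter obtained with the bilinear transform. The filter order, the 8000 Hz sampling rate, and the function name `dc_removal_filter` are assumptions made for this example; the text specifies only an IIR Butterworth design and a cutoff of about 50 Hz.

```python
import math


def dc_removal_filter(x, cutoff_hz=50.0, sample_rate_hz=8000.0):
    """First-order Butterworth high-pass filter (bilinear transform).

    Assumptions of this sketch: order 1 and a sampling rate of 8000 Hz.
    Difference equation: y[n] = b0 * (x[n] - x[n-1]) - a1 * y[n-1]
    """
    k = math.tan(math.pi * cutoff_hz / sample_rate_hz)  # pre-warped cutoff
    b0 = 1.0 / (1.0 + k)
    a1 = (k - 1.0) / (1.0 + k)
    hp = []
    prev_x = 0.0
    prev_y = 0.0
    for sample in x:
        y = b0 * (sample - prev_x) - a1 * prev_y
        hp.append(y)
        prev_x, prev_y = sample, y
    return hp
```

A constant (pure DC) input decays toward zero at the output, while a signal alternating at the Nyquist rate passes with essentially unit gain.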

The subframe dividing unit 102 receives the signal hp[n], composed of N samples, divides it into several subframe signals sub_hpb[s] (s = 1, 2, 3, ..., S), each composed of S samples where S is smaller than N, and outputs them.
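The division of a frame into subframes can be sketched as below, using the N = 12, S = 4 example given for FIG. 5. The function name `split_into_subframes` and the requirement that N be an exact multiple of S are assumptions of this sketch; the text does not say how a remainder would be handled.

```python
def split_into_subframes(frame, subframe_len):
    """Split a frame of N samples into consecutive subframes of S samples.

    Assumption: N must be a multiple of S; otherwise an error is raised.
    """
    if subframe_len <= 0 or len(frame) % subframe_len != 0:
        raise ValueError("frame length must be a positive multiple of subframe_len")
    return [
        frame[start:start + subframe_len]
        for start in range(0, len(frame), subframe_len)
    ]


# Example with N = 12 and S = 4: three subframes of four samples each.
subframes = split_into_subframes(list(range(12)), 4)
```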

In the present invention, the value of a limiter control gain that changes the volume is calculated for each subframe by processing described later. Controlling the volume with a volume control signal calculated per subframe allows finer-grained volume control. That is, even when the input audio signal x[n] is an unvoiced sound (described later) or a signal with abrupt volume changes, the volume can be controlled so as to follow those changes.

The signal characteristic detection unit 103 receives the subframe signal sub_hpb[s] and determines whether it contains voiced sound, contains unvoiced sound, or is silent. It further determines whether the subframe being processed is a point at which an unvoiced sound begins or a point at which an unvoiced sound ends. The detection result is output as sub_prm.

FIG. 3 shows the detailed configuration of the signal characteristic detection unit 103. The signal characteristic detection unit 103 is composed of two parts: a sound/silence detection unit 1031 and a voiced/unvoiced sound detection unit 1032.

(Sound/silence detection operation)
The sound/silence detection unit 1031 determines whether the input subframe signal sub_hpb[s] is silent or contains sound, and outputs the result as the sound/silence detection result sub_mum. It outputs sub_mum = 1 when it determines that sub_hpb[s] contains sound, and sub_mum = 0 when it determines that sub_hpb[s] is silent.

The sound/silence detection unit 1031 is composed of two parts: a subframe amplitude maximum value detection unit 10311 and a sound/silence determination unit 10312.

The subframe amplitude maximum value detection unit 10311 receives the subframe signal sub_hpb[s], detects the maximum amplitude it contains, and outputs that value as the subframe amplitude maximum value sub_max. More specifically, the unit takes the absolute value of the amplitude of each sample in sub_hpb[s], finds the largest of these absolute values, and outputs it as sub_max.

The subframe amplitude maximum value detection unit 10311 has been described as detecting and outputting the maximum amplitude contained in the subframe signal sub_hpb[s]. However, it may output a different value, as long as that value can be used by the sound/silence determination unit 10312, described later, to decide between sound and silence. For example, it may output the average amplitude level of sub_hpb[s], or the average spectral power of sub_hpb[s].
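The two amplitude measures mentioned above can be sketched as follows. The function names are hypothetical, and interpreting the "average amplitude level" as the mean of the absolute sample values is an assumption of this sketch rather than something the text specifies.

```python
def subframe_peak(sub_hpb):
    """sub_max: the largest absolute amplitude among the samples of a subframe."""
    return max(abs(sample) for sample in sub_hpb)


def subframe_mean_amplitude(sub_hpb):
    """Alternative measure; assumed here to be the mean absolute amplitude."""
    return sum(abs(sample) for sample in sub_hpb) / len(sub_hpb)
```

For the subframe [3, -7, 2, 4], the peak is 7 and the mean absolute amplitude is 4.0.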

The sound/silence determination unit 10312 receives the subframe amplitude maximum value sub_max output from the subframe amplitude maximum value detection unit 10311 and determines whether the subframe signal sub_hpb[s] contains sound or is silent. The result is output as the sound/silence determination result sub_mum. More specifically, the unit compares sub_max with a predetermined sound/silence threshold α. If sub_max < α, the subframe signal sub_hpb[s] contains no large-amplitude sample, so it is judged to be silent, and the result is output as sub_mum = 0. If sub_max > α, sub_hpb[s] contains a large-amplitude sample, so it is judged to contain sound, and the result is output as sub_mum = 1.

Besides comparing the amplitude with a threshold as described above, the sound/silence determination may, for example, calculate the S/N ratio in each frequency band of sub_hpb[s] and compare it with the sound/silence threshold α.
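The threshold comparison can be sketched as below. The numeric threshold value and the function name `detect_sound` are hypothetical. The text defines only the cases sub_max < α and sub_max > α, so treating sub_max equal to α as silent is an assumption of this sketch.

```python
ALPHA = 500  # hypothetical threshold; the text gives no numeric value


def detect_sound(sub_max, alpha=ALPHA):
    """Return sub_mum: 1 for sound, 0 for silence.

    Assumption: sub_max == alpha is treated as silent, since the text
    leaves that case undefined.
    """
    return 1 if sub_max > alpha else 0
```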

(Voiced/unvoiced sound detection operation)
The voiced/unvoiced sound detection unit 1032 receives the subframe signal sub_hpb[s] and the sound/silence determination result sub_mum, and determines whether sub_hpb[s] contains voiced sound or unvoiced sound. It further determines whether the subframe signal being processed is one that switches from voiced to unvoiced sound, from unvoiced to voiced sound, from silence to unvoiced sound, or from unvoiced sound to silence, and so on. The result is output as the voiced/unvoiced sound determination result sub_prm.

The voiced/unvoiced sound detection unit 1032 is composed of two parts: a zero-cross number detection unit 10321 and a voiced/unvoiced sound determination unit 10322.

The zero-cross count detection unit 10321 receives the subframe signal sub_hpb[s] and the sound/silence determination result sub_mum, detects the number of zero-cross points contained in the subframe signal sub_hpb[s], and outputs that number as the zero-cross count sub_zc. In this specification, a zero-cross point is a point at which the polarity of the amplitude is inverted when the amplitude of a given sample is compared with the amplitude of the immediately preceding sample. A subframe signal sub_hpb[s] that contains many zero-cross points oscillates quickly, so it can be regarded as an audio signal with a large amount of energy in the high-frequency range, that is, an unvoiced sound. Conversely, a subframe signal sub_hpb[s] with few zero-cross points oscillates slowly, so it can be regarded as a signal that contains no high-frequency components. In other words, the present invention uses zero-cross detection as a means of analyzing the frequency content of the subframe signal sub_hpb[s].

When the sound/silence determination result output from the sound/silence determination unit 10312 is sub_mum = 0, the subframe signal sub_hpb[s] is known to be silent. In this case the zero-cross counting is skipped and sub_zc = 0 is output. When the result is sub_mum = 1, the zero-cross count detection unit 10321 counts the zero-cross points contained in the subframe signal sub_hpb[s] and outputs the count as sub_zc.

In this embodiment, the zero-cross count sub_zc has been described as the number of zero-cross points contained in the subframe signal sub_hpb[s]. The operation of this embodiment is not limited to this, however. For example, the number of zero-cross points in the subframe signal sub_hpb[s] divided by the number of samples S in the subframe may be used as a zero-cross rate sub_zc in place of the zero-cross count. With this value, even when the number of samples S in the subframe signal sub_hpb[s] changes, the zero-cross count detection unit 10321 can output a value to the voiced/unvoiced sound determination unit 10322 (described later) through the same processing, and the voiced/unvoiced determination can proceed unchanged.
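The zero-cross counting and the optional normalization by the subframe length can be sketched as below. The function names are hypothetical, and one detail is an assumption: a sample equal to exactly zero is not counted as a polarity inversion, since the description does not say how zeros are handled.

```python
def zero_cross_count(sub_hpb, sub_mum):
    """Count polarity inversions between consecutive samples (sub_zc)."""
    # A silent subframe skips the counting and reports zero, as described.
    if sub_mum == 0:
        return 0
    # A crossing is counted when a sample and its predecessor have opposite signs.
    # Assumption: exact zeros are not treated as inversions.
    return sum(1 for prev, cur in zip(sub_hpb, sub_hpb[1:]) if prev * cur < 0)


def zero_cross_rate(sub_hpb, sub_mum):
    """Zero-cross count divided by the number of samples S in the subframe."""
    # Assumes a non-empty subframe; the rate stays comparable when S changes.
    return zero_cross_count(sub_hpb, sub_mum) / len(sub_hpb)
```

Using the rate rather than the raw count means the threshold applied in the later voiced/unvoiced decision does not need to be rescaled when the subframe length changes.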

This embodiment has also described the zero-cross count detection unit 10321 as detecting the number of zero-cross points contained in the subframe signal sub_hpb[s]. The configuration is not limited to this: the unit may detect any feature other than the zero-cross count, provided that the feature indicates whether the audio signal contains components in a predetermined frequency band. For example, it may perform spectral analysis on the subframe signal sub_hpb[s] and detect the energy of the high-frequency components. Alternatively, it may detect the maximum value of the autocorrelation of the LPC prediction residual.

In this embodiment, the zero-cross count detection unit 10321 is used, by way of example, as a means of detecting high-frequency components of 4 kHz or higher. Because the frequency band to be detected should be chosen to match the frequency characteristics of the acoustic output device, it is not limited to 4 kHz or higher, and any frequency band may be detected.

The voiced/unvoiced sound determination unit 10322 receives the sound/silence determination result sub_mum and the zero-cross count sub_zc, and determines how the signal characteristic (voiced, unvoiced, or silent) has changed between the immediately preceding subframe and the subframe being processed. It outputs the result as the signal characteristic determination result sub_prm. More specifically, the unit holds the signal characteristic detected for the immediately preceding subframe as the preceding-subframe signal characteristic determination result sub_prm1 and uses it as an input together with sub_mum and sub_zc. From these three values it updates and outputs both sub_prm and sub_prm1. The voiced/unvoiced threshold β is a predetermined value used to decide whether the subframe being processed is unvoiced, or is voiced or silent. When the signal characteristic is determined, the case sub_zc = β may be handled in the same way as either sub_zc < β or sub_zc > β.

When the voiced/unvoiced sound determination unit 10322 determines that both the immediately preceding subframe and the subframe being processed are silent, it outputs sub_prm = 0 and sub_prm1 = 0.

When the unit determines that the immediately preceding subframe is voiced and the subframe being processed is silent, it outputs sub_prm = 1 and sub_prm1 = 0.

When the unit determines that the immediately preceding subframe is unvoiced and the subframe being processed is silent, it outputs sub_prm = 2 and sub_prm1 = 0.

When the unit determines that the immediately preceding subframe is silent and the subframe being processed is voiced, it outputs sub_prm = 3 and sub_prm1 = 1.

When the unit determines that both the immediately preceding subframe and the subframe being processed are voiced, it outputs sub_prm = 4 and sub_prm1 = 1.

When the unit determines that the immediately preceding subframe is unvoiced and the subframe being processed is voiced, it outputs sub_prm = 5 and sub_prm1 = 1.

When the unit determines that the immediately preceding subframe is silent and the subframe being processed is unvoiced, it outputs sub_prm = 6 and sub_prm1 = 2.

When the unit determines that the immediately preceding subframe is voiced and the subframe being processed is unvoiced, it outputs sub_prm = 7 and sub_prm1 = 2.

When the unit determines that both the immediately preceding subframe and the subframe being processed are unvoiced, it outputs sub_prm = 8 and sub_prm1 = 2.
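The nine cases above follow a regular pattern: with silent = 0, voiced = 1 and unvoiced = 2 (the codes used for sub_prm1), the result is sub_prm = 3 × (current class) + (previous class). The sketch below uses that observation. The function names are hypothetical, and two points are assumptions: a sounded subframe is taken to be unvoiced when sub_zc > β, and the tie sub_zc = β is treated as voiced (the description allows either choice).

```python
SILENT, VOICED, UNVOICED = 0, 1, 2  # same codes as sub_prm1 in the description


def classify_subframe(sub_mum, sub_zc, beta):
    """Classify the subframe being processed as silent, voiced, or unvoiced."""
    if sub_mum == 0:
        return SILENT
    # Assumption: many zero crossings (sub_zc > beta) means unvoiced sound;
    # the tie sub_zc == beta is treated as voiced.
    return UNVOICED if sub_zc > beta else VOICED


def update_signal_characteristic(sub_mum, sub_zc, beta, sub_prm1):
    """Return (sub_prm, new sub_prm1) given the previous subframe's class."""
    current = classify_subframe(sub_mum, sub_zc, beta)
    # Reproduces the nine listed cases: rows are the current class,
    # columns are the previous class.
    sub_prm = 3 * current + sub_prm1
    return sub_prm, current
```

The second return value is fed back as sub_prm1 when the next subframe is processed. The description does not state an initial value for sub_prm1, so starting from SILENT would be a further assumption.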

(Volume control operation)
The subframe unit gain calculation unit 104 receives the signal characteristic determination result sub_prm and calculates, for each subframe, the subframe unit gain sub_pow, which serves as the reference value for volume control. More specifically, it sets sub_pow = f(a) (a = 0, 1, 2, ..., 8) according to the value of sub_prm, where f(a) is a value set in advance for each possible value of sub_prm (0 through 8). For example, when sub_prm = 6, the subframe unit gain is set to sub_pow = f(6).

When the signal characteristic determination result is sub_prm = 6, 7, or 8, that is, when the subframe being processed has been determined to be unvoiced, the audio signal of that subframe contains frequency components of 4 kHz or higher. As described earlier, if frequency components of 4 kHz or higher are amplified to correct the output frequency characteristics of the acoustic output device, an audio signal containing unvoiced sound may suffer amplitude saturation. To prevent the output audio signal from saturating, when the subframe being processed is determined to be unvoiced, the subframe unit gain sub_pow is set to a value that attenuates the output volume of that subframe, for example a value satisfying 0 < sub_pow < 1.

The subframe unit gain sub_pow may further be set so that 0 < f(6) < f(7) < f(8) < 1. The case sub_prm = 8 means that the subframe being processed is unvoiced and the immediately preceding subframe is also unvoiced. In this case, setting f(8) close to 1 keeps the amplitude of the subframe being processed close to that of the preceding subframe, so the user hears natural audio with little change in volume from the acoustic output device. The case sub_prm = 6, on the other hand, means that the subframe being processed is unvoiced and the immediately preceding subframe is silent. In this case, setting f(6) close to 0 strongly reduces the amplitude of the subframe being processed. Even when unvoiced sound appears abruptly after silence, the amplitude is therefore reduced quickly enough to follow the change in volume, and amplitude saturation is prevented.

When the signal characteristic determination result sub_prm is a value other than 6, 7, or 8, that is, when the subframe being processed has been determined to be silent or voiced, amplitude saturation is unlikely even if frequency components of 4 kHz or higher are amplified to correct the output frequency characteristics of the acoustic output device. In this case the subframe unit gain is set to, for example, sub_pow = 1.
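The mapping from sub_prm to the subframe unit gain can be sketched as a lookup table. The specific gain values for f(6), f(7) and f(8) below are hypothetical; the description only requires 0 < f(6) < f(7) < f(8) < 1 for unvoiced subframes and suggests sub_pow = 1 otherwise.

```python
# Hypothetical gain table f(a) indexed by sub_prm (0..8).
# Entries 0-5 (current subframe silent or voiced) leave the level unchanged.
# Entries 6-8 (current subframe unvoiced) attenuate, most strongly after silence.
F_TABLE = {
    0: 1.0, 1: 1.0, 2: 1.0,   # current subframe silent
    3: 1.0, 4: 1.0, 5: 1.0,   # current subframe voiced
    6: 0.3,                   # silent -> unvoiced: illustrative value near 0
    7: 0.6,                   # voiced -> unvoiced: illustrative value
    8: 0.9,                   # unvoiced -> unvoiced: illustrative value near 1
}


def subframe_unit_gain(sub_prm):
    """Return sub_pow = f(sub_prm) from the preset table."""
    return F_TABLE[sub_prm]
```

A table keeps the per-case tuning in one place, which fits the description's statement that each f(a) is set in advance.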

The sample unit gain calculation unit 105 receives the signal characteristic determination result sub_prm and the subframe unit gain sub_pow, and outputs the sample unit gain sm_pow[s] (s = 1, 2, 3, ..., S), which controls the amplitude of each sample of the subframe signal sub_hpb[s]. The sample unit gain calculation unit 105 consists of a subframe change amount calculation unit 1051, a delay unit 1052, and a window function adaptation unit 1053.

Through the processing described below, the sample unit gain calculation unit 105 outputs, based on the subframe unit gain sub_pow, a sample unit gain sm_pow[s] that changes smoothly from sample to sample while remaining differentiably continuous. This makes it possible to control the volume while keeping the volume change natural. The sample unit gain calculation unit 105 also varies how quickly the sample unit gain sm_pow[s] changes according to the signal characteristic determination result sub_prm.

The subframe change amount calculation unit 1051 receives the subframe unit gain sub_pow and the limiter control gain limit_pow1 of the immediately preceding subframe, and outputs the limiter control gain limit_pow. The limiter control gain limit_pow is the value used by the window function adaptation unit 1053 (described later) to calculate the sample unit gain sm_pow[s]. The subframe change amount calculation unit 1051 sets limit_pow to a value close to limit_pow1, so that the amplitude changes between the preceding subframe and the subframe being processed only within a range that sounds natural to the user. More specifically, the unit examines the difference between limit_pow1 and sub_pow. When the difference is within the limit determined by the predetermined threshold γ, that is, when |sub_pow − limit_pow1| < 1/γ, it sets limit_pow = sub_pow. When the difference exceeds that limit in the increasing direction, that is, when (sub_pow − limit_pow1) > 1/γ, it sets limit_pow = limit_pow1 × γ. When the difference exceeds that limit in the decreasing direction, that is, when (sub_pow − limit_pow1) < −1/γ, it sets limit_pow = limit_pow1 × (1/γ). When |sub_pow − limit_pow1| = 1/γ, any of the values above may be used. The threshold γ is set to the largest change in amplitude between the preceding subframe and the subframe being processed that the user still perceives as natural when listening to the audio from the acoustic output device. For example, when a subframe signal sub_hpb[s] lasts 4 ms, γ is set within the range 1.25 to 1.50. The value of γ is not limited to this range.
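The limiter logic can be sketched as follows. This is one possible reading rather than a definitive implementation: the function name is hypothetical, the decreasing case is assumed to mean sub_pow − limit_pow1 < −1/γ, and the boundary case |sub_pow − limit_pow1| = 1/γ is assumed to pass sub_pow through (the description permits any of the candidate values there).

```python
def limiter_control_gain(sub_pow, limit_pow1, gamma=1.25):
    """Return limit_pow, restricting how far the gain moves in one subframe.

    gamma is the naturalness threshold; 1.25 is taken from the example
    range 1.25-1.50 given for a 4 ms subframe.
    """
    diff = sub_pow - limit_pow1
    bound = 1.0 / gamma
    if abs(diff) <= bound:
        # Small change: follow the requested subframe gain directly.
        return sub_pow
    if diff > 0:
        # Requested gain rises too fast: step up by a factor of gamma.
        return limit_pow1 * gamma
    # Requested gain falls too fast (assumed reading): step down by 1/gamma.
    return limit_pow1 / gamma
```

The delay unit's role is then simply to carry the returned value forward as limit_pow1 for the next subframe, starting from limit_pow1 = 1 when processing begins.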

The delay unit 1052 holds the limiter control gain limit_pow of the immediately preceding subframe and outputs it as limit_pow1. When the control unit 100 starts the volume control processing, that is, when no limiter control gain exists yet for a preceding subframe, the delay unit outputs limit_pow1 = 1.

The window function adaptation unit 1053 receives the limiter control gain limit_pow, the limiter control gain limit_pow1 of the immediately preceding subframe, and the signal characteristic determination result sub_prm, and calculates the window function window[s] according to Equation 1. It then calculates the sample unit gain sm_pow[s] from the window function window[s], and outputs sm_pow[n], which collects the per-subframe gains sm_pow[s] into a per-frame sequence. The window function adaptation unit 1053 also varies how quickly the window function window[s] changes according to sub_prm. More specifically, the unit first reads the value of sub_prm and then sets the smoothing parameters (XGate, YGate) according to that value. For example, the unit may hold table data that associates each value of sub_prm with values of (XGate, YGate) and set the parameters from that table. The smoothing parameter XGate determines how quickly the window function window[s] changes in the time direction and takes a value in the range 0 < XGate ≤ 1. The smoothing parameter YGate determines how quickly the window function window[s] changes in the gain direction.

Equation 1 is the expression that takes the limiter control gain limit_pow, the limiter control gain limit_pow1 of the immediately preceding subframe, and the signal characteristic determination result sub_prm as inputs, and calculates the window function window[s], the sample unit gain sm_pow[s], and the sample unit gain sm_pow[n] obtained by collecting the per-subframe gains sm_pow[s] for each frame.

First, the window function window[s] is calculated from Equation 1 (a, b, c) using the smoothing parameters (XGate, YGate) as inputs. Next, the sample unit gain sm_pow[s] is calculated from Equation 1 (d) using the calculated window function window[s] as an input. The sample unit gain sm_pow[s] is calculated as a function that transitions from the limiter control gain limit_pow1 of the immediately preceding subframe to the limiter control gain limit_pow. Finally, the sample unit gains sm_pow[s] calculated for each subframe are concatenated to obtain the sample unit gain sm_pow[n].

FIG. 6 shows how the sample unit gain sm_pow[s] changes with the values of the smoothing parameters (XGate, YGate). FIG. 6(A) shows sm_pow[s] for (XGate, YGate) = (0.5, 1.0). FIG. 6(B) shows sm_pow[s] for (XGate, YGate) = (0.25, 1.0). The gain in FIG. 6(B) changes faster in the time direction than the gain in FIG. 6(A). FIG. 6(C) shows sm_pow[s] for (XGate, YGate) = (0.5, 2.0). FIG. 6(D) shows sm_pow[s] for (XGate, YGate) = (0.5, 3.0). The gain in FIG. 6(D) changes faster in the gain direction than the gain in FIG. 6(C).

For example, when sub_prm = 6, that is, when the immediately preceding subframe is silent and the subframe being processed is unvoiced, the amplitude is expected to change abruptly, like a switch. In this case the smoothing parameters (XGate, YGate) are each set to a large value so that the sample unit gain sm_pow[s] changes quickly, which allows the control to follow the abrupt change in amplitude. When sub_prm = 7, that is, when the immediately preceding subframe is voiced and the subframe being processed is unvoiced, the amplitude is expected to change gradually. In this case the smoothing parameters (XGate, YGate) are each set to a small value so that sm_pow[s] changes slowly and the amplitude is controlled gradually.

When sub_prm takes any other value, that is, sub_prm = 1, 2, 3, 4, 5, or 8, the smoothing parameters (XGate, YGate) may be set to values intermediate between those used for sub_prm = 6 and those used for sub_prm = 7. This allows the volume of the audio output from the acoustic output device to change in a way that sounds natural to the user.

In this embodiment, Equation 1 has been given as one example of the function that the window function adaptation unit 1053 uses to calculate the sample unit gain sm_pow[n]. The calculation method used by the present invention is not limited to this. Any other function may be used as long as sm_pow[n] remains differentiably continuous and the speed of its transition can be varied.
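Since other smooth functions are permitted, the idea of a per-sample gain that glides from limit_pow1 to limit_pow can be illustrated with a raised-cosine ramp. This is explicitly not Equation 1, which is not reproduced here; it is a substitute chosen only because it starts and ends with zero slope. The parameter x_gate is a hypothetical stand-in, assumed to be the fraction of the subframe over which the transition completes (so a smaller value gives a faster transition in time), and there is no counterpart to YGate in this sketch.

```python
import math


def sample_gains(limit_pow1, limit_pow, num_samples, x_gate=0.5):
    """Return sm_pow[s] for s = 1..num_samples using a raised-cosine ramp.

    Illustrative substitute for the window function, not Equation 1.
    x_gate (0 < x_gate <= 1): assumed fraction of the subframe used
    to complete the transition.
    """
    if not 0.0 < x_gate <= 1.0:
        raise ValueError("x_gate must satisfy 0 < x_gate <= 1")
    gains = []
    for s in range(1, num_samples + 1):
        # Normalized position within the transition, clamped once it completes.
        t = min(s / (x_gate * num_samples), 1.0)
        # Raised-cosine weight: rises from 0 to 1 with zero slope at both ends.
        w = 0.5 - 0.5 * math.cos(math.pi * t)
        gains.append(limit_pow1 + (limit_pow - limit_pow1) * w)
    return gains
```

For instance, sample_gains(1.0, 0.5, 8, x_gate=0.25) reaches the target gain within the first quarter of the subframe, while x_gate=0.5 takes half of it. Concatenating the lists returned for successive subframes gives the per-frame sequence sm_pow[n].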

The volume control unit 106 receives the signal hp[n] and the sample unit gain sm_pow[n], changes the amplitude of the signal hp[n], and outputs the result as the pre-processed signal d[n]. More specifically, the pre-processed signal d[n] is calculated, for example, as the product of the signal hp[n] and the sample unit gain sm_pow[n]. The amplitude of the pre-processed signal d[n] therefore follows the sample unit gain sm_pow[n]. Calculating d[n] as the product of hp[n] and sm_pow[n] is only an example. The operation of the volume control unit 106 is not limited to this: for example, hp[n] may be multiplied by a power of sm_pow[n], or the amplitude of hp[n] may be weighted according to sm_pow[n] by some other method to produce the pre-processed signal d[n].
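The product form of the volume control, d[n] = hp[n] × sm_pow[n], can be sketched as an element-wise multiplication. The function name is hypothetical, and the length check is an added safeguard rather than something the description specifies.

```python
def apply_volume(hp, sm_pow):
    """Return the pre-processed signal d[n] = hp[n] * sm_pow[n]."""
    if len(hp) != len(sm_pow):
        raise ValueError("signal and gain sequences must have the same length")
    # Each sample is scaled by its own gain, so the envelope follows sm_pow[n].
    return [x * g for x, g in zip(hp, sm_pow)]
```

Because the gain is applied per sample, a smoothly varying sm_pow[n] yields a smoothly varying output level without steps at subframe boundaries.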

The frequency characteristic correction unit 107 receives the pre-processed signal d[n], corrects it according to the frequency characteristic of the acoustic output device, and outputs the result as the output signal y[n]. Consider, for example, an acoustic output device whose frequency characteristic attenuates the output volume of an output signal y[n] at 4 kHz or higher relative to an output signal y[n] at 4 kHz or lower. In this case, the frequency characteristic correction unit 107 amplifies the frequency components of d[n] at 4 kHz or higher and outputs the result as y[n]. The attenuation of those components by the acoustic output device and the amplification applied by the frequency characteristic correction unit 107 then cancel each other, so that a user listening to the sound from the acoustic output device hears the original sound quality of the input sound. More specifically, the frequency characteristic correction unit 107 applies, for example, FFT (fast Fourier transform) processing to d[n] to orthogonally transform it into a frequency-domain signal, adds to or subtracts from the frequency characteristic of that signal to suit the acoustic output device, and then applies, for example, IFFT (inverse fast Fourier transform) processing to transform the result back into a time-domain signal, which is output as y[n]. This embodiment describes frequency-domain correction of d[n] as one example of the operation of the frequency characteristic correction unit 107. The operation of the unit in the present invention is not limited to this, however: the same effect can be obtained by omitting the transform to the frequency domain and applying IIR (infinite impulse response) or FIR (finite impulse response) filtering to d[n] in the time domain.
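As a rough illustration of the time-domain alternative, the sketch below applies a first-order FIR filter that leaves low frequencies unchanged and boosts high frequencies. It is not the correction curve of any particular device and is not tuned to 4 kHz; the function name and the `boost` parameter are assumptions made for this example only.

```python
def emphasize_high_frequencies(d, boost=0.5):
    """First-order FIR sketch: y[n] = d[n] + boost * (d[n] - d[n-1]).

    The gain is 1 at DC and 1 + 2 * boost at the Nyquist frequency.
    d[-1] is assumed to be 0, which causes a one-sample start-up transient.
    """
    y = []
    prev = 0.0
    for x in d:
        y.append(x + boost * (x - prev))
        prev = x
    return y


constant = emphasize_high_frequencies([1.0, 1.0, 1.0, 1.0])
alternating = emphasize_high_frequencies([1.0, -1.0, 1.0, -1.0])
```

After the first sample, the constant input passes through unchanged while the alternating input (the highest representable frequency) is doubled. That extra high-frequency gain is why the unvoiced-sound attenuation described in this section matters for avoiding saturation.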

(Flow of volume control processing by the control unit)
FIG. 7 is a flowchart showing the actual flow of processing by which the input signal x[n] supplied to the control unit 100 is output as the output signal y[n]. The processing performed by the control unit 100 is described below with reference to FIG. 7.

First, the input signal x[n] supplied to the control unit 100 is input to the DC component removal filter unit 101 (step 1001). The DC component removal filter unit 101 removes the DC component contained in x[n] and outputs the signal hp[n], in which only the audio signal has been extracted, to the subframe division unit 102 and the volume control unit 106.

The signal hp[n] is input to the subframe division unit 102 (step 1002). The subframe division unit 102 divides hp[n], which consists of N samples, into subframe signals sub_hpb[s] of S samples each and outputs them to the signal characteristic detection unit 103.

The subframe signal sub_hpb[s] is input to the subframe amplitude maximum value detection unit 10311 (step 1003). This unit finds the sample with the largest amplitude in sub_hpb[s] and outputs that amplitude to the sound/silence determination unit 10312 as the subframe maximum amplitude value sub_max.

The subframe maximum amplitude value sub_max is input to the sound/silence determination unit 10312 (step 1004). The sound/silence determination unit 10312 compares sub_max with the sound/silence threshold α to determine whether sub_hpb[s] is non-silent or silent.

If sub_hpb[s] is determined to be non-silent ("sound" in step 1004), the zero-cross number detection unit 10321, which receives sub_hpb[s], detects the zero-cross points contained in sub_hpb[s] (step 1005). The number of zero-cross points detected is then output to the voiced/unvoiced sound determination unit 10322 as the zero-cross number sub_zc.

If, on the other hand, sub_hpb[s] is determined to be silent ("silence" in step 1004), zero-cross detection by the zero-cross number detection unit 10321 is skipped, and sub_zc = 0 is output to the voiced/unvoiced sound determination unit 10322.

The zero-cross number sub_zc is input to the voiced/unvoiced sound determination unit 10322 (step 1006), which compares sub_zc with the threshold β to determine whether sub_hpb[s] is an unvoiced sound.

Steps 1004 and 1006 thus determine whether sub_hpb[s] is silent, non-silent, or unvoiced. The result is output as the signal characteristic determination result sub_prm to the subframe unit gain calculation unit 104 and the window function adaptation unit 1052.
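Steps 1003 to 1006 can be sketched as a small classifier. This is a hedged illustration only. The direction of each comparison (`sub_max < alpha` means silent, `sub_zc > beta` means unvoiced) is an assumption based on the usual behaviour of zero-crossing counts, and the function names and string labels are hypothetical.

```python
def count_zero_crossings(sub):
    """Count sign changes between adjacent samples (zero is treated as positive)."""
    return sum(1 for a, b in zip(sub, sub[1:]) if (a >= 0) != (b >= 0))


def classify_subframe(sub, alpha, beta):
    """Return a stand-in for sub_prm: 'silent', 'voiced' or 'unvoiced'."""
    sub_max = max(abs(x) for x in sub)        # step 1003: peak amplitude
    if sub_max < alpha:                       # step 1004: assumed comparison
        return "silent"                       # zero-cross detection is skipped
    sub_zc = count_zero_crossings(sub)        # step 1005
    # Step 1006: many sign changes suggest a noise-like (unvoiced) sound.
    return "unvoiced" if sub_zc > beta else "voiced"


quiet = classify_subframe([0.0, 0.01, -0.01, 0.0], alpha=0.1, beta=2)
vowel_like = classify_subframe([0.5, 0.6, 0.4, -0.3, -0.5, -0.2], alpha=0.1, beta=2)
hiss_like = classify_subframe([0.5, -0.5, 0.5, -0.5, 0.5, -0.5], alpha=0.1, beta=2)
```

Returning early for silent subframes mirrors the flowchart, where sub_zc is simply reported as 0 and no crossing count is performed.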

The signal characteristic determination result sub_prm is input to the subframe unit gain calculation unit 104. If sub_hpb[s] is silent or voiced ("silence" in step 1004 or "voiced" in step 1006), the subframe unit gain calculation unit 104 sets the subframe unit gain to sub_pow = 1 and outputs it to the subframe change amount calculation unit 1051 (step 1008).

If, on the other hand, sub_hpb[s] is unvoiced ("unvoiced" in step 1006), the unit sets sub_pow to a value satisfying sub_pow < 1 and outputs it to the subframe change amount calculation unit 1051 (step 1007). As noted in the description of the operation of the subframe unit gain calculation unit 104, the value of sub_pow may be varied according to whether the immediately preceding subframe is silent, non-silent, or unvoiced.
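The gain selection in steps 1007 and 1008 amounts to a lookup, sketched below. The numeric gains and the dependence on the preceding subframe are illustrative assumptions; the text only requires sub_pow = 1 for silent or voiced subframes and sub_pow < 1 for unvoiced ones.

```python
# Hypothetical gains for an unvoiced subframe, keyed by the class of the
# immediately preceding subframe. The values are illustrative only.
UNVOICED_GAIN = {"silent": 0.5, "voiced": 0.75, "unvoiced": 0.5}


def subframe_unit_gain(sub_prm, prev_sub_prm="silent"):
    """Return sub_pow for one subframe (sketch of steps 1007 and 1008)."""
    if sub_prm != "unvoiced":
        return 1.0                      # step 1008: leave the amplitude as is
    return UNVOICED_GAIN[prev_sub_prm]  # step 1007: attenuate, sub_pow < 1
```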

The subframe unit gain sub_pow is input to the subframe change amount calculation unit 1051 (step 1009), which compares sub_pow with the limiter control gain limit_pow1 of the immediately preceding frame. If the difference between the two is large, the unit resets the limiter control gain limit_pow, on the basis of the threshold γ, to a value that gives a natural change in volume, and outputs it to the window function adaptation unit 1052.
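One plausible reading of step 1009 is a step-size clamp, sketched below. This passage does not give the exact rule for resetting limit_pow, so limiting the change to at most γ per step is an assumption of this example, as is the function name.

```python
def limiter_control_gain(sub_pow, limit_pow1, gamma):
    """Return limit_pow, moving from limit_pow1 toward sub_pow by at most gamma.

    Assumption: 'large difference' means |sub_pow - limit_pow1| > gamma.
    """
    diff = sub_pow - limit_pow1
    if diff > gamma:
        return limit_pow1 + gamma   # rising too fast: limit the increase
    if diff < -gamma:
        return limit_pow1 - gamma   # falling too fast: limit the decrease
    return sub_pow                  # small change: use the requested gain
```

For example, with limit_pow1 = 1.0 and gamma = 0.5, a requested sub_pow of 0.25 is reset to 0.5, while a requested 0.75 is used unchanged.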

The limiter control gain limit_pow is input to the window function adaptation unit 1052. From the signal characteristic determination result sub_prm, the window function adaptation unit 1052 determines whether the immediately preceding subframe signal is non-silent or silent. If it is silent ("silence" in step 1010), the unit calculates a sample unit gain sm_pow[s] that moves quickly from the previous limiter control gain limit_pow1 to the limiter control gain limit_pow of the frame being processed (step 1011).

If, on the other hand, the immediately preceding subframe signal is non-silent ("sound" in step 1010), the window function adaptation unit 1052 calculates a sample unit gain sm_pow[s] that moves gradually from limit_pow1 to limit_pow (step 1012). The window function adaptation unit 1052 then concatenates the calculated sample unit gains sm_pow[s] to generate the sample unit gain sm_pow[n] (step 1013), which is output to the volume control unit 106.
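Steps 1010 to 1013 can be sketched with a raised-cosine ramp whose length depends on whether the preceding subframe was silent. The window shape, the choice of a quarter-length ramp for the fast case, and the initial gain of 1.0 are all assumptions of this example; the text specifies only that the transition is quick after silence and gradual otherwise.

```python
import math


def sample_unit_gain(limit_pow1, limit_pow, length, prev_silent):
    """Return sm_pow[s]: per-sample gains moving from limit_pow1 to limit_pow."""
    # Fast transition after silence (step 1011), slow otherwise (step 1012).
    ramp = max(1, length // 4) if prev_silent else length
    gains = []
    for s in range(length):
        if s < ramp:
            w = 0.5 * (1.0 - math.cos(math.pi * (s + 1) / ramp))
        else:
            w = 1.0  # target reached: hold limit_pow for the rest of the subframe
        gains.append(limit_pow1 + (limit_pow - limit_pow1) * w)
    return gains


def frame_gain(limit_pows, prev_silent_flags, length, initial=1.0):
    """Concatenate subframe gains into sm_pow[n] (step 1013)."""
    out, prev = [], initial
    for limit_pow, prev_silent in zip(limit_pows, prev_silent_flags):
        out.extend(sample_unit_gain(prev, limit_pow, length, prev_silent))
        prev = limit_pow
    return out


fast = sample_unit_gain(1.0, 0.5, 8, prev_silent=True)
slow = sample_unit_gain(1.0, 0.5, 8, prev_silent=False)
```

Both curves end at the target gain, but `fast` reaches it after two samples while `slow` takes the whole subframe, which is the speed difference described above.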

The sample unit gain sm_pow[n] is input to the volume control unit 106 (step 1014). The volume control unit 106 changes the amplitude values contained in hp[n] according to sm_pow[n] and outputs the result to the frequency characteristic correction unit 107 as the pre-processed signal d[n].

The pre-processed signal d[n] is input to the frequency characteristic correction unit 107 (step 1015). The frequency characteristic correction unit 107 corrects d[n] according to the frequency characteristic of the acoustic output device connected to the control unit 100 and outputs the result as the output signal y[n].

Through the series of processes described above, the control unit 100 determines whether the input audio signal contains an unvoiced sound and, if it does, reduces the amplitude of the audio signal. Because the amplitude is reduced where the audio signal contains an unvoiced sound, amplitude saturation of the audio signal can be prevented even when the band at 4 kHz and above is amplified to match the output frequency characteristic of the acoustic output device.

In addition, although the control unit 100 calculates the limiter control gain limit_pow for each subframe, it calculates the sample unit gain sm_pow[n] so that limit_pow changes smoothly. Because the amplitude of each sample is corrected according to sm_pow[n], the user is not given an unnatural impression by an abrupt change in volume.

Furthermore, when an unvoiced sound occurs suddenly after silence, the control unit 100 changes sm_pow[n] rapidly. This prevents sudden amplitude saturation and allows the volume to be controlled with good temporal accuracy. When a voiced sound changes to an unvoiced sound, on the other hand, amplitude saturation does not occur abruptly, so sm_pow[n] is changed gradually. This allows the amplitude to be controlled without giving the user an unnatural impression.

(Second embodiment)
FIG. 8 is a diagram showing the configuration of the control unit 100 and of the second control unit 200 connected to it, both built into an electronic device according to the second embodiment of the present invention.

The output signal y[n] from the control unit 100 is a signal in which the amplitude of the unvoiced portions of the input signal x[n] has been attenuated, so amplitude saturation caused by amplifying unvoiced sounds is prevented. However, the amplitude of x[n] can vary widely with, for example, the accent of the speaker who produced x[n], the dynamics of the music from which x[n] was generated, or changes in the state of the device generating the input signal. The output signal y[n] therefore still contains large differences in amplitude. If a signal with such large amplitude variations is output from the acoustic output device, a loud sound may suddenly be output after silence and give the user of the mobile phone an unpleasant impression, or a quiet sound may be output that the user cannot hear.

In the second embodiment, therefore, a second control unit 200 is connected to the control unit 100 and normalizes the amplitude of y[n]. The normalized signal is output to the acoustic output device as the output signal z[n]. Normalizing the amplitude makes it possible to output an easily audible audio signal at a constant volume.

Furthermore, when the input audio signal goes suddenly from silence to sound, the amplitude gain is also changed rapidly. This avoids a loud sound being output suddenly after silence and allows the volume to be controlled with good temporal accuracy. When the input audio signal contains continuing sound, on the other hand, the amplitude gain is changed gradually. This allows the amplitude to be controlled without giving the user an unnatural impression.

The configuration and role of each component of the second control unit 200 shown in FIG. 3 are described below. The second control unit 200 comprises a sound/silence detection unit 201, a frame amplitude maximum value detection unit 202, a frame unit gain calculation unit 203, a second sample unit gain calculation unit 204, and a volume normalization unit 205.

The sound/silence detection unit 201 receives the signal hp[n] output from the DC component removal filter unit 101 and determines whether hp[n] is non-silent or silent. Using the information on whether hp[n] of the immediately preceding frame was non-silent or silent, it also outputs, as the sound/silence detection result fr_mum, information indicating the transition between sound and silence from the immediately preceding frame to the frame being processed. The sound/silence detection unit 201 classifies hp[n] by the same processing as the sound/silence detection unit 1031 described in the first embodiment, which takes the subframe signal sub_hpb[s] as input.

The sound/silence detection result fr_mum and the sound/silence detection result fr_mum1 for the immediately preceding frame are set as follows, according to whether the frame being processed and the immediately preceding frame are non-silent or silent.

When the immediately preceding frame and the frame being processed are both determined to be silent, fr_mum = 0 and fr_mum1 = 0.

When the immediately preceding frame is determined to be silent and the frame being processed non-silent, fr_mum = 1 and fr_mum1 = 1.

When the immediately preceding frame is determined to be non-silent and the frame being processed silent, fr_mum = 2 and fr_mum1 = 0.

When the immediately preceding frame and the frame being processed are both determined to be non-silent, fr_mum = 3 and fr_mum1 = 1.
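The four cases above form a two-bit code, which the following sketch reproduces. The function and argument names are hypothetical. The observation that fr_mum1 always equals the state of the frame being processed is read off the four cases and is not stated in the text.

```python
def detect_transition(prev_has_sound, cur_has_sound):
    """Return (fr_mum, fr_mum1) for the four sound/silence combinations."""
    # The high bit encodes the preceding frame, the low bit the current frame.
    fr_mum = (2 if prev_has_sound else 0) + (1 if cur_has_sound else 0)
    # In all four cases fr_mum1 matches the current frame's state.
    fr_mum1 = 1 if cur_has_sound else 0
    return fr_mum, fr_mum1
```

With this encoding, "the frame being processed is non-silent" corresponds to fr_mum being 1 or 3, and "the preceding frame was silent" corresponds to fr_mum being 0 or 1.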

The frame amplitude maximum value detection unit 202 receives the sound/silence detection result fr_mum and the output signal y[n], detects the maximum amplitude contained in y[n], and outputs it as the frame amplitude maximum value fr_max. When fr_mum = 0, the detection may be skipped and fr_max = 0 output instead. The frame amplitude maximum value detection unit 202 has been described as detecting and outputting the maximum amplitude contained in the signal hp[n]. The value output by the frame amplitude maximum value detection unit 202 is not limited to this, however: it may instead be the average amplitude level of hp[n] or the average spectral power of hp[n].

The frame unit gain calculation unit 203 receives the frame amplitude maximum value fr_max output from the frame amplitude maximum value detection unit 202 and the sound/silence detection result fr_mum output from the sound/silence detection unit 201, and calculates for each frame the frame unit gain fr_pow, which is the reference value for volume control. More specifically, it sets fr_pow = f(b) (b = 0, 1, 2, 3) according to the value of fr_mum. When fr_mum == 0 or fr_mum == 2, that is, when the frame being processed is determined to be silent, there is no need to increase or decrease the amplitude of that frame, so fr_pow = 1 is set. When fr_mum == 1 or fr_mum == 3, that is, when the frame being processed is determined to be non-silent, the amplitude must be normalized so that the frame is output from the acoustic output device at a constant volume, so fr_pow = (OUT_LEVEL / fr_max) is set. Here, the volume normalization target level OUT_LEVEL is the target output amplitude level. OUT_LEVEL may be a predetermined value or a value set as needed by the user of the mobile phone.
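The rule for fr_pow can be sketched as below. The guard against fr_max <= 0 is an assumption added to keep the example safe to run; it is not discussed in the text. The function name is hypothetical.

```python
def frame_unit_gain(fr_mum, fr_max, out_level):
    """Return fr_pow: 1 for a silent frame, OUT_LEVEL / fr_max otherwise."""
    cur_has_sound = fr_mum in (1, 3)
    if not cur_has_sound or fr_max <= 0:
        return 1.0                # silent frame (or degenerate peak): no change
    return out_level / fr_max     # scale the frame peak to the target level
```

For instance, with OUT_LEVEL = 0.5 a non-silent frame whose peak is 0.25 gets fr_pow = 2.0, and one whose peak is 1.0 gets fr_pow = 0.5.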

The second sample unit gain calculation unit 204 receives the frame unit gain fr_pow output from the frame unit gain calculation unit 203 and the sound/silence detection result fr_mum, and outputs the second sample unit gain auto_gain[n], which controls the amplitude of each sample of the output signal y[n]. It produces auto_gain[n] by the same processing as the sample unit gain calculation unit 105, which takes the signal characteristic determination result sub_prm and the subframe unit gain sub_pow as inputs and outputs the sample unit gain sm_pow[s]. Whereas the sample unit gain calculation unit 105 varies the speed of the transition of sm_pow[s] according to sub_prm, the second sample unit gain calculation unit 204 varies the speed of the transition of auto_gain[n] according to fr_mum. For example, when fr_mum == 1, that is, when the immediately preceding frame is silent and the frame being processed is non-silent, the change from silence to sound may be abrupt, like a switch. In such a case, speeding up the transition of auto_gain[n] allows the volume correction to track even a switch-like change in volume. When fr_mum == 3, that is, when the immediately preceding frame and the frame being processed are both non-silent, the volume is likely to change relatively gradually. In such a case, slowing the transition of auto_gain[n] allows the volume to be corrected by an amount that sounds natural to the user of the mobile phone.

The volume normalization unit 205 receives the output signal y[n] and the sample unit gain auto_gain[n] and changes the amplitude of each sample of y[n]. More specifically, it calculates the output signal z[n] as the product of y[n] and auto_gain[n] and outputs it. For example, when auto_gain[n] is less than 1, z[n] is attenuated to a smaller amplitude than y[n]; when auto_gain[n] is greater than 1, z[n] is amplified to a larger amplitude than y[n]. By manipulating the amplitude in this way, the volume normalization unit 205 produces an output signal z[n] that conforms to the volume normalization target level OUT_LEVEL.

(Flow of volume control processing by the second control unit)
FIG. 9 is a flowchart showing the actual flow of processing by which the output signal y[n] supplied to the second control unit 200 is output as the output signal z[n]. The processing performed by the second control unit 200 is described below with reference to FIG. 9.

First, the output signal y[n] supplied to the second control unit 200 is input to the frame amplitude maximum value detection unit 202 (step 2001). The frame amplitude maximum value detection unit 202 finds the sample with the largest amplitude in y[n] and outputs that amplitude to the frame unit gain calculation unit 203 as the frame amplitude maximum value fr_max.

Meanwhile, the signal hp[n] supplied to the second control unit 200 is input to the sound/silence detection unit 201 (step 2002). The sound/silence detection unit 201 determines from the amplitude of hp[n] whether hp[n] is non-silent or silent. It then outputs to the frame unit gain calculation unit 203 the sound/silence detection result fr_mum, which indicates whether the immediately preceding frame and the frame being processed are each non-silent or silent.

The sound/silence detection result fr_mum and the frame amplitude maximum value fr_max are input to the frame unit gain calculation unit 203. The frame unit gain calculation unit 203 determines from fr_mum whether the input signal y[n] is non-silent or silent (step 2002). If y[n] is determined to be non-silent ("sound" in step 2002), the frame unit gain calculation unit 203 outputs the frame unit gain fr_pow obtained by normalizing with respect to fr_max (step 2003).

If, on the other hand, y[n] is determined to be silent ("silence" in step 2002), the frame unit gain calculation unit 203 outputs fr_pow = 1 (step 2004).

フレーム単位ゲインｆｒ＿ｐｏｗ及び有音・無音検出結果ｆｒ＿ｍｕｍは、第２サンプル単位ゲイン算出部２０４へと入力される。第２サンプル単位ゲイン算出部２０４は、有音・無音検出結果ｆｒ＿ｍｕｍから直前のフレームの入力信号ｙ［ｎ］が有音か無音かの判断を行う（ステップ２００５）。直前のフレームの入力信号ｙ［ｎ］が無音であると判断されると（ステップ２００５の「無音」）、第２サンプル単位ゲイン算出部２０４は直前のフレームのフレーム単位ゲインｆｒ＿ｐｏｗ１から処理中のフレームのフレーム単位ゲインｆｒ＿ｐｏｗへと早く遷移するサンプル単位ゲインａｕｔｏ＿ｇａｉｎ［ｎ］を算出する（ステップ２００６）。 The frame unit gain fr_pow and the sound / silence detection result fr_um are input to the second sample unit gain calculation unit 204. The second sample unit gain calculation unit 204 determines whether the input signal y [n] of the immediately preceding frame is sound or sound from the sound / silence detection result fr_um (step 2005). When it is determined that the input signal y [n] of the immediately preceding frame is silent (“silence” in step 2005), the second sample unit gain calculating unit 204 calculates the frame being processed from the frame unit gain fr_pow1 of the immediately preceding frame. A sample unit gain auto_gain [n] that makes a transition to the frame unit gain fr_pow is calculated (step 2006).

On the other hand, when the input signal y[n] of the immediately preceding frame is determined to be voiced ("voiced" in step 2005), the second sample unit gain calculation unit 204 calculates a sample unit gain auto_gain[n] that transitions gradually from the frame unit gain fr_pow1 of the immediately preceding frame to the frame unit gain fr_pow of the frame being processed (step 2007).
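
The fast and gradual transitions of steps 2006 and 2007 can be sketched as below. The shape of the transition is not given in this passage, so the sketch assumes a simple linear ramp: a ramp spanning the whole frame when the preceding frame was voiced, and a short ramp of `fast_len` samples when it was silent. The function name `sample_unit_gain` and the parameters `frame_len` and `fast_len` are hypothetical.

```python
from typing import List


def sample_unit_gain(fr_pow1: float, fr_pow: float, frame_len: int,
                     prev_voiced: bool, fast_len: int = 2) -> List[float]:
    """Return auto_gain[n] for one frame as a linear ramp (assumed shape).

    fr_pow1     -- frame unit gain of the immediately preceding frame
    fr_pow      -- frame unit gain of the frame being processed
    prev_voiced -- True if the preceding frame was voiced
    fast_len    -- hypothetical ramp length used after a silent frame
    """
    # Step 2007: gradual ramp over the whole frame after a voiced frame.
    # Step 2006: short ramp after a silent frame, so the gain settles quickly.
    ramp_len = frame_len if prev_voiced else max(1, min(fast_len, frame_len))
    gains = []
    for n in range(frame_len):
        # Fraction of the transition completed at sample n, capped at 1.
        t = min(1.0, (n + 1) / ramp_len)
        gains.append(fr_pow1 + (fr_pow - fr_pow1) * t)
    return gains


if __name__ == "__main__":
    print(sample_unit_gain(1.0, 2.0, 8, prev_voiced=True))
    print(sample_unit_gain(1.0, 2.0, 8, prev_voiced=False))
```

Both ramps end at fr_pow by the last sample of the frame; only the point at which the target gain is first reached differs.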

The sample unit gain auto_gain[n] is input to the volume normalization unit 205 (step 2008). The volume normalization unit 205 changes the amplitude values contained in the output signal y[n] according to the amplitude gain auto_gain[n], and outputs the result as the output signal z[n].
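
As a rough illustration of step 2008, the sketch below applies the gain sample by sample. The passage only states that the amplitude is changed "according to" auto_gain[n]; treating this as the product z[n] = auto_gain[n] * y[n] is an assumption, as is the function name `normalize_volume`.

```python
from typing import List, Sequence


def normalize_volume(y: Sequence[float], auto_gain: Sequence[float]) -> List[float]:
    """Return z[n] = auto_gain[n] * y[n] for one frame (assumed multiplicative form)."""
    if len(y) != len(auto_gain):
        raise ValueError("y and auto_gain must have the same length")
    # Scale each sample by the gain computed for that same sample index.
    return [g * s for g, s in zip(auto_gain, y)]


if __name__ == "__main__":
    print(normalize_volume([0.5, -0.25, 0.125], [1.0, 1.5, 2.0]))
```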

Through the series of processes described above, the second control unit 200 normalizes the amplitude of the output signal y[n] output from the control unit 100 according to the volume normalization target level OUT_LEVEL, and outputs the normalized output signal z[n]. Normalizing the amplitude makes it possible to output an easily audible audio signal that maintains a constant volume.

Furthermore, when an output signal y[n] in which sound suddenly arises from a silent state is input, the amplitude gain auto_gain[n] is also made to change rapidly. This avoids a situation in which a large volume is suddenly output from the silent state, and allows volume control to be performed with good temporal accuracy. On the other hand, when an output signal y[n] in which sound continues is input, the amplitude gain auto_gain[n] is made to change gradually. This allows the amplitude to be controlled without giving the user an unnatural impression.

The present invention is not limited to the embodiments described above, and the constituent elements may be modified and embodied without departing from the gist of the invention. Various inventions may also be formed by appropriately combining the plurality of constituent elements disclosed in the above embodiments. For example, some constituent elements may be deleted from the full set of constituent elements shown in each embodiment.

DESCRIPTION OF SYMBOLS
100 Control unit
101 DC component control filter
102 Subframe division unit
103 Signal characteristic detection unit
1031 Signal characteristic detection unit
10311 Subframe amplitude maximum value detection unit
10312 Voiced/silent determination unit
1032 Voiced/unvoiced sound detection unit
10321 Zero-cross count detection unit
10322 Voiced/unvoiced sound determination unit
104 Subframe unit gain calculation unit
105 Sample unit gain calculation unit
1051 Subframe change amount calculation unit
1052 Window function adaptation unit
1053 Delay unit
106 Volume control unit
107 Frequency characteristic correction unit
200 Second control unit
201 Voiced/silent detection unit
202 Frame amplitude maximum value detection unit
203 Frame unit gain calculation unit
204 Second sample unit gain calculation unit
205 Volume normalization unit

Claims

An electronic device comprising:
voice input receiving means for receiving a voice input;
volume measuring means for measuring the input volume and frequency components of the voice received by the voice input receiving means;
volume gain setting means for setting, when a frequency component within a predetermined frequency band of the voice received during a predetermined time interval exceeds a predetermined energy, a volume gain according to the value of the input volume of the voice received during the predetermined time interval; and
voice output means for outputting the voice received during the predetermined time interval at a volume corresponding to the volume gain set by the volume gain setting means.

An electronic device comprising:
voice input receiving means for receiving a voice input;
volume measuring means for measuring the input volume and frequency components of the voice received by the voice input receiving means;
volume gain setting means for setting, when a frequency component within a predetermined frequency band of the voice received during a predetermined time interval exceeds a predetermined energy, a volume gain according to the value of the input volume of the voice received during the predetermined time interval;
volume gain transition function setting means for setting a volume gain transition function such that, at the end of output of the voice received during a first time interval, the function takes the value of the volume gain set by the volume gain setting means for the voice received during the first time interval, and, at the start of output of the voice received during the first time interval, the function transitions from the value of the volume gain set by the volume gain setting means for the voice received during a second time interval preceding the first time interval; and
voice output means for outputting the voice received during the first time interval at a volume corresponding to the volume gain transition function.

The electronic device according to claim 1, wherein the volume gain transition function setting means:
sets a first volume gain transition function when the input volume of the voice received in the first time interval and in the second time interval is equal to or lower than a predetermined volume;
sets a second volume gain transition function, which transitions to the volume gain value of the first time interval more abruptly than the first volume gain transition function, when the input volume of the voice received during the first time interval is equal to or higher than the predetermined volume and the input volume of the voice received during the second time interval is equal to or lower than the predetermined volume; and
sets a third volume gain transition function, which transitions to the volume gain value of the first time interval more gradually than the first volume gain transition function, when the input volume of the voice received in the second time interval is equal to or higher than the predetermined volume.

The electronic device according to claim 1, wherein the volume gain transition function setting means:
sets a first volume gain transition function when the input volume of the voice received in the first time interval and in the second time interval is equal to or lower than a predetermined volume;
sets a second volume gain transition function, which transitions to the value of the volume gain set for the first time interval more abruptly than the first volume gain transition function, when a frequency component within the predetermined frequency band in the spectrum distribution of the voice received during the first time interval exceeds the predetermined energy and the input volume of the voice received during the second time interval is equal to or lower than the predetermined volume; and
sets a third volume gain transition function, which transitions to the value of the volume gain set for the first time interval more gradually than the first volume gain transition function, when a frequency component within the predetermined frequency band in the spectrum distribution of the voice received during the first time interval exceeds the predetermined energy and the input volume of the voice received during the second time interval is equal to or higher than the predetermined volume.

A volume control program for an electronic device, comprising:
voice input receiving means for receiving input of a voice signal;
volume measuring means for acquiring the input volume and frequency component values of the voice signal received by the voice input receiving means;
volume gain setting means for setting, when a frequency component within a predetermined frequency band of the voice received during a predetermined time interval exceeds a predetermined energy, a volume gain according to the value of the input volume of the voice received during the predetermined time interval; and
output volume control means for setting the output volume of the voice received during the predetermined time interval to an output volume corresponding to the volume gain set by the volume gain setting means.