JP2008028532A

JP2008028532A - Voice processor and voice processing method

Info

Publication number: JP2008028532A
Application number: JP2006196847A
Authority: JP
Inventors: Hiroki Yamamoto; 寛樹山本; Kenichiro Nakagawa; 賢一郎中川
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2006-07-19
Filing date: 2006-07-19
Publication date: 2008-02-07

Abstract

PROBLEM TO BE SOLVED: To set, in a voice processor that amplifies both analog voice signals and digital voice signals, the optimum analog amplification value and the optimum digital amplification value so that an input voice signal reaches a desired sound volume. SOLUTION: When an analog amplification value and a digital amplification value are set for a target amplification value, they are set so that the amplification obtained by combining analog amplification with digital amplification equals the target amplification value and the analog amplification value is the maximum within its settable range. COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to a speech processing apparatus, such as a speech recognition apparatus, that processes speech input by a user.

Various speech processing apparatuses adjust the volume of input speech. For example, some voice recorders used for taking notes in meetings or by voice automatically adjust the volume to keep the recording level constant. Some commercially available speech recognition software also has a function for adjusting the input volume in order to improve the accuracy of voice detection and speech recognition.

In such volume adjustment, the volume of the input speech is raised or lowered so that it reaches a predetermined volume. For example, the speech recognition apparatus disclosed in Patent Document 1 changes the gain of a microphone amplifier according to the noise level. The speech recognition apparatus disclosed in Patent Document 2 adjusts the amplification of the audio signal according to the speech recognition rate. The input level adjustment circuit and voice communication terminal disclosed in Patent Document 3 adjust the amplification of the audio signal according to the average input level of the speech section.
JP-A-6-67689 JP-A-6-337697 JP 2001-69200 A

In the conventional techniques, the input volume is adjusted by a single amplifier. However, for a speech processing apparatus that has both an amplifier for the analog audio signal and an amplifier for the digital audio signal after A/D conversion, none of them discloses how the amplification should be distributed between the two amplifiers.

When a digital audio signal is amplified, the quantization error introduced by A/D conversion is amplified along with it. Therefore, when both analog and digital amplification are available, it is better to give priority to amplifying the analog audio signal.
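The effect described above can be illustrated with a small simulation. The sketch below is only an illustration under assumed conditions that do not come from the specification: an ideal 8-bit uniform quantizer, a quiet 440 Hz tone sampled at 8 kHz, and a 30 dB total gain applied either entirely after or entirely before quantization. The helper names are hypothetical.

```python
import math

def quantize(x, bits=8, full_scale=1.0):
    """Round x to the nearest level of an assumed uniform ADC spanning +/- full_scale."""
    step = 2.0 * full_scale / (2 ** bits)
    clipped = max(-full_scale, min(full_scale - step, x))
    return round(clipped / step) * step

def snr_db(reference, measured):
    """Signal-to-noise ratio of `measured` against the ideal `reference`, in dB."""
    signal = sum(r * r for r in reference)
    noise = sum((r - m) ** 2 for r, m in zip(reference, measured))
    return 10.0 * math.log10(signal / noise)

def db_to_linear(gain_db):
    """Convert an amplitude gain in dB to a linear factor (assumes amplitude dB)."""
    return 10.0 ** (gain_db / 20.0)

# Quiet test tone (assumed values): amplitude 0.01, 440 Hz, 8 kHz sampling.
source = [0.01 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(800)]
gain = db_to_linear(30.0)
ideal = [gain * s for s in source]

# All gain applied digitally: the quantization error is multiplied by the gain.
digital_first = [gain * quantize(s) for s in source]
# All gain applied before the ADC: the quantization error stays at one step size.
analog_first = [quantize(gain * s) for s in source]

snr_digital_first = snr_db(ideal, digital_first)
snr_analog_first = snr_db(ideal, analog_first)
print(f"digital gain only: {snr_digital_first:.1f} dB SNR")
print(f"analog gain only:  {snr_analog_first:.1f} dB SNR")
```

Under these assumptions the analog-first path yields a clearly higher signal-to-noise ratio, which is the motivation for preferring analog amplification.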

The present invention has been made in view of these circumstances. Its object is to determine, in a speech processing apparatus that has an amplifier for the analog audio signal and an amplifier for the digital audio signal, the amplification value of the analog audio signal and the amplification value of the digital audio signal so that the quantization error of A/D conversion is reduced.

To solve the above problem, a speech processing apparatus according to the present invention comprises: analog amplification means for amplifying an analog audio signal; A/D conversion means for converting the analog audio signal into a digital audio signal; digital amplification means for amplifying the digital audio signal; and amplification value setting means for setting an analog amplification value and a digital amplification value for a target amplification value, wherein the amplification value setting means sets the analog amplification value and the digital amplification value so that the amplification value obtained by combining analog amplification and digital amplification equals the target amplification value and the analog amplification value is the maximum within its settable range.

Also to solve the above problem, a speech processing apparatus according to the present invention comprises: analog amplification means for amplifying an analog audio signal; A/D conversion means for converting the analog audio signal into a digital audio signal; digital amplification means for amplifying the digital audio signal; and amplification value setting means for setting an analog amplification value and a digital amplification value for a target amplification value, wherein, when the values that can be set for either or both of the analog amplification value and the digital amplification value are discrete, the amplification value setting means sets, from among the combinations of analog and digital amplification values for which the difference between the resulting amplification value and the target amplification value falls within a predetermined range, the combination with the largest analog amplification value.

According to the present invention, in a speech processing apparatus that has an amplifier for the analog audio signal and an amplifier for the digital audio signal, the amplification value of the analog audio signal and the amplification value of the digital audio signal can be set so that the quantization error of A/D conversion is reduced.

Preferred embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a functional block diagram showing the schematic configuration of a speech processing apparatus according to Embodiment 1 of the present invention. In FIG. 1, 101 is an analog amplification unit, 102 is an A/D conversion unit, 103 is a digital amplification unit, 104 is an audio signal storage unit, 105 is an audio input control unit, 106 is an input volume adjustment unit, and 107 is an amplification value storage unit.

The analog amplification unit 101 amplifies the input analog audio signal and outputs an analog audio signal. The A/D conversion unit 102 converts the analog audio signal output by the analog amplification unit 101 into a digital audio signal and outputs it. The digital amplification unit 103 digitally amplifies the digital audio signal output by the A/D conversion unit 102 and outputs a digital audio signal. The audio signal storage unit 104 temporarily stores the digital audio signal output by the digital amplification unit 103. The audio input control unit 105 controls the operation of the analog amplification unit 101, the A/D conversion unit 102, the digital amplification unit 103, and the audio signal storage unit 104. It also controls the amplification value used by the analog amplification unit 101 (the analog amplification value) and the amplification value used by the digital amplification unit 103 (the digital amplification value), and it acquires the digital audio signal stored in the audio signal storage unit 104. The input volume adjustment unit 106 determines the analog and digital amplification values so that the acquired audio signal (that is, the digital audio signal stored in the audio signal storage unit 104) has the desired volume. The method for determining the analog and digital amplification values (hereinafter collectively called the amplification values) is described in detail later. The amplification value storage unit 107 stores the amplification values determined by the input volume adjustment unit 106.

FIG. 2 illustrates how an audio signal input to the apparatus is amplified by the analog amplification unit and the digital amplification unit. In FIG. 2, when an analog audio signal of volume Go (201) is input to the speech processing apparatus, it is amplified by the analog amplification unit 101 and its volume becomes Go + Aa, where Aa (202) is the analog amplification value. The digital audio signal produced by the A/D conversion unit 102 is further amplified by the digital amplification unit 103, so the volume finally obtained, G (204), is G = Go + Aa + Ad, where Ad (203) is the digital amplification value. The total amplification value A (205) combining analog and digital amplification is A = Aa + Ad. All volumes and amplification values are expressed in decibels (dB), here and in the rest of the description.

A procedure for adjusting the input volume in the speech processing apparatus configured as above will now be described.
The apparatus acquires sample speech and sets the amplification values so that the volume of the sample speech approaches a desired volume. This embodiment describes the case where the maximum volume of the sample speech is brought close to the desired volume, but the procedure below also applies when the volume is calculated differently, for example as the average or minimum volume. When the volume of the sample speech before amplification is Go and the desired volume is G', the target amplification value A' is given by:
A' = G' - Go (Formula 1)

The apparatus cannot measure the pre-amplification volume Go of the sample speech directly, but Go can be calculated from the volume G of the acquired sample speech:
Go = G - A (Formula 2)
In Formula 2, A is the amplification value used when the sample speech was acquired.
Substituting Go from Formula 2 into Formula 1 gives the required amplification value A':
A' = G' - (G - A)
   = (G' - G) + A
   = (G' - G) + (Aa + Ad) (Formula 3)
In Formula 3, Aa and Ad are the analog and digital amplification values, respectively, used when the sample speech was acquired.
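As a worked illustration of Formula 3, the following minimal sketch computes the target amplification value from a measured sample. The function name and the example numbers are assumptions chosen for illustration; all quantities are in dB.

```python
def target_gain_db(desired_db, measured_db, analog_db, digital_db):
    """Formula 3: A' = (G' - G) + (Aa + Ad), with every value in dB.

    desired_db  -- desired volume G'
    measured_db -- volume G of the acquired sample
    analog_db   -- analog amplification value Aa used while sampling
    digital_db  -- digital amplification value Ad used while sampling
    """
    return (desired_db - measured_db) + (analog_db + digital_db)

# Hypothetical example: the sample measured -20 dB while Aa = 10 dB and
# Ad = 4 dB were applied, and the desired volume is -6 dB.
required = target_gain_db(-6.0, -20.0, 10.0, 4.0)
print(required)
```

In this example the sample is 14 dB too quiet with 14 dB of gain already applied, so the total gain required is the sum of the two.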

Next, the amplification value A' obtained from Formula 3 is distributed between an analog amplification value Aa' and a digital amplification value Ad':
A' = Aa' + Ad' (Formula 4)
Digital amplification also amplifies the quantization error introduced by A/D conversion, so the digital amplification value should be as small as possible. Likewise, to keep the quantization error of A/D conversion as small as possible, the analog amplification value should be as large as possible. The apparatus therefore distributes the amplification so that the analog amplification value is as large as possible and the digital amplification value is as small as possible. Let Ad_max and Ad_min be the maximum and minimum digital amplification values, and Aa_max and Aa_min the maximum and minimum analog amplification values. The analog and digital amplification values are then obtained as follows.
(Case 1) When A' ≥ Aa_max + Ad_max:
Aa' = Aa_max (Formula 5)
Ad' = Ad_max (Formula 6)
(Case 2) When Aa_max + Ad_max > A' ≥ Aa_max + Ad_min:
Aa' = Aa_max (Formula 7)
Ad' = A' - Aa_max (Formula 8)
(Case 3) When Aa_max + Ad_min > A' > Aa_min + Ad_min:
Aa' = A' - Ad_min (Formula 9)
Ad' = Ad_min (Formula 10)
(Case 4) When Aa_min + Ad_min ≥ A':
Aa' = Aa_min (Formula 11)
Ad' = Ad_min (Formula 12)
Setting the amplification values in this way allows the input volume to be adjusted to obtain the desired volume G' while minimizing the quantization error.
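The four cases can be sketched as a single function. This is a minimal sketch assuming continuously adjustable gains; the function name and the example ranges (analog -40 to 40 dB, digital -20 to 20 dB) are assumptions for illustration.

```python
def split_gain(target_db, aa_min, aa_max, ad_min, ad_max):
    """Distribute the target gain A' into (Aa', Ad') following Cases 1 to 4."""
    if target_db >= aa_max + ad_max:
        # Case 1: target out of reach, use both maxima (Formulas 5, 6).
        return aa_max, ad_max
    if target_db >= aa_max + ad_min:
        # Case 2: analog at maximum, digital makes up the rest (Formulas 7, 8).
        return aa_max, target_db - aa_max
    if target_db > aa_min + ad_min:
        # Case 3: digital at minimum, analog makes up the rest (Formulas 9, 10).
        return target_db - ad_min, ad_min
    # Case 4: target at or below the smallest total gain (Formulas 11, 12).
    return aa_min, ad_min

# Assumed gain ranges for illustration only.
limits = dict(aa_min=-40.0, aa_max=40.0, ad_min=-20.0, ad_max=20.0)
for target in (70.0, 50.0, 10.0, -80.0):
    print(target, split_gain(target, **limits))
```

In Cases 2 and 3 the two returned values always sum to the target, so the requested total gain is met exactly whenever it lies inside the achievable range.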

If Aa_max2, given by the following formula, is used in place of Aa_max, the largest analog amplification value that avoids clipping during A/D conversion can be set:
Aa_max2 = min(Aa_max, G_max - Go)
        = min(Aa_max, G_max - (G - A)) (Formula 13)
In Formula 13, G_max is the maximum volume that the digital audio obtained after A/D conversion can take.
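Formula 13 can be sketched in a few lines. The function name and the example values are assumptions; G_max is taken here as 0 dB relative to full scale purely for illustration.

```python
def clip_safe_analog_max(aa_max, g_max, measured_db, applied_gain_db):
    """Formula 13: Aa_max2 = min(Aa_max, G_max - (G - A)), all values in dB."""
    source_db = measured_db - applied_gain_db  # Formula 2: Go = G - A
    return min(aa_max, g_max - source_db)

# Hypothetical example: the sample measured -20 dB with 14 dB of total gain,
# so the source is at -34 dB and at most 34 dB of analog gain avoids clipping.
aa_max2 = clip_safe_analog_max(aa_max=40.0, g_max=0.0,
                               measured_db=-20.0, applied_gain_db=14.0)
print(aa_max2)
```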

The flow of the input volume adjustment described above is explained with reference to the flowchart of FIG. 3.

When input volume adjustment starts, sample speech is acquired first. The input volume adjustment unit 106 instructs the audio input control unit 105 to acquire sample speech. On receiving the instruction, the audio input control unit 105 acquires the digital audio signal that has passed through the analog amplification unit 101, the A/D conversion unit 102, and the digital amplification unit 103 and been stored in the audio signal storage unit 104 (S101). Next, the input volume adjustment unit 106 calculates the volume of the digital audio signal acquired by the audio input control unit 105 (S102). The input volume adjustment unit 106 then calculates the amplification value A' required to reach the desired volume (the calculation of Formula 3, S103) and, following the procedure described above, calculates the analog and digital amplification values so that the analog amplification value is as large as possible (S104). The procedure for determining the analog and digital amplification values is shown in the flowchart of FIG. 4.
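The specification does not say how the volume in S102 is computed from the digital samples. One possible sketch, assuming signed 16-bit PCM samples and a peak level expressed in dB relative to full scale (both are assumptions, as is the function name), is:

```python
import math

def peak_level_db(samples, full_scale=32768):
    """Peak level of integer PCM samples in dB relative to full scale.

    Returns -inf for an all-zero buffer, for example when nothing was
    recorded while the sample was being acquired.
    """
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return float("-inf")
    return 20.0 * math.log10(peak / full_scale)

# A buffer whose largest sample is half of full scale is about 6 dB below it.
print(peak_level_db([0, 16384, -8192]))
```

An average or minimum level could be substituted for the peak in the same way, as the description allows.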

In FIG. 4, A' is the amplification value, calculated in S103, that is required to reach the desired volume.

When the maximum amplification value of the apparatus, that is, the amplification obtained with both analog and digital amplification at their maxima, is less than or equal to the required amplification value A' (Case 1 above; YES in S201), both the digital and the analog amplification values are set to their maxima (Formulas 5 and 6, S202).

When the result of S201 is NO and A' can still be reached by adjusting the digital amplification with the analog amplification value at its maximum (Case 2; YES in S203), the analog amplification value is set to its maximum and the difference between the required amplification value A' and the analog amplification value is made up by the digital amplification value (Formulas 7 and 8, S204).

When the result of S203 is NO and A' is greater than the minimum amplification value of the apparatus (Case 3), the digital amplification value is set to its minimum and the difference is assigned to the analog amplification value (Formulas 9 and 10, S206).

When the result of S205 is NO, that is, when the required amplification value A' is less than or equal to the minimum amplification value of the apparatus (Case 4), both analog and digital amplification are set to their minima (Formulas 11 and 12, S207).

As noted above, to avoid clipping during A/D conversion, the maximum analog amplification value Aa_max2 that accounts for clipping, given by Formula 13, is used in place of the maximum analog amplification value Aa_max. When the input volume is adjusted on the basis of the average or minimum volume, the maximum volume must also be calculated when those volumes are obtained in S103.

Returning to the flowchart of FIG. 3, the input volume adjustment unit 106 stores the determined amplification values in the amplification value storage unit 107, and the process ends (S105).

By determining the analog and digital amplification values as described above, the desired volume is obtained with the analog amplification as large as possible, so the quantization error of A/D conversion can be minimized.

Some commonly used amplifier circuits can be set only to a few predetermined, discrete amplification values. This is especially common with analog amplifier circuits. This embodiment describes a speech processing apparatus that, as in Embodiment 1, uses two amplification units, analog and digital, but in which the amplification values of the analog amplification unit 101 and the digital amplification unit 103 can be set only discretely.

The processing from acquiring the sample speech and calculating its volume to calculating the required amplification value A' is the same as in Embodiment 1 (S101 to S103). The processing of S104, which decides how the amplification is distributed between the analog and digital amplification values, is described below.

In this embodiment the settable amplification values are discrete, so it may not be possible to set an amplification value exactly equal to the required one. The analog amplification value Aa' and the digital amplification value Ad' with the smallest error relative to the required amplification value are therefore determined. That is, the Aa' and Ad' that minimize the following value are determined within the ranges Aa_max ≥ Aa' ≥ Aa_min and Ad_max ≥ Ad' ≥ Ad_min:
E = |A' - (Aa' + Ad')| (Formula 14)
To avoid clipping during A/D conversion, Aa' is determined within the range Aa_max3 ≥ Aa' ≥ Aa_min, where Aa_max3 is the largest settable analog amplification value that does not exceed Aa_max2 calculated by Formula 13. Aa_max3 is expressed as:
Aa_max3 = max { Aa : Aa ∈ S(Aa_max2) } (Formula 15)
In Formula 15, S(Aa_max2) is the set of analog amplification values settable on the apparatus that are less than or equal to Aa_max2.
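Formula 15 amounts to picking the largest settable step that does not exceed the clipping limit. A minimal sketch follows; the function name is hypothetical, and returning None when no step qualifies is an assumption, since the description does not cover that situation.

```python
def largest_allowed_step(analog_steps, aa_max2):
    """Formula 15: the largest settable analog gain not exceeding Aa_max2.

    Returns None when every settable step is above Aa_max2 (a case the
    description does not address; handling it this way is an assumption).
    """
    allowed = [a for a in analog_steps if a <= aa_max2]
    return max(allowed) if allowed else None

# Assumed analog steps: -40 dB to 40 dB in 10 dB increments.
analog_steps = range(-40, 41, 10)
aa_max3 = largest_allowed_step(analog_steps, 34.0)
print(aa_max3)
```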

In Case 1 described in Embodiment 1, that is, when the required amplification value is greater than or equal to the maximum settable amplification value, the analog and digital amplification values are set to their maxima as in Embodiment 1. In Case 4, that is, when the required amplification value is less than or equal to the minimum settable amplification value, the analog and digital amplification values are set to their minima, also as in Embodiment 1.

In Cases 2 and 3, the combination of analog and digital amplification values that minimizes the error E of Formula 14 is selected from the settable combinations. If several combinations give the minimum error, the one with the larger analog amplification value is selected. If the analog amplification values are also equal, the one with the smaller digital amplification value is selected. A concrete example follows.

As an example in which only discrete amplification values can be set, consider the case where analog amplification can be set in 10 dB steps over the range -40 dB to 40 dB and digital amplification can be set in 2 dB steps over the range -20 dB to 20 dB.

When the required amplification value A' is 46.3 dB, the analog amplification value is set to 40 dB and the digital amplification value to 6 dB. The error E given by Formula 14 is then 0.3 dB, and no other combination gives a smaller error, so this combination is selected.

When the required amplification value is 45.0 dB, the following four combinations minimize the error.

Among these candidates, the selection is first narrowed to combinations (1) and (2), which have the larger analog amplification value. Comparing (1) and (2), combination (2), which has the smaller digital amplification value, is chosen as the final amplification values. In this way, amplification values can be determined that keep the quantization error of A/D conversion as small as possible and limit how much that error is amplified by digital amplification.
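The selection rule for discrete gains (smallest error, then largest analog gain, then smallest digital gain) can be sketched as a brute-force search over the example step sizes. The function names are hypothetical, and rounding the error to nine decimals is an implementation choice to make floating-point ties compare equal.

```python
def gain_error(target_db, analog_db, digital_db):
    """Formula 14, rounded so that floating-point ties compare equal."""
    return round(abs(target_db - (analog_db + digital_db)), 9)

def tied_candidates(target_db, analog_steps, digital_steps):
    """All (analog, digital) pairs that share the minimum error."""
    pairs = [(a, d) for a in analog_steps for d in digital_steps]
    best = min(gain_error(target_db, a, d) for a, d in pairs)
    return [(a, d) for a, d in pairs if gain_error(target_db, a, d) == best]

def choose_discrete_gains(target_db, analog_steps, digital_steps):
    """Minimum error first, then largest analog gain, then smallest digital gain."""
    pairs = [(a, d) for a in analog_steps for d in digital_steps]
    return min(pairs, key=lambda p: (gain_error(target_db, *p), -p[0], p[1]))

analog_steps = range(-40, 41, 10)   # -40 dB to 40 dB in 10 dB steps
digital_steps = range(-20, 21, 2)   # -20 dB to 20 dB in 2 dB steps

print(choose_discrete_gains(46.3, analog_steps, digital_steps))
print(tied_candidates(45.0, analog_steps, digital_steps))
print(choose_discrete_gains(45.0, analog_steps, digital_steps))
```

For a target of 45.0 dB the search finds four tied pairs, each 1 dB away from the target, and the tie-breaking rules leave the pair with 40 dB analog and 4 dB digital gain. Because the search covers every settable pair, targets outside the achievable range fall back to both maxima or both minima without separate handling.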

The procedure for determining the analog and digital amplification values in S104 of this embodiment, as described above, is shown in the flowchart of FIG. 5. In FIG. 5, S201, S202, S205, and S207 operate as in Embodiment 1.

In FIG. 5, A' is the amplification value, calculated in S103, that is required to reach the desired volume. When the maximum amplification value of the apparatus, that is, the amplification obtained with both analog and digital amplification at their maxima, is less than or equal to the required amplification value A' (Case 1 above; YES in S201), both the digital and the analog amplification values are set to their maxima (Formulas 5 and 6, S202).

When the results of S201 and S205 are both NO, that is, when the required amplification value A' is less than or equal to the minimum amplification value of the apparatus (Case 4), both analog and digital amplification are set to their minima (Formulas 11 and 12, S207).

On the other hand, when the result of S201 is NO and A' is greater than the minimum amplification value of the apparatus (Cases 2 and 3), the combinations of analog and digital amplification values that minimize the error E of Formula 14 are calculated (S208).

If only one combination gives the minimum error, the amplification values are determined from that combination (NO in S209). If several candidate combinations give the minimum error (YES in S209), the candidates with the largest analog amplification value are selected (S210).

If S210 leaves a single candidate, the amplification values are determined from that combination (NO in S211). If S210 leaves several candidates (YES in S211), the combination with the smallest digital amplification value among them is selected and the amplification values are determined (S212).

As in Embodiment 1, to avoid clipping during A/D conversion, Aa_max3 given by Formula 15 is used in place of the maximum analog amplification value Aa_max.

By determining the analog and digital amplification values as described above, the quantization error can be kept small even when the values settable for the analog and digital amplification values are discrete.

Embodiments 1 and 2 acquire sample speech and determine the amplification values from its volume. However, the apparatus has no means of confirming whether the acquired sample actually contains speech. Consequently, if unexpected input occurs, for example when the user does not speak or when sudden internal noise is mixed in while the sample is being acquired, the amplification values cannot be calculated correctly. To address this problem, this embodiment describes an example that provides means for checking whether an appropriate volume can be obtained with the set amplification values and means for setting predetermined standard amplification values.

This embodiment describes the operation of a speech recognition apparatus that incorporates the input volume adjustment apparatus described in Embodiments 1 and 2. For the sake of explanation, a simple speech recognition apparatus that executes commands entered by voice is used as the example, but the present invention is not limited to this and can be applied to any apparatus or system that involves voice input.

FIG. 6 is a block diagram showing the schematic configuration of the speech recognition apparatus of this embodiment. Units 101 to 107 have the same configuration as in the speech processing apparatus described in Embodiments 1 and 2, so their description is omitted.

In FIG. 6, reference numeral 108 denotes a voice recognition control unit, which controls the operation of the voice detection unit 109 and the voice recognition unit 110 described below. It also acquires the audio signal stored in the audio signal storage unit 104 via the voice input control unit 105 and passes it to the voice detection unit 109. The voice detection unit 109 detects the section in which the user spoke from the audio signal obtained via the voice recognition control unit 108. The voice recognition unit 110 performs acoustic analysis on the audio signal in the section detected by the voice detection unit 109, then performs decoding using the necessary data, such as an acoustic model and a language model (not shown), to obtain a recognition result. The result obtained by the voice recognition unit 110 is referenced via the voice recognition control unit 108. The voice output unit 111 controls spoken guidance to the user, including talkback of the recognition result. Spoken guidance may be implemented using speech synthesis (not shown). The display control unit 112 controls the display, including the GUI used by the apparatus and the presentation of recognition results. The operation control unit 113 controls the operation means provided to the user, such as a mouse, a keyboard, and buttons. The control unit 114 controls the operation of the entire apparatus.

The operation of the speech recognition apparatus configured as above is described with reference to the flowchart of FIG. 7. When the apparatus is started, the control unit 114 sets the amplification values as part of the start-up processing. The control unit 114 reads the analog amplification value and the digital amplification value stored in the amplification value storage unit 107 and sets them via the voice input control unit 105 (S301). In this apparatus, an amplification value that has been set is stored in the amplification value storage unit 107, so the same value is applied at the next start-up even after the apparatus has been shut down. The stored amplification value is retained in the amplification value storage unit 107 unless it is updated by the input volume adjustment described later.

When the start-up processing is complete, the display control unit 112 presents a display such as that shown in FIG. 8 and waits until the user performs an operation (S302 NO). FIG. 8 is an example of the display in the standby state: a guidance 302, an utterance button 303, an input volume adjustment button 304, and an end button 305 are displayed in a window 301. The guidance 302 shows information for the user, such as an operation guide. Here it shows operating instructions for command input.

When the user operates one of the displayed buttons, the operation control unit 113 detects the operation (S302 YES), and the control unit 114 controls the processing corresponding to that operation (S303 to S310). In FIG. 8, when the user presses the utterance button 303 (S303 NO, S305 YES), speech recognition is executed (S309) and processing corresponding to the obtained recognition result is performed (S309). Similarly, when the user presses the input volume adjustment button 304, input volume adjustment is executed (S307). The details of the input volume adjustment in S307 are described later.

When the end button 305 is pressed (S302 YES, S303 NO, S305 NO, S306 YES), the speech recognition apparatus terminates. If operations other than the displayed buttons are provided (for example, keyboard operations), the control unit 114 performs the corresponding processing (S310) when such an operation is detected (S302 YES, S303 NO, S305 NO, S306 NO).
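The dispatch in S302 to S310 can be pictured as a small event loop. The following Python sketch is illustrative only: the `Button` names, the `handle_operation` and `run` functions, and the returned action strings are hypothetical and do not appear in the specification.

```python
from enum import Enum, auto


class Button(Enum):
    """Hypothetical identifiers for the operations available in FIG. 8."""
    UTTER = auto()          # utterance button 303
    ADJUST_VOLUME = auto()  # input volume adjustment button 304
    END = auto()            # end button 305
    OTHER = auto()          # any other operation, e.g. keyboard input


def handle_operation(button: Button) -> str:
    """Return the action the control unit 114 would take for one operation."""
    if button is Button.ADJUST_VOLUME:
        return "adjust_input_volume"    # S307
    if button is Button.UTTER:
        return "recognize_and_execute"  # S309
    if button is Button.END:
        return "terminate"
    return "other_processing"           # S310


def run(operations):
    """Handle operations in order and stop once the end button is pressed."""
    log = []
    for op in operations:
        action = handle_operation(op)
        log.append(action)
        if action == "terminate":
            break
    return log
```

Operations that arrive after the end button are never handled, which mirrors the apparatus terminating at that point.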

Next, the detailed operation of the input volume adjustment in S307 is described with reference to FIGS. 9 to 13. FIG. 9 is a flowchart showing the flow of the input volume adjustment processing in this apparatus. FIGS. 10 to 13 are examples of the displays controlled by the display control unit 112 during the input volume adjustment processing.

When input volume adjustment starts (S307 in FIG. 7), the display control unit 112 switches the display to the input volume adjustment UI (user interface) and waits until the user performs an operation (S401 NO).

FIG. 10 is an example of the input volume adjustment UI. A start button 306, a standard value setting button 307, and a cancel button 308 are displayed in the window 301. The guidance 302 shows how to operate the input volume adjustment.

Returning to FIG. 9, when the operation control unit 113 detects a user operation (S401 YES), the control unit 114 executes the processing corresponding to that operation (S402 to S419).

In FIG. 10, when the user presses the start button 306 (S401 YES, S402 YES), input volume adjustment using a sample voice begins from S403 onward. When the user instead presses the standard value setting button 307 (S401 YES, S402 NO, S417 YES), the input volume adjustment unit 106 reads the standard amplification value stored in the amplification value storage unit 107 and sets it via the voice input control unit 105 (S418). The set amplification value is then stored in the amplification value storage unit 107 (S419), and the input volume adjustment ends.

In FIG. 10, when the user presses the cancel button 308 (S401 YES, S402 NO, S417 NO), the input volume adjustment ends immediately.

Next, the processing from S403 onward is described. First, in S403, the input volume adjustment unit 106 sets the amplification value for sample voice acquisition via the voice input control unit 105. When the sample voice is acquired, a small amplification value is used so that clipping does not occur during A/D conversion. A suitable value, one that avoids clipping even when the user's voice is loud yet still captures an audio signal when the user's voice is quiet, is determined experimentally in advance and stored in the amplification value storage unit 107.

Next, the sample voice used for input volume adjustment is acquired. The input volume adjustment unit 106 instructs the voice input control unit 105 to acquire the sample voice. On receiving the instruction, the voice input control unit 105 acquires the digital audio signal that has passed through the analog amplification unit 101, the A/D conversion unit 102, and the digital amplification unit 103 and has been stored in the audio signal storage unit 104 (S404).

Next, the input volume adjustment unit 106 calculates the volume of the digital audio signal acquired by the voice input control unit 105 (S405).
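This passage does not state how the volume in S405 is computed. As one possible reading, the following Python sketch measures the RMS level of the samples in dB relative to full scale. The function name `volume_db`, the 16-bit full-scale value, and the choice of RMS are all assumptions made for illustration.

```python
import math


def volume_db(samples, full_scale=32768.0):
    """Return the RMS level of `samples` in dB relative to `full_scale`.

    Assumed definition of "volume"; the specification may use another measure.
    """
    if not samples:
        raise ValueError("no samples")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10.0 * math.log10(mean_square / (full_scale * full_scale))
```

A silent capture yields negative infinity, which is one way to represent the "user did not speak" case that this embodiment is designed to catch.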

The input volume adjustment unit 106 then calculates the amplification value A' required to reach the desired volume (S406), and calculates the analog amplification value and the digital amplification value (S407).

The calculation of the amplification value A' in S406 and the calculation of the analog amplification value and the digital amplification value in S407 use the procedure described in Embodiment 1 or Embodiment 2.
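As a rough illustration of S406 and S407, the Python sketch below assumes that all volumes and amplification values are expressed in dB, so that analog and digital amplification combine by addition. It follows the rule stated in claim 1: the combined amplification equals the target, and the analog amplification is as large as its settable range allows. The function names, the default ranges, and the error handling are hypothetical. Clipping avoidance at the A/D converter (claim 4) is not modelled.

```python
def target_amplification(desired_db, measured_db, analog_db, digital_db):
    """Total amplification A' (dB) that would bring the sample to the desired volume.

    `analog_db` and `digital_db` are the gains in effect when the sample was taken.
    """
    return (desired_db - measured_db) + analog_db + digital_db


def split_amplification(target_db, analog_range=(0.0, 30.0), digital_range=(0.0, 30.0)):
    """Split `target_db` into (analog, digital), making analog as large as possible."""
    a_min, a_max = analog_range
    d_min, d_max = digital_range
    # Largest analog gain that still leaves a settable digital gain.
    analog = min(a_max, target_db - d_min)
    digital = target_db - analog
    if analog < a_min or digital > d_max:
        raise ValueError("target amplification is outside the settable range")
    return analog, digital
```

For example, with the assumed ranges a target of 40 dB is split into 30 dB analog and 10 dB digital, while a target of 24 dB is handled entirely by the analog stage.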

Next, in S408, the input volume adjustment unit 106 sets the analog amplification value and the digital amplification value calculated in S407 via the voice input control unit 105.

Through the above processing, the digital amplification value and the analog amplification value are changed by the method of either Embodiment 1 or Embodiment 2.
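Where the settable amplification values are discrete, claims 2 and 3 describe choosing, among the combinations whose total is close enough to the target, the one with the largest analog value and, on a tie, the smallest digital value. The Python sketch below is one way to express that rule. It assumes values in dB that combine by addition, and the function name, step lists, and tolerance are hypothetical.

```python
from itertools import product


def choose_discrete(target_db, analog_steps, digital_steps, tolerance_db=1.0):
    """Pick (analog, digital) from discrete steps, or None if nothing is close enough."""
    candidates = [
        (a, d)
        for a, d in product(analog_steps, digital_steps)
        if abs((a + d) - target_db) <= tolerance_db
    ]
    if not candidates:
        return None
    # Prefer the largest analog gain; among equals, the smallest digital gain.
    return max(candidates, key=lambda c: (c[0], -c[1]))
```

An exhaustive search is used here for clarity, on the assumption that the number of settable steps is small.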

The following describes the processing that confirms whether an appropriate volume is obtained with the set amplification values (S409 to S416).

When the amplification values have been set (S408), the display control unit 112 switches to a display that prompts the user to speak, in order to acquire a confirmation voice, and waits until the user performs an operation (S409 NO).

FIG. 11 is an example of the display that prompts the confirmation utterance. A start button 309 and a cancel button 310 are displayed in the window 301. The guidance 302 shows a message prompting the user to speak for confirmation.

Returning to FIG. 9, when the operation control unit 113 detects a user operation (S409 YES), the control unit 114 executes the processing corresponding to that operation (S410 to S416).

In FIG. 11, when the user presses the start button 309 (S409 YES, S410 YES), the input volume adjustment unit 106 sends an instruction to the voice input control unit 105, as in the sample voice acquisition of S404, and the confirmation voice is acquired (S411).

The volume G'' of the confirmation voice acquired in S411 is then calculated, and S413 determines whether the calculated volume G'' is appropriate.

Here, a minimum value Mmin and a maximum value Mmax of the allowable range for the difference from the desired volume G' are determined in advance, and the volume is judged appropriate if Mmin < G'' − G' < Mmax. The criterion is not limited to this. A simpler method may be used, for example requiring that the difference between G'' and G' be within 5 dB. Separately from these criteria, the volume may also be judged inappropriate when the confirmation voice is clipped.
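The check in S413 can be sketched in Python as follows. The inequality Mmin < G'' − G' < Mmax is taken from the text. The default limits of ±5 dB, the function names, and the 16-bit clipping threshold are assumptions chosen for illustration.

```python
def is_clipped(samples, full_scale=32767):
    """Return True if any sample reaches the assumed full-scale magnitude."""
    return any(abs(s) >= full_scale for s in samples)


def is_volume_appropriate(confirm_db, desired_db, m_min=-5.0, m_max=5.0, clipped=False):
    """Judge the confirmation volume G'' against the desired volume G'.

    Appropriate when m_min < G'' - G' < m_max and the signal is not clipped.
    """
    if clipped:
        return False
    return m_min < confirm_db - desired_db < m_max
```

Setting `m_min` and `m_max` asymmetrically would allow quiet and loud deviations to be tolerated to different degrees, which the two-limit form in the text permits.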

If the volume of the confirmation voice is appropriate (S413 YES), the set amplification values are stored in the amplification value storage unit 107 and the input volume adjustment ends (S414).

As described above, the amplification values calculated in S407 are stored in the amplification value storage unit 107 only after it has been confirmed that the voice acquired with those values has an appropriate volume. Until then, the original amplification values from before the adjustment remain in the amplification value storage unit 107. Therefore, if the user presses the cancel button 310 in FIG. 11 and cancels the adjustment partway through (S410 NO), or if the volume of the confirmation voice is not appropriate as described below, the stored values can be read out and set again to return to the amplification values in effect before the adjustment.

If the volume of the confirmation voice is not appropriate in S413 (NO), the user is notified that the input volume could not be adjusted, together with the likely cause (S415). FIG. 12 is an example of the display, produced by the display control unit 112, that reports the failure of the input volume adjustment and its cause.

After the cause notification in S415, the amplification values that were set for the input volume adjustment are returned to their original values. The input volume adjustment unit 106 reads the amplification values stored in the amplification value storage unit 107, sets them via the voice input control unit 105 (S416), and ends the input volume adjustment processing.
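The commit-or-restore behaviour of S408 to S416 can be summarized with the following Python sketch. `GainStore` and `GainControl` are hypothetical stand-ins for the amplification value storage unit 107 and for the gains applied through the voice input control unit 105. The `confirm_ok` callback stands for acquiring the confirmation voice and judging its volume (S411 to S413). None of these names come from the specification.

```python
class GainStore:
    """Stand-in for the amplification value storage unit 107."""

    def __init__(self, analog_db, digital_db):
        self.saved = (analog_db, digital_db)


class GainControl:
    """Stand-in for the gains currently applied to the input path."""

    def __init__(self, gains):
        self.current = gains

    def apply(self, gains):
        self.current = gains


def confirm_and_commit(store, control, new_gains, confirm_ok):
    """Try `new_gains`; keep them only if the confirmation check passes."""
    control.apply(new_gains)        # S408: set the candidate gains
    if confirm_ok():                # S411 to S413: acquire and judge confirmation voice
        store.saved = new_gains     # S414: persist the new gains
        return True
    control.apply(store.saved)      # S416: restore the previous gains
    return False
```

Because the store is written only after a successful check, a failed or cancelled adjustment leaves the previously saved values untouched.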

As described above, when the speech recognition apparatus of Embodiment 3 acquires a sample voice and sets the amplification values, it uses the calculated amplification values to acquire a confirmation voice and judges whether the set values are appropriate. This reduces the likelihood that inappropriate amplification values are set because of an unexpected input during sample voice acquisition.

(Other embodiments)
In the above description, the content of the guidance 302 displayed in the window 301 of FIGS. 8, 10, 11, and 12 may instead be presented as synthesized speech by the voice output unit 111, using speech synthesis (not shown).

The object of the present invention can also be achieved as follows. A storage medium on which the program code of software that realizes the functions of the above embodiments is recorded is supplied to a system or an apparatus. The computer (or CPU or MPU) of that system or apparatus then reads and executes the program code stored on the storage medium. The object is achieved in this way as well.

In this case, the program code read from the storage medium itself realizes the functions of the above embodiments, and the storage medium storing the program code constitutes the present invention.

Examples of storage media that can be used to supply the program code include a flexible disk, a hard disk, an optical disk, a magneto-optical disk, a CD-ROM, a CD-R, a magnetic tape, a nonvolatile memory card, and a ROM.

The embodiments of the present invention are not limited to the case where the functions of the above embodiments are realized by a computer executing the program code it has read. They also include the case where an OS (operating system) or the like running on the computer performs part or all of the actual processing based on the instructions of the program code, and the functions of the above embodiments are realized by that processing.

The functions of the embodiments of the present invention can also be realized as follows. The program code read from the storage medium is written to a memory provided on a function expansion board inserted into the computer or in a function expansion unit connected to the computer. Then, based on the instructions of the program code, a CPU or the like provided on the function expansion board or in the function expansion unit performs part or all of the actual processing. The functions of the above embodiments are realized by this processing.

A diagram showing the schematic configuration of the audio processing apparatus according to Embodiments 1 and 2 of the present invention.
A diagram explaining how an audio signal input to the audio processing apparatus according to Embodiments 1 and 2, or to the speech recognition apparatus according to Embodiment 3, is amplified through the analog amplification unit and the digital amplification unit.
A flowchart showing the operation procedure of the audio processing apparatus according to Embodiments 1 and 2 of the present invention.
A flowchart showing the procedure for calculating the analog amplification value and the digital amplification value in the audio processing apparatus according to Embodiment 1 of the present invention.
A flowchart showing the procedure for calculating the analog amplification value and the digital amplification value in the audio processing apparatus according to Embodiment 2 of the present invention.
A diagram showing the schematic configuration of the speech recognition apparatus according to Embodiment 3 of the present invention.
A flowchart showing an example of the operation procedure of the speech recognition apparatus according to Embodiment 3 of the present invention.
A diagram explaining an example of the operation of the speech recognition apparatus according to Embodiment 3 of the present invention.
A flowchart showing the operation procedure of the speech recognition apparatus according to Embodiment 3 of the present invention.
A diagram explaining an example of the operation during input volume adjustment in the speech recognition apparatus according to Embodiment 3 of the present invention.
A diagram explaining an example of the operation during input volume adjustment in the speech recognition apparatus according to Embodiment 3 of the present invention.
A diagram explaining an example of the operation during input volume adjustment in the speech recognition apparatus according to Embodiment 3 of the present invention.

Explanation of symbols

101 Analog amplification unit
102 A/D conversion unit
103 Digital amplification unit
104 Audio signal storage unit
105 Voice input control unit
106 Input volume adjustment unit
107 Amplification value storage unit

Claims

1. An audio processing apparatus comprising:
analog amplification means for amplifying an analog audio signal;
A/D conversion means for converting the analog audio signal into a digital audio signal;
digital amplification means for amplifying the digital audio signal; and
amplification value setting means for setting an analog amplification value and a digital amplification value for a target amplification value,
wherein the amplification value setting means sets the analog amplification value and the digital amplification value so that the amplification value obtained by combining the analog amplification and the digital amplification is equal to the target amplification value and the analog amplification value is the maximum within its settable range.

2. An audio processing apparatus comprising:
analog amplification means for amplifying an analog audio signal;
A/D conversion means for converting the analog audio signal into a digital audio signal;
digital amplification means for amplifying the digital audio signal; and
amplification value setting means for setting an analog amplification value and a digital amplification value for a target amplification value,
wherein, when the values that can be set for either or both of the analog amplification value and the digital amplification value are discrete,
the amplification value setting means sets, from among the combinations of an analog amplification value and a digital amplification value for which the difference between the resulting amplification value and the target amplification value falls within a predetermined range, the combination having the maximum analog amplification value.

3. The audio processing apparatus according to claim 2, wherein, when there are a plurality of combinations having the maximum analog amplification value, the amplification value setting means sets the combination that minimizes the digital amplification value.

4. The audio processing apparatus according to claim 1, wherein the amplification value setting means sets the analog amplification value so that clipping does not occur during A/D conversion.

5. The audio processing apparatus according to claim 1, further comprising:
audio acquisition means for acquiring an audio signal;
volume calculation means for calculating the volume of the acquired audio signal; and
target amplification value calculation means for calculating a target amplification value for bringing the volume of the acquired audio signal to a predetermined volume,
wherein the audio acquisition means acquires a first audio signal,
the volume calculation means calculates the volume of the acquired first audio signal,
the target amplification value calculation means calculates the target amplification value using the predetermined volume, the analog amplification value used when the first audio signal was acquired, and the digital amplification value used when the first audio signal was acquired, and
the amplification value setting means sets the analog amplification value and the digital amplification value for the target amplification value calculated by the target amplification value calculation means.

6. The audio processing apparatus according to claim 5, wherein, when the first audio signal is acquired by the audio acquisition means, the audio is acquired using an analog amplification value and a digital amplification value that are predetermined so that the acquired audio signal is not clipped.

7. The audio processing apparatus according to claim 5, wherein a second audio signal is acquired by the audio acquisition means,
the volume of the second audio signal is calculated by the volume calculation means,
the apparatus further comprises confirmation means for determining whether the calculated volume of the second audio signal is appropriate, and
the confirmation means determines that the volume is appropriate when the volume value of the second audio signal is within a predetermined range.

8. The audio processing apparatus according to claim 7, wherein, when the confirmation means does not determine that the volume of the second audio signal is appropriate, the user is notified of the reason why the volume of the second audio signal is not appropriate.

9. The audio processing apparatus according to claim 1, further comprising standard value setting means for setting the analog amplification value and the digital amplification value to predetermined standard values.

10. An audio processing method in an audio processing apparatus comprising:
analog amplification means for amplifying an analog audio signal;
A/D conversion means for converting the analog audio signal into a digital audio signal; and
digital amplification means for amplifying the digital audio signal,
the method comprising an amplification value setting step of setting an analog amplification value and a digital amplification value for a target amplification value,
wherein, in the amplification value setting step, the analog amplification value and the digital amplification value are set so that the amplification value obtained by combining the analog amplification and the digital amplification is equal to the target amplification value and the analog amplification value is the maximum within its settable range.

11. An audio processing method in an audio processing apparatus comprising:
analog amplification means for amplifying an analog audio signal;
A/D conversion means for converting the analog audio signal into a digital audio signal; and
digital amplification means for amplifying the digital audio signal,
the method comprising an amplification value setting step of setting an analog amplification value and a digital amplification value for a target amplification value,
wherein, when the values that can be set for either or both of the analog amplification value and the digital amplification value are discrete,
the amplification value setting step sets, from among the combinations of an analog amplification value and a digital amplification value for which the difference between the resulting amplification value and the target amplification value falls within a predetermined range, the combination having the maximum analog amplification value.

12. The audio processing method according to claim 11, wherein, when there are a plurality of combinations having the maximum analog amplification value, the amplification value setting step sets the combination that minimizes the digital amplification value.

13. The audio processing method according to claim 10, wherein the amplification value setting step sets the analog amplification value so that clipping does not occur during A/D conversion.

14. The audio processing method according to claim 10, further comprising:
an audio acquisition step of acquiring an audio signal;
a volume calculation step of calculating the volume of the acquired audio signal; and
a target amplification value calculation step of calculating a target amplification value for bringing the volume of the acquired audio signal to a predetermined volume,
wherein the audio acquisition step acquires a first audio signal,
the volume calculation step calculates the volume of the acquired first audio signal,
the target amplification value calculation step calculates the target amplification value using the predetermined volume, the analog amplification value used when the first audio signal was acquired, and the digital amplification value used when the first audio signal was acquired, and
the amplification value setting step sets the analog amplification value and the digital amplification value for the target amplification value calculated in the target amplification value calculation step.

15. The audio processing method according to claim 14, wherein, when the first audio signal is acquired in the audio acquisition step, the audio is acquired using an analog amplification value and a digital amplification value that are predetermined so that the acquired audio signal is not clipped.

16. The audio processing method according to claim 14, wherein a second audio signal is acquired in the audio acquisition step,
the volume of the second audio signal is calculated in the volume calculation step,
the method further comprises a confirmation step of determining whether the calculated volume of the second audio signal is appropriate, and
the confirmation step determines that the volume is appropriate when the volume value of the second audio signal is within a predetermined range.

17. The audio processing method, wherein, when the confirmation step does not determine that the volume of the second audio signal is appropriate, the user is notified of the reason why the volume of the second audio signal is not appropriate.

18. The audio processing method according to claim 10, further comprising a standard value setting step of setting the analog amplification value and the digital amplification value to predetermined standard values.

19. A program for causing a computer to implement the audio processing method according to any one of claims 10 to 18.