JP2018207318A

JP2018207318A - Voice processing device, control method therefor, program and storage medium

Info

Publication number: JP2018207318A
Application number: JP2017111168A
Authority: JP
Inventors: 和広並木; Kazuhiro Namiki
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2017-06-05
Filing date: 2017-06-05
Publication date: 2018-12-27
Anticipated expiration: 2037-06-05
Also published as: JP6887315B2

Abstract

PROBLEM TO BE SOLVED: To provide a voice processing device that implements automatic level control of a voice signal such that audibility does not deteriorate even when an excessively low-level sound is input after a short sound.

SOLUTION: A voice processing device comprises: signal processing means for processing an input voice signal so that the amplitude level of the voice signal falls within a predetermined range between an upper limit threshold and a lower limit threshold; level detection means for detecting the amplitude level of the voice signal output from the signal processing means; control means for controlling a gain value to be set in the signal processing means based on the amplitude level detected by the level detection means; and hold means for holding the gain value currently set in the signal processing means. The control means decreases the gain value to be set when the detected amplitude level changes from a state lower than the upper limit threshold to a state equal to or higher than the upper limit threshold, and increases the gain value to be set when the amplitude level changes from a state not lower than the lower limit threshold to a state equal to or lower than the lower limit threshold.

SELECTED DRAWING: Figure 1

Description

The present invention relates to automatic level control of audio.

Conventionally, audio processing devices having an automatic level control (ALC) function for controlling the amplitude of an input audio signal to an appropriate level are known. ALC suppresses the level (limit operation) when sound at a level exceeding a predetermined threshold is input, and amplifies the level (recovery operation) when sound at an excessively low level is input. In ALC, the recovery operation must be controlled appropriately when a quiet sound (under-level sound) is input immediately after a sudden loud sound (over-level sound). Patent Document 1 describes a technique for controlling the gain applied to the audio signal level in the recovery operation based on a comparison between the level of the input audio signal and a threshold.

Patent Document 1: JP 2004-104692 A

In Patent Document 1, when sudden over-level sounds (hereinafter, short sounds) are input in succession and an under-level sound is input immediately after a short sound, the recovery operation rapidly increases the gain applied to the audio signal level. As a result, the level of the audio other than the short sounds fluctuates with a short period, which can make it hard to hear. In addition, it has been difficult to control the audio signal level adaptively when the level fluctuates violently, for example because of wind noise.

The present invention has been made in view of the above problems, and its object is to realize automatic level control of an audio signal such that audibility does not deteriorate even when an under-level sound is input after a short sound.

To solve the above problems and achieve the object, an audio processing device of the present invention comprises: signal processing means for processing an input audio signal using a set gain value so that the amplitude level of the audio signal falls within a predetermined range defined by an upper limit threshold and a lower limit threshold; level detection means for detecting the amplitude level of the audio signal output from the signal processing means; control means for controlling the gain value to be set in the signal processing means based on the amplitude level detected by the level detection means, the control means decreasing the gain value to be set when the detected amplitude level changes from a state lower than the upper limit threshold to a state equal to or higher than the upper limit threshold, and increasing the gain value to be set when the amplitude level changes from a state not lower than the lower limit threshold to a state equal to or lower than the lower limit threshold; and holding means for holding the gain value set in the signal processing means.

When the amplitude level detected by the level detection means changes from a state that is not equal to or higher than the upper limit threshold to a state that is equal to or higher than the upper limit threshold, the control means controls the holding means so as to hold the gain value that was set before the level reached the upper limit threshold. When, after the detected amplitude level has become equal to or higher than the upper limit threshold, the amplitude level becomes equal to or lower than the lower limit threshold without the state of being equal to or higher than the upper limit threshold having continued for a predetermined time or longer, the control means increases the gain value to be set, using as the upper limit a predetermined corrected gain value that is based on the gain value held by the holding means and is equal to or lower than the held gain value.

According to the present invention, automatic level control of an audio signal can be realized such that audibility does not deteriorate even when an under-level sound is input after a short sound.

FIG. 1 is a diagram showing the configuration of the audio processing device of Embodiment 1.
FIG. 2 is a flowchart showing the operation of Embodiment 1.
FIG. 3 is a diagram comparing the conventional recovery operation with that of the present embodiment when short sounds are input in succession.
FIG. 4 is a diagram showing the configuration of the audio processing device of Embodiment 2.
FIG. 5 is a flowchart showing the operation of Embodiment 2.
FIG. 6 is a diagram showing the relationship between wind volume and the correction coefficient.
FIG. 7 is a diagram showing the basic configuration of an audio processing device.
FIG. 8 is a diagram comparing the recovery operation when an audio signal with a large amplitude level is input for a short time with the recovery operation when it is input for a long time.
FIG. 9 is a diagram showing the relationship between the gain value set by ALC processing and the amplitude level of the audio signal.

Embodiments of the present invention are described in detail below. The embodiments described below are examples of how the present invention may be realized, and should be modified or changed as appropriate according to the configuration of the device to which the present invention is applied and various conditions; the present invention is not limited to the following embodiments. Parts of the embodiments described later may also be combined as appropriate.

The following describes an example in which the audio processing device of the present embodiment is applied to an imaging device, such as a digital video camera, that can capture image data such as still images and moving images and record it together with audio data.

[Recovery operation in automatic level control]
The audio processing device of the present embodiment has an automatic level control (ALC) function that automatically adjusts the amplitude level of an audio signal so that the amplitude level, which represents the magnitude of the amplitude of the audio signal input from an audio input unit such as a microphone, falls within a predetermined range regarded as the appropriate level. ALC reduces overload on an audio output unit such as a speaker and thereby prevents degradation, and also optimizes the dynamic range and thereby improves sound quality.

In ALC, signal processing that multiplies the audio signal by a gain value is performed so that the amplitude level of the input audio signal falls within the predetermined range regarded as the appropriate level, and the amplitude level of the audio signal is thereby controlled automatically.

Before describing the configuration and functions of the audio processing device of the present embodiment, the basic configuration of an audio processing device and the ALC recovery operation for short sounds are described with reference to FIG. 7.

In FIG. 7, the audio processing device 100 performs a limit operation that decreases the gain value applied to the audio signal when the amplitude level of the input audio signal is greater than the upper limit threshold X of a predetermined appropriate range. The audio processing device 100 also performs a recovery operation that increases the gain value when the amplitude level of the input audio signal is less than the lower limit threshold Y of the predetermined appropriate range.

The control unit 102 controls the operation of the level detection unit 103, the timer unit 104, and the gain setting unit 105, which are described below. The level detection unit 103 detects the amplitude level of the audio signal output from the signal processing unit 101 and outputs the detection result to the control unit 102. The control unit 102 controls the gain setting unit 105 so that a gain value corresponding to the amplitude level detected by the level detection unit 103 is determined. The gain setting unit 105 sets a gain value according to the amplitude level indicated by the control unit 102 and outputs the set gain value to the signal processing unit 101. The signal processing unit 101 multiplies the input audio signal by the gain value set by the gain setting unit 105 to amplify or attenuate the amplitude level.
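The feedback loop described above can be sketched in Python as follows. This is only an illustrative sketch, not the implementation of the device: the names `detect_level` and `alc_step`, the use of the peak absolute sample value as the level measure, the threshold values, and the multiplicative step size are all assumptions that do not appear in the source.

```python
def detect_level(block):
    """Level detection (unit 103): peak absolute amplitude of one block.

    Using the peak value is an assumption; the source does not say how
    the amplitude level is measured.
    """
    return max(abs(sample) for sample in block)


def alc_step(block, gain, upper=0.8, lower=0.2, step=0.1):
    """One pass of the loop: process, detect the output level, update the gain.

    upper and lower stand in for thresholds X and Y; their values and the
    multiplicative step are hypothetical.
    """
    # Signal processing (unit 101): multiply the input by the current gain.
    out = [sample * gain for sample in block]
    level = detect_level(out)
    if level > upper:
        # Limit operation: the output is too loud, so lower the gain.
        gain = gain * (1.0 - step)
    elif level < lower:
        # Recovery operation: the output is too quiet, so raise the gain.
        gain = gain * (1.0 + step)
    # Otherwise the level is inside the appropriate range; keep the gain.
    return out, gain
```

Calling `alc_step` once per block and feeding the returned gain into the next call reproduces the loop in which the level is detected on the processed output rather than on the raw input.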

The timer unit 104 measures the time from when the amplitude level of the audio signal detected by the level detection unit 103 rises sharply (to the upper limit threshold X or higher) until the level falls again (that is, the time for which the level stays at or above the upper limit threshold X), and outputs the measurement result to the control unit 102. The control unit 102 determines whether the input sound is a short sound according to the time measured by the timer unit 104.

The gain setting unit 105 decreases the gain value when the amplitude level of the audio signal amplified by the signal processing unit 101, as detected by the level detection unit 103, is greater than the upper limit threshold X. For levels above the upper limit threshold X, a plurality of thresholds A, B, and C (where A, B, C > X) may be settable, and at larger amplitude levels the gain value is decreased further for each threshold. When the amplitude level of the audio signal after processing by the signal processing unit 101 is less than the lower limit threshold Y, the gain value is increased. For levels below the lower limit threshold Y, a plurality of thresholds D, E, and F (where D, E, F > Y) may be settable, and when an audio signal with a larger amplitude level is input, the gain value is increased according to each threshold (where X > Y).

FIG. 8 compares the recovery operation when a short sound is input (a) with the recovery operation when an over-level sound is input for a long time (b).

In the case of FIG. 8(a), the amplitude level detected by the level detection unit 103 is determined to be greater than the upper limit threshold X, and the duration of the large amplitude level measured by the timer unit 104 is shorter than a threshold, so the control unit 102 determines that a short sound has been input. At that point, the timer unit 104 sets a counter value to "ATT_CNT" (ATT_CNT is a numerical value, ATT_CNT > 0) (T1) and then decreases it, for example periodically (T2). When the amplitude level of the input audio signal becomes small before ATT_CNT reaches 0 and the control unit 102 determines that a recovery operation is to be performed, the gain value is raised steeply in the recovery operation so that the amplitude level is amplified quickly to the appropriate level. This is called "fast recovery". After ATT_CNT has reached 0, the gain value is raised more gently in the recovery operation (T3 to T4). This is called "slow recovery".

In the case of FIG. 8(b), the duration of the large amplitude level measured by the timer unit 104 is long, so the control unit 102 determines that a sound other than a short sound has been input. The control unit 102 sets ATT_CNT in the timer unit 104 when the amplitude level changes from a small level to a large level (T1), but does not perform a recovery operation while the loud sound is being input. In the meantime, ATT_CNT reaches 0 during the period in which the loud sound is input (T1 to T3). When the amplitude level of the input audio signal subsequently becomes small, the recovery operation is slow recovery because ATT_CNT is 0 (T4).
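The two cases of FIG. 8 can be illustrated with a small Python sketch of the counter logic. It is a sketch under stated assumptions rather than the actual control program: the function name `recovery_modes`, the per-sample decrement of the counter, and the threshold values are hypothetical, and the source only says the counter is decreased "for example periodically".

```python
def recovery_modes(levels, att_cnt=3, upper=0.8, lower=0.2):
    """Return, per level sample, "fast", "slow", or None (no recovery).

    levels: sequence of detected amplitude levels, one per time step.
    att_cnt: hypothetical initial counter value (ATT_CNT > 0).
    """
    counter = 0
    prev_over = False
    modes = []
    for level in levels:
        over = level >= upper
        if over and not prev_over:
            # T1: the level has just risen to the upper threshold; arm the counter.
            counter = att_cnt
        elif counter > 0:
            # Decrease the counter once per step (an assumed schedule).
            counter -= 1
        if level < lower:
            # Recovery is needed; the counter decides how steep it is.
            modes.append("fast" if counter > 0 else "slow")
        else:
            modes.append(None)
        prev_over = over
    return modes
```

With `att_cnt=3`, a one-step burst followed by quiet input, `[0.5, 0.9, 0.1, 0.1]`, yields fast recovery on the quiet steps, while a four-step burst, `[0.9, 0.9, 0.9, 0.9, 0.1]`, lets the counter run out and yields slow recovery.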

By performing the recovery operation in this way, the input audio signal is controlled to an appropriate level by the signal processing unit 101.

[Embodiment 1]
Next, the automatic level control of Embodiment 1 is described with reference to FIG. 1.

FIG. 1 shows the configuration of the audio processing device 100 of Embodiment 1. Description of components that are the same as in FIG. 7 is omitted below, and the description focuses on the differences.

In FIG. 1, the gain value set by the gain setting unit 105 is constantly input to the gain holding unit 106. As described later, the gain holding unit 106 holds the gain value being input at that time in response to an instruction from the control unit 102. When the control unit 102 determines that the input sound is a short sound, it controls the gain setting unit 105 so that the gain value is set with the gain value held immediately before the short-sound determination as the upper limit. Accordingly, the gain correction unit 107 corrects the gain value set by the gain setting unit 105 so that it is equal to or less than the gain value held by the gain holding unit 106. When the gain correction unit 107 corrects the gain value, the correction coefficient k by which the gain value held in the gain holding unit 106 is multiplied is 1 or less (k ≤ 1). The predetermined corrected gain value calculated by the gain correction unit 107 replaces the value set in the gain setting unit 105, and the signal processing unit 101 multiplies the audio signal by that gain value and outputs the result.
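The relationship between the held gain, the correction coefficient k, and the gain actually applied can be written as a short Python sketch. The function names `corrected_upper_limit` and `cap_gain` are hypothetical, and the requirement k > 0 is an added assumption; the source states only k ≤ 1.

```python
def corrected_upper_limit(held_gain, k=0.9):
    """Corrected gain value: the held gain multiplied by k (k <= 1).

    The lower bound k > 0 is an assumption added here so that the
    limit stays positive; the source specifies only k <= 1.
    """
    if not 0.0 < k <= 1.0:
        raise ValueError("k must satisfy 0 < k <= 1")
    return held_gain * k


def cap_gain(requested_gain, held_gain, k=0.9):
    """Gain passed to signal processing: never above the corrected limit."""
    return min(requested_gain, corrected_upper_limit(held_gain, k))
```

Because k is at most 1, the corrected value can never exceed the gain that was in effect just before the short sound, which is the property the embodiment relies on.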

<Automatic level control flow>
Next, the operation of the audio processing device 100 of the present embodiment is described with reference to FIG. 2. The processing of FIG. 2 is realized by the control unit 102 reading and executing a control program stored in a memory (not shown) and controlling each unit of the audio processing device 100. The same applies to FIG. 5, described later.

In the following, it is assumed that the audio processing device 100 is performing processing that controls the level of the input audio signal using the gain value, and is outputting the audio signal.

In S201, the control unit 102 uses the level detection unit 103 to detect the amplitude level of the audio signal after processing by the signal processing unit 101 based on the gain value set by the gain setting unit 105.

In S202, the control unit 102 determines whether the amplitude level detected in S201, that is, the level of the audio signal after processing by the signal processing unit 101, is equal to or greater than the upper limit threshold of the predetermined appropriate range. If the control unit 102 determines that the amplitude level is equal to or greater than the upper limit threshold, it proceeds to S203; if it determines that the amplitude level is less than the upper limit threshold, it ends the processing.

In S203, the control unit 102 instructs the gain holding unit 106 to hold the gain value set by the gain setting unit 105. When a short sound is input, the control unit 102 controls the gain holding unit 106 so that it holds the gain value from immediately before the short sound. The value is held until the next short sound is input.

In S204, the control unit 102 uses the timer unit 104 to measure how long the amplitude level of the audio signal has remained at or above the threshold since it first reached the threshold.

In S205, the control unit 102 determines whether the input sound is a short sound based on the time measured by the timer unit 104. If the time for which the amplitude level of the audio signal remained at or above the threshold is less than a predetermined time, the control unit 102 determines that the sound is a short sound and proceeds to S206; if that state continued for the predetermined time or longer, it determines that the sound is not a short sound and proceeds to S207.

In S206, the control unit 102 causes the gain correction unit 107 to correct the gain value by multiplying the gain value held in S203 by the correction coefficient k (k ≤ 1).

In S207, if the control unit 102 determined in S205 that a short sound was input, it sets the gain value corrected by the gain correction unit 107 in S206 in the gain setting unit 105 as the upper limit of the gain during the recovery operation. If the control unit 102 determined in S205 that the sound is not a short sound, the gain setting unit 105 sets the gain value without using the corrected gain value as the upper limit during the recovery operation, as described later with reference to FIG. 9.
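Steps S202 to S207 can be summarized as a single decision function. The following Python sketch is illustrative only: the function name `recovery_upper_limit`, all parameter names, the default values, and the choice to return `None` when the flow ends at S202 are assumptions made for this example.

```python
def recovery_upper_limit(level, gain_before, over_duration,
                         upper=0.8, short_time=0.05, k=0.9, g_rec=2.0):
    """Return the gain upper limit to use in the recovery operation.

    level:         detected amplitude level after signal processing (S201)
    gain_before:   gain value in effect just before the level rose
    over_duration: seconds the level stayed at or above `upper` (S204)
    short_time:    hypothetical limit on the duration of a short sound
    g_rec:         hypothetical default upper limit (Grec of FIG. 9)
    """
    # S202: if the level is below the upper threshold, the flow ends.
    if level < upper:
        return None
    # S203: hold the gain value that was set before the loud sound.
    held_gain = gain_before
    # S205: a short sound stays above the threshold for less than short_time.
    if over_duration < short_time:
        # S206: correct the held gain with coefficient k (k <= 1).
        corrected_gain = held_gain * k
        # S207 (short sound): the corrected gain becomes the upper limit.
        return corrected_gain
    # S207 (not a short sound): keep the ordinary upper limit.
    return g_rec
```

For example, with a held gain of 1.0 and k = 0.5, a 10 ms burst gives an upper limit of 0.5, whereas a 200 ms loud passage leaves the ordinary limit in place.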

Here, the ALC gain-setting process used when the corrected gain value is not set as the upper limit of the gain during the recovery operation is described with reference to FIG. 9. FIG. 9 shows the relationship between the gain value set by the gain setting unit 105 and the amplitude level of the audio signal detected by the level detection unit 103.

When the amplitude level of the audio signal detected by the level detection unit 103 is greater than the upper limit threshold X, the gain setting unit 105 decreases the gain value, decreasing it further the larger the amplitude level. Specifically, the gain value is set between the normal value Gn and a smaller value Glim (Gn > Glim). When the amplitude level of the audio signal is less than the lower limit threshold Y, the gain setting unit 105 increases the gain value, increasing it further the smaller the amplitude level (where X > Y). Specifically, the gain value is set between the normal value Gn and a larger value Grec (Grec > Gn). The gain value has Grec as its upper limit and Glim as its lower limit. In all other cases, the gain value is fixed at the normal value Gn.
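The mapping from detected level to gain value described for FIG. 9 can be sketched as a piecewise function. The source gives only the end points (Gn, Glim, Grec) and the direction of change, so the linear interpolation used here, the function name `gain_for_level`, the `full_scale` parameter, and every default value are assumptions.

```python
def gain_for_level(level, x=0.8, y=0.2, g_n=1.0, g_lim=0.5, g_rec=2.0,
                   full_scale=1.0):
    """Gain value as a function of the detected amplitude level.

    x, y:        upper and lower thresholds X and Y (X > Y)
    g_n:         normal gain Gn used inside the appropriate range
    g_lim:       lower limit Glim of the gain (Gn > Glim)
    g_rec:       upper limit Grec of the gain (Grec > Gn)
    full_scale:  assumed maximum level at which the gain reaches Glim
    The linear shape between the end points is an assumption.
    """
    if level > x:
        # Louder than X: move from Gn toward Glim as the level grows.
        t = min((level - x) / (full_scale - x), 1.0)
        return g_n + t * (g_lim - g_n)
    if level < y:
        # Quieter than Y: move from Gn toward Grec as the level shrinks.
        t = min((y - level) / y, 1.0)
        return g_n + t * (g_rec - g_n)
    # Inside the appropriate range: the gain stays at the normal value.
    return g_n
```

Whatever curve the actual device uses, the clamping of `t` to 1 reflects the stated property that the gain never leaves the interval from Glim to Grec.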

Returning to FIG. 2, in S208 the control unit 102 performs the recovery operation using the gain value set by the gain setting unit 105.

The recovery operation is processing in which the signal processing unit 101 amplifies the amplitude level of the audio signal based on the gain value set by the gain setting unit 105. For example, when the amplitude level of the audio signal is less than the lower limit threshold Y of FIG. 9, the control unit 102 has the gain setting unit 105 increase the gain value and has the signal processing unit 101 amplify the amplitude level.

FIG. 3 compares the conventional recovery operation with that of the present embodiment when short sounds are input in succession. FIG. 3(a) shows the waveform of the input sound, FIG. 3(b) shows the waveform of the output sound under the conventional recovery operation, and FIG. 3(c) shows the change in gain value under the conventional recovery operation. The input sound of FIG. 3(a) assumes that the sound between successive short sounds alternates, or varies randomly, between small and large amplitude levels.

When short sounds such as those in FIG. 3(a) are input, the level detection unit 103 determines that the amplitude level of the audio signal from the signal processing unit 101 is greater than the upper limit threshold, so a limit operation is performed that controls the gain so that the amplitude level becomes equal to or less than the upper limit threshold. When the level detection unit 103 determines that the amplitude level of the audio signal from the signal processing unit 101 is greater than the upper limit threshold, the control unit 102 instructs the gain setting unit 105 to lower the gain value. Then, when the amplitude level of the audio signal from the signal processing unit 101 falls below the upper limit threshold, the control unit 102 controls the gain setting unit 105 to hold the gain.

When the amplitude level of the audio signal from the signal processing unit 101 falls below the lower limit threshold, the control unit 102 controls the gain setting unit 105 so that the gain is raised gradually. For example, the gain value is set so that it does not exceed Grec, as shown in FIG. 9. The smaller the amplitude level of the audio signal after the limit operation, the more steeply the gain value is raised; the larger the amplitude level after the limit operation, the more gently the gain is raised. The amplitude level of the audio signal during the recovery operation is assumed not to exceed the upper limit threshold.

When an under-level sound is input immediately after a short sound, as in FIG. 3(a), a recovery operation is performed to bring the amplitude level of the input sound quickly to the appropriate level, and the gain value must be increased as shown in FIG. 3(c). When a sound with a large amplitude level (a sound whose amplitude level is lower than that of the short sound and comparable to a level within the predetermined appropriate range) is input immediately after a short sound, the gain value is decreased. In the recovery operation, if the gain value is increased sharply when an under-level sound is input immediately after a short sound, the sound may be heard to become suddenly louder or may sound unstable, so audibility tends to deteriorate: the audio becomes hard to follow or sounds unnatural.

FIG. 3(d) shows the waveform of the output sound produced by the recovery operation when an under-level sound is input immediately after the short sound of FIG. 3(a), and FIG. 3(e) shows the change in gain value in the recovery operation of the present embodiment. As shown in FIG. 3(e), the audio processing device 100 of the present embodiment holds, in the gain holding unit 106, the gain value from before the short sound was input. Then, in the recovery operation performed when an under-level sound is input immediately after the short sound, the gain value set by the gain setting unit 105 is controlled with the gain value held by the gain holding unit 106 as the upper limit.

For example, in the case of a short sound, the amplitude level of the audio signal becomes large before ATT_CNT in FIG. 8 reaches 0. A fast recovery operation is therefore performed, and at this time the control unit 102 sets, in the gain setting unit 105, the corrected gain value obtained by having the gain correction unit 107 correct the gain value held in the gain holding unit 106, as the upper limit of the gain during the recovery operation. During the fast recovery operation, the gain setting unit 105 raises the gain gradually, and once the gain reaches the upper limit corrected by the gain correction unit 107, the gain value is not raised any further. Thus, during the recovery operation, the gain value is set in the gain setting unit 105 with the gain value held in the gain holding unit 106 as the upper limit.
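The behaviour of the gain during fast recovery, rising gradually and then stopping at the corrected upper limit, can be sketched in Python. The function name `ramp_gain`, the fixed additive step, and the numeric values are assumptions; the source does not specify how quickly the gain rises.

```python
def ramp_gain(start_gain, upper_limit, step, n_steps):
    """Gain values over n_steps of recovery, capped at upper_limit.

    start_gain:  gain left by the preceding limit operation
    upper_limit: corrected gain value (held gain multiplied by k)
    step:        hypothetical additive increase per update
    """
    gains = []
    gain = start_gain
    for _ in range(n_steps):
        if gain < upper_limit:
            # Raise the gain, but never past the corrected upper limit.
            gain = min(gain + step, upper_limit)
        gains.append(gain)
    return gains
```

Starting from 0.5 with a limit of 1.0 and a step of 0.25, the gain sequence over four updates is 0.75, 1.0, 1.0, 1.0: the gain stops rising once it reaches the limit, instead of continuing toward Grec as in the conventional operation.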

By placing an upper limit on the gain value used for the recovery operation in this way, when an under-level sound is input after a short sound, the gain value that was in effect beforehand is held, and the recovery operation is controlled so that the held gain value serves as its upper limit. This can be expected to improve the audibility of the output sound.
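The capped recovery described above can be sketched as follows. This is only an illustrative sketch, not the implementation of the embodiment: the function names, the additive step size, and the use of linear gain units are assumptions, and the multiplication of the held gain by a coefficient k of 1 or less follows the correction described for the gain correction unit 107.

```python
def recovery_step(current_gain, held_gain, k=1.0, step=1.0):
    """Return the gain after one fast-recovery step, capped at k * held_gain.

    All names and the additive step are hypothetical; gains are linear values.
    """
    if not 0.0 <= k <= 1.0:
        raise ValueError("k must be between 0 and 1")
    cap = k * held_gain  # correction gain value used as the recovery upper limit
    if current_gain >= cap:
        return current_gain  # already at or above the cap: do not raise further
    return min(current_gain + step, cap)


def run_recovery(start_gain, held_gain, k=1.0, step=1.0, n_steps=5):
    """Apply recovery_step repeatedly and return the gain after each step."""
    gains = []
    gain = start_gain
    for _ in range(n_steps):
        gain = recovery_step(gain, held_gain, k, step)
        gains.append(gain)
    return gains
```

With a held gain of 8.0 and k = 1.0, a gain that was limited down to 2.0 climbs back in steps and then stays at 8.0; with k = 0.5 it stops at 4.0 instead.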

[Embodiment 2] Next, the automatic level control of Embodiment 2 will be described with reference to FIGS. 4 and 5.

Embodiment 1 dealt with the case where a short sound is input and is followed by an under-level sound. Embodiment 2 deals with the case where noise caused by wind (hereinafter, wind noise) is input after a short sound.

FIG. 4 shows the configuration of the audio processing apparatus 100 of Embodiment 2. FIG. 5 is a flowchart showing the operation of the audio processing apparatus 100 of Embodiment 2. In the following, components that are the same as in FIG. 1 and processing that is the same as in FIG. 2 are given the same reference signs and their description is omitted; the description focuses on the differences.

When level control by ALC is applied to input audio, any wind noise that is input also affects the voice or music input together with it. When the wind noise is loud, the ALC limit operation reduces the gain value, but the voice or music input together with the wind noise is subjected to the same limit operation and therefore becomes quiet. When the wind suddenly stops, only the voice or music remains, so if its amplitude level is below the threshold, a fast recovery operation begins, the gain value applied to the voice or music is increased, and the amplitude level rises. When the amplitude level of the input sound fluctuates greatly in this way as wind noise comes and goes, the amplitude level of the voice or music does not stay constant and is heard as an unstable sound. The result may be hard to follow or feel unnatural, so audibility can deteriorate.

Therefore, in this embodiment, the audio processing apparatus 100 is provided with an air volume detection unit 401 and a correction value adjustment unit 402, and is configured to correct the gain value taking into account the case where wind noise is input after a short sound.

In FIG. 4, the air volume detection unit 401 detects the level of wind noise contained in the input sound using a filter circuit or the like. For example, when left and right two-channel audio signals from a stereo microphone are input to the audio processing apparatus 100, the air volume is detected using the difference between the right-channel and left-channel audio signals. Specifically, when the absolute value of the difference signal between the left and right channels is larger than a predetermined threshold, the air volume detection unit 401 outputs, as air volume information, a signal indicating that the air volume is large (the wind noise is heavy or loud). When the absolute value of the difference signal is smaller than the predetermined threshold, the air volume detection unit 401 outputs, as air volume information, a signal indicating that the air volume is small (the wind noise is slight or quiet). In this way, the amount of wind noise is detected by exploiting the property that, when wind noise is present, the phase difference between the two audio signals becomes large in the low-frequency components (for example, 100 Hz and below).
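The difference-signal comparison performed by the air volume detection unit 401 might look roughly like the sketch below. The function names and the averaging of the absolute difference over one frame are assumptions made for illustration; the low-frequency filtering mentioned in the text is omitted for brevity, and the threshold value is left to the caller.

```python
def wind_amount(left, right):
    """Mean absolute value of the left-right difference signal over one frame.

    Hypothetical helper: frame averaging is an assumption, not from the source.
    """
    if len(left) != len(right) or not left:
        raise ValueError("left and right must be non-empty and of equal length")
    return sum(abs(l - r) for l, r in zip(left, right)) / len(left)


def is_windy(left, right, threshold):
    """True when the difference signal exceeds the threshold (large air volume)."""
    return wind_amount(left, right) > threshold
```

Identical channels give a wind amount of zero, while channels that are out of phase with each other give a large value, which matches the idea that wind noise shows up as a large inter-channel difference.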

The correction value adjustment unit 402 adaptively changes the value of the correction coefficient k (the correction value) of the gain correction unit 107 according to the air volume information detected by the air volume detection unit 401. FIG. 6 shows the relationship between the air volume and the correction coefficient k for the gain value. When the air volume detection unit 401 detects that the air volume is large, the correction coefficient k for the gain value is adjusted to a small value; when the air volume is small, the correction coefficient k is adjusted to a large value.
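One way to express this adjustment is a monotonically decreasing mapping from air volume to k. The sketch below assumes a piecewise-linear curve with hypothetical parameters (wind_lo, wind_hi, k_max, k_min); the actual curve and values of FIG. 6 are not reproduced here.

```python
def correction_coefficient(wind, wind_lo, wind_hi, k_max=1.0, k_min=0.5):
    """Map an air volume value to a correction coefficient k (assumed curve).

    k stays at k_max up to wind_lo, falls linearly to k_min at wind_hi,
    and stays at k_min beyond that. All parameter values are hypothetical.
    """
    if wind_hi <= wind_lo:
        raise ValueError("wind_hi must be greater than wind_lo")
    if wind <= wind_lo:
        return k_max
    if wind >= wind_hi:
        return k_min
    t = (wind - wind_lo) / (wind_hi - wind_lo)  # 0 at wind_lo, 1 at wind_hi
    return k_max + t * (k_min - k_max)
```

The returned k would then be multiplied by the held gain value to obtain the upper limit used during recovery, so stronger wind gives a lower ceiling.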

Next, the operation of the audio processing apparatus 100 of this embodiment will be described with reference to FIG. 5.

In S501, the control unit 102 determines whether the air volume detection unit 401 has detected wind noise. If wind noise has been detected, the process proceeds to S502; if not, the process proceeds to S503.

In S502, since wind noise has been detected, the control unit 102 causes the correction value adjustment unit 402 to adaptively change the value of the correction coefficient k of the gain correction unit 107 according to the amount of wind noise detected.

Thereafter, in S503 to S510, the same processing as S201 to S208 in FIG. 2 is performed.

According to this embodiment, as shown in FIG. 6, the maximum gain value can be lowered as the air volume increases, which suppresses the instability that wind noise would otherwise cause in the amplitude level of the processed main audio (voice or music).

In this embodiment, the air volume is detected based on the difference signal of the two audio signals, but this is not a limitation; the air volume may instead be obtained simply by comparing the low-frequency level of the two audio signals with a predetermined air volume threshold. Alternatively, the air volume may be detected using a separate device capable of detecting air displacement, such as a wind pressure sensor, an anemometer, or a device using a laser and a camera.
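The low-frequency level alternative could be approximated as in the following sketch, which passes one channel through a one-pole low-pass filter and averages the absolute value of the result. The filter type, the 48 kHz sample rate, and the function name are assumptions chosen for illustration; the 100 Hz cutoff echoes the example frequency given for wind noise. The returned level would then be compared with an air volume threshold.

```python
import math


def low_band_level(samples, sample_rate=48000.0, cutoff_hz=100.0):
    """Mean absolute level of a signal after a one-pole low-pass filter.

    Hypothetical helper: a simple RC-style filter stands in for whatever
    filter circuit an actual device would use.
    """
    if not samples:
        return 0.0
    dt = 1.0 / sample_rate
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)  # smoothing factor of the one-pole filter
    y = 0.0
    total = 0.0
    for x in samples:
        y += alpha * (x - y)  # low-pass update
        total += abs(y)
    return total / len(samples)
```

A slowly varying input passes through almost unchanged, whereas a rapidly alternating input is strongly attenuated, so the level mainly reflects content near or below the cutoff.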

The values shown in FIG. 6 and the way the gain correction coefficient changes with the air volume are not limited to those shown; they can be tuned for the scene in which ALC is performed and set to appropriate values.

The audio processing apparatus of each embodiment can be applied to an imaging apparatus such as a digital video camera, but is not limited to this. It can be applied to any apparatus that records external sound, or that takes external sound as input and outputs it. For example, the apparatus may be a mobile phone or a smartphone, which is a type of mobile phone (including eyeglass-type and wristwatch-type terminals), a tablet terminal, a voice recorder, a media player, or the like.

[Other Embodiments]
The present invention can also be realized by processing in which a program that implements one or more functions of the above embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of that system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

Description of symbols: 100 ... audio processing apparatus, 101 ... signal processing unit, 102 ... control unit, 103 ... level detection unit, 104 ... timer unit, 105 ... gain setting unit, 106 ... gain holding unit, 107 ... gain correction unit, 401 ... air volume detection unit, 402 ... correction value adjustment unit

Claims

An audio processing apparatus comprising:
signal processing means for processing an input audio signal using a set gain value so that the amplitude level of the audio signal falls within a predetermined range defined by an upper limit threshold and a lower limit threshold;
level detection means for detecting the amplitude level of the audio signal output from the signal processing means;
control means for controlling the gain value set in the signal processing means based on the amplitude level detected by the level detection means, the control means performing processing to decrease the gain value to be set when the amplitude level detected by the level detection means changes from a state lower than the upper limit threshold to a state equal to or higher than the upper limit threshold, and performing processing to increase the gain value to be set when the amplitude level changes from a state in which it is not equal to or lower than the lower limit threshold to a state equal to or lower than the lower limit threshold; and
holding means for holding a gain value set in the signal processing means,
wherein, when the amplitude level detected by the level detection means changes from a state in which it is not equal to or higher than the upper limit threshold to a state equal to or higher than the upper limit threshold, the control means controls the holding means to hold the gain value that was set before the amplitude level became equal to or higher than the upper limit threshold, and
wherein, when the amplitude level detected by the level detection means, after becoming equal to or higher than the upper limit threshold, becomes equal to or lower than the lower limit threshold without the state of being equal to or higher than the upper limit threshold having continued for a predetermined time or longer, the control means performs processing to increase the set gain value with, as an upper limit, a predetermined correction gain value that is based on the gain value held by the holding means and is equal to or less than the held gain value.

The audio processing apparatus according to claim 1, wherein, when the amplitude level becomes equal to or lower than the lower limit threshold after the amplitude level detected by the level detection means has become equal to or higher than the upper limit threshold and that state has continued for the predetermined time or longer, the control means performs the processing to increase the set gain value without using the correction gain value as an upper limit.

The audio processing apparatus according to claim 1, further comprising timer means,
wherein the control means uses the timer means to measure the time during which the amplitude level detected by the level detection means is equal to or higher than the upper limit threshold.

The audio processing apparatus according to claim 1, wherein the control means determines the correction gain value by multiplying the gain value held in the holding means by a predetermined correction value of 1 or less.

The audio processing apparatus according to claim 1, further comprising air volume detection means for detecting the amount of wind noise contained in the input audio signal,
wherein the control means further determines the correction gain value based on the amount of wind noise detected by the air volume detection means.

The audio processing apparatus according to claim 5, wherein the control means determines the correction gain value by multiplying the gain value held in the holding means by a predetermined correction value of 1 or less, and sets the predetermined correction value to a smaller value as the amount of wind noise detected by the air volume detection means becomes larger.

A control method for an audio processing apparatus that comprises:
signal processing means for processing an input audio signal using a set gain value so that the amplitude level of the audio signal falls within a predetermined range defined by an upper limit threshold and a lower limit threshold;
level detection means for detecting the amplitude level of the audio signal output from the signal processing means;
control means for controlling the gain value set in the signal processing means based on the amplitude level detected by the level detection means; and
holding means for holding a gain value set in the signal processing means,
the control method comprising a control step of performing processing to decrease the gain value to be set when the amplitude level detected by the level detection means changes from a state lower than the upper limit threshold to a state equal to or higher than the upper limit threshold, and performing processing to increase the gain value to be set when the amplitude level changes from a state in which it is not equal to or lower than the lower limit threshold to a state equal to or lower than the lower limit threshold,
wherein, in the control step, when the amplitude level detected by the level detection means changes from a state in which it is not equal to or higher than the upper limit threshold to a state equal to or higher than the upper limit threshold, the holding means is controlled to hold the gain value that was set before the amplitude level became equal to or higher than the upper limit threshold, and
wherein, when the amplitude level detected by the level detection means, after becoming equal to or higher than the upper limit threshold, becomes equal to or lower than the lower limit threshold without the state of being equal to or higher than the upper limit threshold having continued for a predetermined time or longer, processing is performed to increase the set gain value with, as an upper limit, a predetermined correction gain value that is based on the gain value held by the holding means and is equal to or less than the held gain value.

A program for causing a computer to execute the control method according to claim 7.

A computer-readable storage medium storing a program for causing a computer to execute the control method according to claim 7.