JP7257834B2

JP7257834B2 - Speech processing device, speech processing method, and speech processing system

Info

Publication number: JP7257834B2
Application number: JP2019056800A
Authority: JP
Inventors: 大誠永石
Original assignee: Asahi Kasei EMD Corp
Current assignee: Asahi Kasei EMD Corp
Priority date: 2019-03-25
Filing date: 2019-03-25
Publication date: 2023-04-14
Anticipated expiration: 2039-03-25
Also published as: JP2020161884A

Description

The present disclosure relates to a speech processing device, a speech processing method, and a speech processing system.

A phenomenon known as howling occurs when a microphone picks up sound output from a speaker, producing a harsh noise. Howling interferes with listening to speech, so various devices and methods for suppressing it have been proposed.

For example, the howling suppression device of Patent Document 1 is used in a karaoke machine. While howling is detected, this device switches from the analog dry sound path to the digital dry sound path.

Patent Document 1: JP 2018-056893 A

However, the technique of Patent Document 1 does not distinguish whether the sound input to the microphone is a voice uttered by a person who is present or a sound other than the speaker's utterance (for example, noise or voice relayed over a telephone). As a result, it applies uniform processing both to the natural voice, for which howling is unlikely to occur, and to other sounds. It is desirable, however, that speech processing that applies an acoustic effect such as howling suppression be controlled according to the speaker's utterance state (for example, whether or not the speaker is speaking).

An object of the present disclosure is to provide a speech processing device, a speech processing method, and a speech processing system that control speech processing for applying acoustic effects according to the utterance state of a speaker.

The speech processing device of the present disclosure includes: a microphone that outputs an audio signal based on sound input from the outside; a concentration measurement unit that measures a carbon dioxide concentration; an audio signal control unit that generates, based on the carbon dioxide concentration, a control signal for controlling an acoustic effect applied to the audio signal; and an audio signal processing unit that applies the acoustic effect to the audio signal based on the control signal.

The speech processing method of the present disclosure includes the steps of: outputting an audio signal based on sound input from the outside; measuring a carbon dioxide concentration; generating, based on the carbon dioxide concentration, a control signal for controlling an acoustic effect applied to the audio signal; and applying the acoustic effect to the audio signal based on the control signal.

The speech processing system of the present disclosure includes a sound collecting device that collects sound and an audio output device. The sound collecting device includes a microphone that outputs an audio signal based on sound input from the outside, a concentration measurement unit that measures a carbon dioxide concentration, and an audio signal control unit that generates, based on the carbon dioxide concentration, a control signal for controlling an acoustic effect applied to the audio signal. The sound collecting device and/or the audio output device includes an audio signal processing unit that applies the acoustic effect to the audio signal based on the control signal. The audio output device includes an audio output unit that outputs, as sound, the audio signal to which the acoustic effect has been applied.

According to the present disclosure, it is possible to provide a speech processing device, a speech processing method, and a speech processing system that control speech processing for applying acoustic effects according to the utterance state of a speaker.

FIG. 1 is a schematic configuration diagram showing an example of a speech processing device according to a first embodiment of the present invention.
FIG. 2 is a flowchart for explaining the operation of the speech processing device of FIG. 1.
FIG. 3 is a diagram showing a loudspeaker equipped with an example of the speech processing device of FIG. 1.
FIG. 4 is a schematic configuration diagram showing an example of a speech processing device according to a second embodiment of the present invention.
FIG. 5 is a flowchart for explaining the operation of the speech processing device of FIG. 4.
FIG. 6 is a schematic configuration diagram showing an example of a speech processing system according to an embodiment of the present invention.

Embodiments of the present disclosure will be described below with reference to the drawings.

(Speech processing device)
(First embodiment)
First, the first embodiment will be described. FIG. 1 is a schematic configuration diagram showing an example of a speech processing device 10 according to this embodiment. The speech processing device 10 performs speech processing on an audio signal in order to apply an acoustic effect according to the utterance state of the speaker. Here, the speech processing device 10 detects the user's exhaled breath and determines whether or not the input information is the user's natural voice. The speech processing device 10 changes the speech processing it executes according to the result of this determination.

The speech processing device 10 includes a microphone 20, a concentration measurement unit 30, an audio signal control unit 40, and an audio signal processing unit 50. The speech processing device 10 may further include an audio output unit 60.

For example, the speech processing device 10 may be installed in a loudspeaker (see FIG. 3), a handheld microphone, a hands-free calling device in an automobile, or the like.

(Microphone)
The microphone 20 outputs, to the audio signal processing unit 50, an audio signal based on sound input from outside the speech processing device 10. The microphone 20 is not particularly limited as long as it can pick up sound; for example, a condenser microphone, a piezoelectric microphone, or a dynamic microphone can be used.
The microphone 20 may be arranged inside the speech processing device 10 and communicate with the outside through an opening of the speech processing device 10, or it may be arranged so as to be exposed from the speech processing device 10.
Here, the speech processing device 10 acquires input information from the outside. The input information includes sound and exhaled breath. The microphone 20 acquires the sound portion of the input information. The sound may include the voice uttered by the user, noise, voice relayed over a telephone, and the like.

(Concentration measurement unit)
The concentration measurement unit 30 measures the carbon dioxide concentration in the air, which changes with the exhaled breath in the input information. The concentration measurement unit 30 is not limited as long as it can measure the carbon dioxide concentration; for example, a gas sensor using non-dispersive infrared (NDIR) analysis can be used. The carbon dioxide detection method of the concentration measurement unit 30 may be either absolute value detection or relative value detection.
The placement of the concentration measurement unit 30 within the speech processing device 10 is not particularly limited. For example, when the microphone 20 is arranged inside the speech processing device 10, the concentration measurement unit 30 can be provided near the opening of the speech processing device 10. When the microphone 20 is arranged so as to be exposed from the speech processing device 10, the concentration measurement unit 30 can be provided near the exposed microphone 20. The carbon dioxide concentration measured by the concentration measurement unit 30 changes when air containing the breath exhaled as the user speaks is taken into the speech processing device 10 through the opening, or when air containing the user's breath is blown onto the microphone 20. The microphone 20 and the concentration measurement unit 30 may be separated from each other as long as the concentration measurement unit 30 can measure the carbon dioxide concentration when the user speaks into the microphone 20.
The concentration measurement unit 30 may operate continuously, or intermittently at an appropriate interval (for example, every 1 second).

(Audio signal control unit)
The audio signal control unit 40 acquires the carbon dioxide concentration measured by the concentration measurement unit 30. Based on the acquired carbon dioxide concentration, the audio signal control unit 40 generates a control signal for controlling the acoustic effect applied to the audio signal. Signal generation by the audio signal control unit 40 is implemented by an arithmetic processing device such as a microprocessor, a microcontroller, or a CPU (Central Processing Unit).
The audio signal control unit 40 determines whether or not the user is speaking based on the carbon dioxide concentration. Details of the determination are described later. The audio signal control unit 40 generates different control signals depending on the determination result, that is, on whether or not the user is speaking. The generated control signal is output to the audio signal processing unit 50. In this embodiment, the audio signal control unit 40 generates a first control signal when it determines that the user is speaking, and a second control signal when it determines that the user is not speaking.

(Audio signal processing unit)
The audio signal processing unit 50 applies an acoustic effect to the audio signal acquired from the microphone 20, based on the control signal acquired from the audio signal control unit 40. As described above, the audio signal control unit 40 generates different control signals depending on whether or not the user is speaking. The audio signal processing unit 50 can therefore apply to the audio signal an acoustic effect that corresponds to the control signal.
In this embodiment, the acoustic effect that the audio signal processing unit 50 applies to the audio signal can be a frequency filter effect. The acoustic effect is not limited to a frequency filter effect, however. As another example, the audio signal processing unit 50 may apply an acoustic effect such as noise removal. As yet another example, it may apply an acoustic effect that amplifies the audio signal (an amplifier). As a further example, it may apply so-called voice effects such as pitch correction and distortion.
The audio signal processing unit 50 is implemented by an arithmetic processing device such as a microprocessor, a microcontroller, or a CPU (Central Processing Unit).

When the acoustic effect applied by the audio signal processing unit 50 is the frequency filter effect described above, the audio signal processing unit 50 includes a first howling suppression filter and a second howling suppression filter. When the audio signal processing unit 50 acquires the first control signal (generated when the user is determined to be speaking), it applies the first howling suppression filter to the audio signal. That is, in accordance with the first control signal, the audio signal processing unit 50 applies a howling suppression effect in a first frequency band to the audio signal. The first frequency band is, for example, 100 Hz to 1000 Hz, which is the main frequency band of the human voice, or the part of 100 Hz to 1000 Hz in which howling is likely to occur depending on the acoustic equipment used and the surrounding environment. When the user is determined to be speaking, the audio signal processing unit 50 lowers the volume of the audio signal in the first frequency band to suppress howling. When the audio signal processing unit 50 acquires the second control signal (generated when the user is determined not to be speaking), it applies the second howling suppression filter to the audio signal. That is, in accordance with the second control signal, the audio signal processing unit 50 applies a howling suppression effect in a second frequency band to the audio signal. The second frequency band is wider than the first frequency band. The second frequency band is, for example, the part of the human audible range of 20 Hz to 20000 Hz in which howling is likely to occur depending on the acoustic equipment used and the surrounding environment. When the user is determined not to be speaking, the audio signal processing unit 50 lowers the volume of the audio signal in the second frequency band to suppress howling. Therefore, the speech processing device 10 of this embodiment can suppress howling effectively according to how likely howling is to occur.
As another example, the audio signal processing unit 50 may include three or more howling suppression filters, each of which is applied in accordance with the first control signal or the second control signal.
If necessary, a further acoustic effect may be applied to the audio signal to which the howling suppression effect has been applied. Specifically, for example, an acoustic effect that amplifies the howling-suppressed audio signal may be applied.
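The selection between the two howling suppression filters can be illustrated with a minimal sketch. It assumes the audio signal has already been converted to a list of (frequency, magnitude) pairs, and it lowers the whole band by one factor for simplicity. The names `suppress_band` and `apply_howling_filter` and the attenuation factor are hypothetical; the source states only the example band edges, not how much the volume is lowered or how the filters are built.

```python
from typing import List, Tuple

Spectrum = List[Tuple[float, float]]  # (frequency in Hz, magnitude)

# Example band edges taken from the text.
FIRST_BAND_HZ = (100.0, 1000.0)    # main band of the human voice
SECOND_BAND_HZ = (20.0, 20000.0)   # human audible range

# Hypothetical attenuation factor; the source gives no value.
ATTENUATION = 0.5


def suppress_band(spectrum: Spectrum, band_hz: Tuple[float, float],
                  gain: float = ATTENUATION) -> Spectrum:
    """Scale the magnitude of every bin inside band_hz by gain."""
    low, high = band_hz
    return [(f, m * gain if low <= f <= high else m) for f, m in spectrum]


def apply_howling_filter(spectrum: Spectrum, user_is_speaking: bool) -> Spectrum:
    """First filter (narrow band) while speaking, second filter (wide band) otherwise."""
    band = FIRST_BAND_HZ if user_is_speaking else SECOND_BAND_HZ
    return suppress_band(spectrum, band)
```

In this sketch a 5000 Hz component is left untouched while the user speaks and is attenuated otherwise, which mirrors the sound-quality argument for using the narrower first band during speech.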

In a modification of the first embodiment, when the acoustic effect applied by the audio signal processing unit 50 is amplification of the audio signal, the audio signal processing unit 50 amplifies the audio signal when it acquires the first control signal and does not amplify the audio signal when it acquires the second control signal. Alternatively, the audio signal processing unit 50 suppresses howling by stopping output of the audio signal when it acquires the second control signal.
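The modification can be sketched as follows, assuming the audio signal is a list of float samples. The function name `process_samples`, the `mute_when_silent` flag, and the gain value are hypothetical and are introduced only for illustration.

```python
from typing import List, Optional

# Hypothetical amplification factor; not specified in the source.
AMPLIFIER_GAIN = 2.0


def process_samples(samples: List[float], user_is_speaking: bool,
                    mute_when_silent: bool = False) -> Optional[List[float]]:
    """Amplify while the user speaks; otherwise pass through or stop output."""
    if user_is_speaking:
        # First control signal: apply the amplification effect.
        return [s * AMPLIFIER_GAIN for s in samples]
    if mute_when_silent:
        # Second control signal, alternative behavior: output is stopped.
        return None
    # Second control signal: no amplification is applied.
    return list(samples)
```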

(Audio output unit)
The audio output unit 60 outputs, as sound, the audio signal to which the acoustic effect has been applied by the audio signal processing unit 50. The audio output unit 60 may be, for example, a speaker, but is not limited to this type as long as it can output the processed audio signal as sound.

FIG. 2 is a flowchart showing an example of the processing procedure of the speech processing device 10. The speech processing device 10 implements the speech processing method by executing the processing shown in the flowchart.

The microphone 20 of the speech processing device 10 acquires the sound (input sound) in the input information that is input from the outside. The microphone 20 then outputs an audio signal corresponding to the input sound to the audio signal processing unit 50 (step S1).

The concentration measurement unit 30 of the speech processing device 10 acquires air containing the exhaled breath in the input information. The concentration measurement unit 30 then measures the carbon dioxide concentration in the air, which changes with the exhaled breath (step S2). Step S2 may be executed at a timing preset by the user or at any timing designated by the user. The preset timing may recur at a fixed period, may be when the speech processing device 10 is powered on, or may be when the distance between the user (the speaker) and the microphone 20 changes and the volume of the sound in the input information exceeds a preset threshold.

The audio signal control unit 40 of the speech processing device 10 determines that the user is speaking when the carbon dioxide concentration measured by the concentration measurement unit 30 is equal to or higher than a preset threshold, and determines that the user is not speaking when the carbon dioxide concentration is lower than the threshold (step S3). In this embodiment, the threshold is a fixed value stored in the audio signal control unit 40.
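The determination in step S3 can be sketched as a simple comparison. The enum `ControlSignal`, the function name, and the threshold value are hypothetical; the source states only that a fixed threshold is stored and that a concentration at or above it means the user is speaking.

```python
from enum import Enum


class ControlSignal(Enum):
    FIRST = 1   # user judged to be speaking
    SECOND = 2  # user judged not to be speaking


# Hypothetical fixed threshold in percent CO2; the source gives no value.
FIXED_THRESHOLD_PERCENT = 1.0


def generate_control_signal(
        co2_percent: float,
        threshold_percent: float = FIXED_THRESHOLD_PERCENT) -> ControlSignal:
    """Step S3: a concentration at or above the threshold means the user is speaking."""
    if co2_percent >= threshold_percent:
        return ControlSignal.FIRST
    return ControlSignal.SECOND
```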

When it is determined that the user is speaking (YES in step S3), the audio signal processing unit 50 of the speech processing device 10 processes the audio signal with the first howling suppression filter (step S4).

When it is determined that the user is not speaking (NO in step S3), the audio signal processing unit 50 of the speech processing device 10 processes the audio signal with the second howling suppression filter (step S5).

When the user is not speaking into the microphone 20, sound with a wider frequency band than the natural voice, such as noise, is input to the microphone 20. If the audio signal of the input sound were not processed by the second howling suppression filter in this case, howling could occur anywhere in the human audible range of 20 Hz to 20000 Hz. When the user speaks into the microphone 20, on the other hand, the volume in the 100 Hz to 1000 Hz band, the main frequency band of the human voice, becomes relatively larger than the volume in other bands, so howling is unlikely to occur outside 100 Hz to 1000 Hz. If the second howling suppression filter were applied in this case, it would also lower the volume in bands where howling is unlikely to occur, and the sound quality of the audio signal output from the audio signal processing unit 50, that is, the audio signal to which the acoustic effect has been applied, would deteriorate. Therefore, when the user speaks into the microphone 20, processing the audio signal with the first howling suppression filter, which lowers the volume only in the howling-prone 100 Hz to 1000 Hz band, suppresses howling without degrading sound quality.

After step S4 or step S5 is executed, the audio output unit 60 generates output sound from the audio signal output by the audio signal processing unit 50 (step S6).
Note that after step S4 or step S5 and before step S6, an acoustic effect that amplifies the audio signal may additionally be applied to the audio signal to which the howling suppression effect has been applied.
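The flow of steps S1 to S6 can be summarized in one pass of a hypothetical driver function. All parameter names are assumptions: the microphone, the concentration measurement unit, the two filters, and the audio output unit are each represented by a callable supplied by the caller, so the sketch shows only the ordering and branching of the steps.

```python
from typing import Callable, List

Samples = List[float]


def run_once(read_audio: Callable[[], Samples],
             read_co2: Callable[[], float],
             threshold: float,
             first_filter: Callable[[Samples], Samples],
             second_filter: Callable[[Samples], Samples],
             play: Callable[[Samples], None]) -> None:
    """One pass through the FIG. 2 flow, with each unit injected as a callable."""
    signal = read_audio()                   # S1: microphone outputs the audio signal
    co2 = read_co2()                        # S2: measure the CO2 concentration
    if co2 >= threshold:                    # S3: is the user speaking?
        processed = first_filter(signal)    # S4: first howling suppression filter
    else:
        processed = second_filter(signal)   # S5: second howling suppression filter
    play(processed)                         # S6: output the processed signal as sound
```

Passing the units in as callables keeps the sketch independent of any particular sensor or audio library.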

(Second embodiment)
Next, the second embodiment will be described. FIG. 4 is a schematic configuration diagram showing an example of the speech processing device 10 according to this embodiment. In addition to the configuration of the speech processing device 10 of the first embodiment, the speech processing device 10 of this embodiment further includes a concentration storage unit 100. The concentration storage unit 100 stores the threshold that the audio signal control unit 40 uses to determine the user's utterance state.

The carbon dioxide concentration in human exhaled breath is usually about 4%, but because exhaled breath diffuses into the air, the concentration decreases as the distance from the mouth increases. In addition, the carbon dioxide concentration in fresh air is 0.04% or less, whereas in poorly ventilated indoor air it is about 0.4%. The carbon dioxide concentration thus varies with the distance between the user and the concentration measurement unit 30 and with the usage environment. For example, when the concentration measured by the concentration measurement unit 30 is about 0.4% because of poor indoor ventilation, a comparison with a fixed threshold may wrongly conclude that the user is speaking. Detection accuracy can therefore be improved by varying the carbon dioxide concentration threshold used to determine whether the user is speaking according to the usage environment.

The audio signal control unit 40 determines whether or not the user is speaking by comparing the carbon dioxide concentration in the air measured by the concentration measurement unit 30 with the threshold stored in the concentration storage unit 100. The threshold stored in the concentration storage unit 100 is either a carbon dioxide concentration measured by the concentration measurement unit 30 or a value calculated from such a measurement. For example, the device may measure for a certain period while the speaker is using the device 10 and take the minimum carbon dioxide concentration in that period as the threshold, or it may measure for a certain period while no speaker is nearby and take the maximum carbon dioxide concentration in that period as the threshold. The calculated value may also be the measured carbon dioxide concentration multiplied by a predetermined ratio, so that, for example, whether the user is speaking can be determined even when the user is somewhat distant from the concentration measurement unit 30.
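The threshold calculations described above can be sketched as two small helpers. The function names are hypothetical, and applying the predetermined ratio on top of the minimum or maximum is an assumption made here for illustration; the source lists the ratio as a separate example and gives no value for it.

```python
from typing import Sequence


def threshold_from_speaking(readings: Sequence[float], ratio: float = 1.0) -> float:
    """Minimum concentration observed while the speaker uses the device, scaled by ratio."""
    return min(readings) * ratio


def threshold_from_ambient(readings: Sequence[float], ratio: float = 1.0) -> float:
    """Maximum concentration observed with no speaker nearby, scaled by ratio."""
    return max(readings) * ratio
```

A ratio below 1 applied to the speaking minimum lowers the threshold, which corresponds to tolerating a somewhat larger distance between the user and the sensor.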

The threshold stored in the concentration storage unit 100 is updated at a preset timing or at any timing designated by the user. The preset timing may recur at a fixed period, or may be when the speech processing device 10 is powered on. It may also be when the distance between the user (the speaker) and the microphone 20 changes and the volume of the sound in the input information exceeds a preset threshold, or when that distance changes and the carbon dioxide concentration measured by the concentration measurement unit 30 exceeds a preset threshold. It may also be when the carbon dioxide concentration in the usage environment changes and a concentration within a certain range is measured by the concentration measurement unit 30 for a certain period or longer. Because the threshold is updated, the audio signal control unit 40 can determine the user's utterance state with high accuracy even when the distance between the speech processing device 10 and the user's mouth changes or the carbon dioxide concentration in the usage environment changes. For example, when the concentration measurement unit 30 continuously measures a carbon dioxide concentration of about 0.4% for one minute or more, the device may judge that the carbon dioxide concentration in the ambient air is about 0.4%, measure the concentration while the user is speaking or at any timing designated by the user, and set the threshold for determining the user's utterance state based on that measurement.
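The condition "a concentration within a certain range is measured for a certain period or longer" can be sketched as a check over the most recent readings. The function name, the tolerance, and the way the window length is derived from the sampling interval are all assumptions; the source gives only the example of about 0.4% lasting one minute or more.

```python
from typing import Sequence


def ambient_is_stable(readings: Sequence[float], interval_s: float,
                      tolerance: float = 0.05,
                      min_duration_s: float = 60.0) -> bool:
    """True if the latest readings span min_duration_s and stay within tolerance."""
    # Number of samples needed to span min_duration_s at the given interval.
    needed = int(min_duration_s / interval_s) + 1
    if len(readings) < needed:
        return False
    window = readings[-needed:]
    return max(window) - min(window) <= tolerance
```

When this check succeeds, a device could treat the stable value as the ambient level and trigger a threshold update.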

(Concentration storage unit)
The concentration storage unit 100 is not particularly limited as long as it can store the carbon dioxide concentration threshold used to determine the user's utterance state. For example, it may be a main storage device such as a semiconductor memory typified by DRAM (Dynamic Random Access Memory) or SRAM (Static Random Access Memory), or an auxiliary storage device such as a magnetic disk typified by an HDD (Hard Disk Drive), an optical disc typified by a CD-ROM or DVD-ROM, a USB memory, any of various memory cards, or the flash memory mounted in an SSD (Solid State Drive).

FIG. 5 is a flowchart showing an example of the processing procedure of the speech processing device 10. The speech processing device 10 first determines whether it is time to update the carbon dioxide concentration threshold used to determine the user's speaking state (step S11).

If it is time to update the threshold (YES in step S11), the carbon dioxide concentration threshold stored in the concentration storage unit 100 is erased (step S12). If it is not time to update the threshold (NO in step S11), the device determines whether a carbon dioxide concentration threshold is stored (step S13).

If a carbon dioxide concentration threshold is stored in the concentration storage unit 100 (YES in step S13), processing according to the carbon dioxide concentration measured by the concentration measurement unit 30 is performed as in the first embodiment (steps S16 to S21). If no threshold is stored in the concentration storage unit 100 (NO in step S13), the concentration measurement unit 30 measures the carbon dioxide concentration at a preset timing or at an arbitrary timing designated by the user (step S14). The preset timing may be when the user speaks into the microphone 20 and the volume of the voice in the input information exceeds a preset threshold.

Then, based on the carbon dioxide concentration measured in step S14, the threshold for determining the user's speaking state is stored in the concentration storage unit 100 (step S15).
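Steps S11 to S15 can be sketched as a single control cycle. The following Python fragment is a hedged illustration, not the flowchart of FIG. 5 itself. `ConcentrationStore`, `run_cycle`, and the callback parameters are hypothetical names. Two points are assumptions because the text does not state them: after the threshold is erased in S12 the cycle continues to the S13 check, and a cycle that stores a new threshold in S15 ends without running S16 to S21.

```python
from typing import Callable, Optional


class ConcentrationStore:
    """Hypothetical stand-in for the concentration storage unit 100."""

    def __init__(self) -> None:
        self.threshold: Optional[float] = None


def run_cycle(store: ConcentrationStore,
              update_due: bool,
              measure_co2: Callable[[], float],
              derive_threshold: Callable[[float], float],
              process: Callable[[float, float], None]) -> str:
    """Run one pass through steps S11 to S15 and report which branch ran."""
    # S11: is it time to update the threshold?
    if update_due:
        # S12: erase the stored threshold.
        store.threshold = None
    # S13: is a threshold stored?
    if store.threshold is not None:
        # S16 to S21: processing according to the measured concentration.
        process(measure_co2(), store.threshold)
        return "processed"
    # S14: measure the concentration (for example while the user speaks).
    reading = measure_co2()
    # S15: store a threshold derived from that measurement.
    store.threshold = derive_threshold(reading)
    return "calibrated"
```

The first call on an empty store takes the calibration branch; later calls take the processing branch until `update_due` is true again.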

In the speech processing device 10 according to the first and second embodiments, the components and functions of the speech processing device 10 may be rearranged. For example, part or all of the configuration and functions of the speech processing device 10, in particular those of the audio signal processing unit 50, may be incorporated in another device.

(Speech processing system)
Next, the speech processing system according to this embodiment will be described.
FIG. 6 is a schematic configuration diagram showing an example of the speech processing system 200 according to this embodiment. The speech processing system 200 performs audio processing on an audio signal to suppress howling. The speech processing system 200 detects the user's exhaled breath and determines whether the input information is the user's natural voice. It then changes the audio processing to be executed according to the result of that determination.
The speech processing system 200 may be installed in, for example, a karaoke system at a karaoke venue or the sound equipment of a concert hall.

The speech processing system of this embodiment includes a sound pickup device 210 that picks up sound and an audio output device 220.
The sound pickup device 210 includes a microphone 20 that outputs an audio signal based on sound input from outside, a concentration measurement unit 30 that measures the carbon dioxide concentration, and an audio signal control unit 40 that generates, based on the carbon dioxide concentration, a control signal for controlling the sound effect applied to the audio signal. In addition, the sound pickup device 210 and/or the audio output device 220 includes an audio signal processing unit 50 that applies the sound effect to the audio signal based on the control signal (in FIG. 6, the sound pickup device 210 includes the audio signal processing unit 50). The audio output device 220 includes an audio output unit 60 that outputs, as sound, the audio signal to which the sound effect has been applied.
In this embodiment, the sound pickup device 210 of the speech processing system 200 may further include a concentration storage unit 100 that stores the carbon dioxide concentration threshold.
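The way the audio signal control unit 40 might turn a concentration reading into a control signal can be sketched as follows. This Python fragment is an illustration under stated assumptions, not the disclosed implementation. `ControlSignal`, `make_control_signal`, and the two band constants are hypothetical, and the band edges are arbitrary placeholder values. It also assumes that a concentration at or above the threshold means the user is speaking, since exhaled breath raises the measured carbon dioxide concentration. The narrower first band for speech and the wider second band otherwise follow the claims.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical band edges in Hz; the only property taken from the document
# is that the second band is wider than the first.
FIRST_BAND_HZ: Tuple[float, float] = (2000.0, 4000.0)
SECOND_BAND_HZ: Tuple[float, float] = (500.0, 8000.0)


@dataclass(frozen=True)
class ControlSignal:
    """Hypothetical control signal passed to the audio signal processing unit."""
    speaking: bool
    suppression_band_hz: Tuple[float, float]


def make_control_signal(co2_percent: float,
                        threshold_percent: float) -> ControlSignal:
    """Compare the measured concentration with the stored threshold and
    select the band in which howling suppression is applied."""
    # Assumption: a reading at or above the threshold indicates speech.
    speaking = co2_percent >= threshold_percent
    band = FIRST_BAND_HZ if speaking else SECOND_BAND_HZ
    return ControlSignal(speaking=speaking, suppression_band_hz=band)
```

Because the control signal is a small immutable value, it could be produced in the sound pickup device 210 and consumed by an audio signal processing unit 50 in either device.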

In this embodiment, the microphone 20, the concentration measurement unit 30, the audio signal control unit 40, and the concentration storage unit 100 of the sound pickup device 210, the audio signal processing unit 50 of the sound pickup device 210 and/or the audio output device 220, and the audio output unit 60 of the audio output device 220 can be the same as the corresponding components of the speech processing device 10 of the embodiments described above.

The sound pickup device 210 and the audio output device 220 can exchange information via, for example, a network 230. The network 230 may include any wireless or wired information transmission path.

As described above, the speech processing system 200 of this embodiment can suppress howling effectively according to how likely howling is to occur.

Although the present invention has been described with reference to the drawings and embodiments, note that those skilled in the art can easily make various variations and modifications based on this disclosure. Such variations and modifications are therefore included in the scope of the present invention. For example, the functions included in each means, each step, and so on can be rearranged so long as no logical inconsistency results, and a plurality of means or steps can be combined into one or divided.
Furthermore, the present invention may be used for purposes other than howling suppression.

10 Speech processing device
20 Microphone
30 Concentration measurement unit
40 Audio signal control unit
50 Audio signal processing unit
60 Audio output unit
100 Concentration storage unit
200 Speech processing system
210 Sound pickup device
220 Audio output device
230 Network

Claims

1. A speech processing device comprising:
a microphone that outputs an audio signal based on sound input from outside;
a concentration measurement unit that measures a carbon dioxide concentration;
an audio signal control unit that generates, based on the carbon dioxide concentration, a control signal for controlling a sound effect applied to the audio signal; and
an audio signal processing unit that applies the sound effect to the audio signal based on the control signal,
wherein the audio signal control unit determines, based on the carbon dioxide concentration, whether a user is speaking and generates the control signal according to whether the user is speaking, and
the audio signal processing unit applies a howling suppression effect in a first frequency band to the audio signal when it is determined that the user is speaking, and applies a howling suppression effect in a second frequency band wider than the first frequency band to the audio signal when it is determined that the user is not speaking.

2. The speech processing device according to claim 1, further comprising a concentration storage unit that stores a threshold of the carbon dioxide concentration,
wherein the audio signal control unit compares the carbon dioxide concentration with the threshold to determine whether the user is speaking.

3. The speech processing device according to claim 1, further comprising an audio output unit that outputs, as sound, the audio signal to which the sound effect has been applied.

4. A speech processing method comprising:
outputting an audio signal based on sound input from outside;
measuring a carbon dioxide concentration;
generating, based on the carbon dioxide concentration, a control signal for controlling a sound effect applied to the audio signal; and
applying the sound effect to the audio signal based on the control signal,
wherein the generating of the control signal includes determining, based on the carbon dioxide concentration, whether a user is speaking and generating the control signal according to whether the user is speaking, and
the applying of the sound effect includes applying a howling suppression effect in a first frequency band to the audio signal when it is determined that the user is speaking, and applying a howling suppression effect in a second frequency band wider than the first frequency band to the audio signal when it is determined that the user is not speaking.

5. A speech processing system comprising a sound pickup device that picks up sound and an audio output device,
wherein the sound pickup device includes a microphone that outputs an audio signal based on sound input from outside, a concentration measurement unit that measures a carbon dioxide concentration, and an audio signal control unit that generates, based on the carbon dioxide concentration, a control signal for controlling a sound effect applied to the audio signal,
the sound pickup device and/or the audio output device includes an audio signal processing unit that applies the sound effect to the audio signal based on the control signal,
the audio output device includes an audio output unit that outputs, as sound, the audio signal to which the sound effect has been applied,
the audio signal control unit determines, based on the carbon dioxide concentration, whether a user is speaking and generates the control signal according to whether the user is speaking, and
the audio signal processing unit applies a howling suppression effect in a first frequency band to the audio signal when it is determined that the user is speaking, and applies a howling suppression effect in a second frequency band wider than the first frequency band to the audio signal when it is determined that the user is not speaking.