WO2016067644A1 - Speech adjustment device - Google Patents

Speech adjustment device

Info

Publication number
WO2016067644A1
WO2016067644A1 PCT/JP2015/055093 JP2015055093W WO2016067644A1 WO 2016067644 A1 WO2016067644 A1 WO 2016067644A1 JP 2015055093 W JP2015055093 W JP 2015055093W WO 2016067644 A1 WO2016067644 A1 WO 2016067644A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound
signal
time
unit
voice
Prior art date
Application number
PCT/JP2015/055093
Other languages
French (fr)
Japanese (ja)
Inventor
中村 圭介
Original Assignee
シャープ株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by シャープ株式会社
Publication of WO2016067644A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0316 Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
    • G10L21/0324 Details of processing therefor
    • G10L21/034 Automatic adjustment
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G10L25/84 Detection of presence or absence of voice signals for discriminating voice from noise

Definitions

  • the present invention relates to a sound adjustment device.
  • In voice recognition, if the microphone sensitivity is too low, the necessary voice signal cannot be obtained sufficiently and voice recognition cannot be performed correctly.
  • Conversely, if the microphone sensitivity is too high, distortion of the voice signal and ambient noise are picked up, and voice recognition again cannot be performed correctly.
  • The voice recognition device disclosed in Patent Document 1 first detects, as the utterance interval, the time interval in which the input voice data contains the user's utterance. It then calculates the average level of the utterance contained in the input voice signal during the utterance interval and the level of the noise contained in the input voice signal outside that interval, estimates the level of the input voice signal from the noise level and the average utterance level, and sets the gain of the input amplifier so that the amplified input voice signal reaches a level suitable for voice recognition.
  • The voice recognition device of Patent Document 1 includes a talk switch and treats the period from when the user presses the talk switch to the end of the utterance interval as the interval containing the user's utterance. Requiring the user to press the talk switch for every voice input, however, complicates operation.
  • In general, voice input data contains a mixture of the user's speech signal and a noise signal, and it is difficult to determine the point at which only the user's speech signal has ended.
  • The present invention has been made in view of the above problems, and its object is to provide a sound adjustment device that can appropriately adjust the input gain and output gain of a sound signal to suit the usage environment and thereby improve the voice recognition rate.
  • The sound adjustment device of the present invention includes: a microphone signal input unit that converts an electrical signal input from a microphone into a sound signal; a sound signal intensity measurement unit that measures the intensity of the sound signal and compares it with a preset sound intensity threshold; a silent time measurement unit that measures the silent time, during which a silent state (sound signal smaller than the sound intensity threshold) continues; a sound time measurement unit that measures the sound time, during which a sound state (sound signal greater than the sound intensity threshold) continues; and a sound adjustment unit that adjusts the sound signal by comparing the silent time or the sound time with a preset time threshold.
  • In one aspect, the sound adjustment unit includes a microphone sensitivity adjustment unit for adjusting the input gain of the microphone signal input unit. The input gain is raised when the silent time is longer than a preset silent time threshold and lowered when the sound time is longer than a preset sound time threshold.
  • In another aspect, the sound adjustment unit includes a microphone sensitivity adjustment unit for adjusting the input gain of the microphone signal input unit, a speaker signal output unit that outputs a sound signal to a speaker, and a speaker volume adjustment unit for adjusting the output gain of the speaker signal output unit. When the silent time is longer than a preset silent time threshold, the input gain is raised and the output gain is lowered; when the sound time is longer than a preset sound time threshold, the input gain is lowered and the output gain is raised.
  • According to the present invention, the input gain and output gain of the sound signal are adjusted appropriately for the usage environment, so a sound adjustment device that can improve the voice recognition rate can be provided.
  • FIG. 1 is a basic configuration diagram of the sound adjustment device of the present invention.
  • FIG. 2 is a configuration diagram of the sound adjustment device of Example 1.
  • FIG. 3 is a principle diagram showing the method of adjusting the microphone sensitivity using the silent time.
  • FIG. 4 is a principle diagram showing the method of adjusting the microphone sensitivity using the sound time.
  • FIG. 5 is a flowchart showing the microphone sensitivity adjustment steps of Example 1.
  • FIG. 6 is a configuration diagram of the sound adjustment device of Example 2.
  • FIG. 7 is a flowchart showing the speaker volume adjustment steps of Example 2.
  • FIG. 8 is a configuration diagram of the sound adjustment device of Example 3.
  • FIG. 1 shows a basic configuration of the sound adjustment device 20 of the present invention.
  • The sound adjustment device 20 of the present invention includes a microphone signal input unit 31 that converts an electrical signal input from the microphone 10 into a sound signal, and a sound signal intensity measurement unit 32 that measures the intensity of the sound signal and compares it with a preset sound intensity threshold.
  • The sound adjustment device 20 also includes a silent time measuring unit 33 that measures the silent time, during which a silent state (sound signal smaller than the sound intensity threshold) continues; a sound time measuring unit 34 that measures the sound time, during which a sound state (sound signal greater than the sound intensity threshold) continues; and a sound adjusting unit 40 that adjusts the sound signal by comparing the silent time or the sound time with a preset time threshold.
  • The time for which a person can speak continuously in conversation is considered to be about 5 seconds at most because of the need to breathe, and it is almost impossible to keep speaking for about 15 seconds without taking a breath. Therefore, in the sound adjustment device 20 of the present invention, when the sound signal stays above the predetermined sound intensity threshold (the sound state) for an arbitrarily chosen fixed time (5 to 10 seconds), the signal is regarded as ambient noise rather than human speech, and the sound signal is adjusted so as to reduce the influence of that noise on voice recognition.
  • FIG. 2 shows a configuration of the sound adjustment device 21 according to the first embodiment.
  • The sound adjustment device 21 according to the first embodiment is mounted on a robot or an information terminal that has a voice recognition function and a voice output function such as voice synthesis. It improves voice recognition by adjusting the microphone sensitivity (input gain) according to the magnitude of ambient noise in the usage environment.
  • The sound adjustment device 21 includes a microphone signal input unit 31 that receives an electrical signal from the microphone 10 and converts it into a sound signal, and a sound signal intensity measuring unit 32 that measures the intensity of the sound signal and compares it with a preset sound intensity threshold.
  • The sound adjustment device 21 also includes a silent time measuring unit 33 that measures the silent time, during which a silent state (sound signal smaller than the sound intensity threshold) continues, and a sound time measuring unit 34 that measures the sound time, during which a sound state (sound signal greater than the sound intensity threshold) continues.
  • A microphone 10 that converts external sound into an electrical signal is connected to the sound adjustment device 21, and the sound signal digitized by the microphone signal input unit 31 is passed to the sound signal intensity measuring unit 32.
  • the audio signal intensity measuring unit 32 has, for example, a preset audio intensity threshold value, and compares the audio intensity threshold value with the current audio signal value to determine the presence or absence of audio.
  • FIG. 3 is an explanatory diagram for adjusting the microphone sensitivity (input gain) when the audio signal intensity measurement unit 32 determines that the audio signal is silent.
  • The silent time measuring unit 33 measures the silent time (the elapsed time in the silent state).
  • When the silent time is longer than a preset silent time threshold, the microphone sensitivity adjustment unit 41 increases the sensitivity of the microphone 10.
  • The silent time threshold is specifically about 10 to 60 seconds, and the microphone sensitivity increase rate is preferably about 1 to 5%.
  • By setting the silent time threshold and the microphone sensitivity increase rate as described above, the microphone sensitivity adjustment unit 41 raises the microphone sensitivity gradually in a quiet environment.
  • When the silent time reaches the silent time threshold (T11), the microphone sensitivity is increased by several percent. If the silence then continues for a further silent time threshold (T12), the microphone sensitivity is increased by several percent again. This is repeated for as long as the silence continues.
  • The silent time threshold (T11) and the silent time threshold (T12) need not be equal, and the rate of increase applied at each step need not be equal either.
  • As the microphone sensitivity rises, silence no longer continues for long periods, and the microphone sensitivity settles at a level suited to the usage environment.
  • FIG. 4 is an explanatory diagram for adjusting the microphone sensitivity (input gain) when the audio signal intensity measurement unit 32 determines that the audio signal is sound.
  • The sound time measuring unit 34 measures the sound time (the elapsed time in the sound state).
  • When the sound time is longer than a preset sound time threshold, the microphone sensitivity adjustment unit 41 lowers the sensitivity of the microphone 10.
  • The sound time threshold is specifically about 5 to 20 seconds and is usually set based on the time for which a person can speak in one breath; the rate of decrease in sensitivity is preferably about 10 to 50%.
  • By setting the sound time threshold and the microphone sensitivity decrease rate as described above, the microphone sensitivity adjustment unit 41 lowers the microphone sensitivity quickly in a noisy environment.
  • When the sound time reaches the sound time threshold (T21), the microphone sensitivity is reduced by several percent to several tens of percent. If the sound then continues for a further sound time threshold (T22), the microphone sensitivity is lowered again by several percent to several tens of percent. This is repeated for as long as the sound continues.
  • The sound time threshold (T21) and the sound time threshold (T22) need not be equal, and the rate of decrease of the input gain applied at each step need not be equal either.
  • As the microphone sensitivity decreases, the sound state no longer continues for as long as before, and the microphone sensitivity settles at a level suited to the usage environment.
  • FIG. 5 shows a processing flow for adjusting the microphone sensitivity (input gain) in the sound adjustment device 21 of the first embodiment.
  • The processing flow will be described with reference to FIG. 5.
  • S11. The sound adjustment processing starts.
  • S12. The sound signal intensity measurement unit 32 compares the intensity of the sound signal with an arbitrary sound intensity threshold (or judges based on whether voice is being recognized). If the intensity of the sound signal is greater than the sound intensity threshold, it is determined that there is sound; if it is smaller, it is determined that there is no sound.
  • S13. When it is determined that there is no sound, the silent time measuring unit 33 measures the silent time (the duration of the section without a sound signal).
  • S14. If the silent time is shorter than the silent time threshold, the process returns to the start of the sound adjustment processing (S11) without changing the microphone sensitivity. If the silent time is longer than the silent time threshold, the process proceeds to the next step (S15).
  • S15. When the silent time is longer than the silent time threshold, it is determined that the environment around the microphone is quiet, with no noise, and that there is headroom in the microphone sensitivity. The microphone sensitivity of the microphone signal input unit 31 is increased by an arbitrary ratio, and the process returns to the start of the sound adjustment processing (S11).
  • S16. When the sound signal intensity measurement unit 32 determines that there is sound (a sound signal is present), the sound time measuring unit 34 measures the sound time (the duration of the section in which the sound signal is present).
  • S17. If the sound time is shorter than the sound time threshold, the process returns to the start of the sound adjustment processing (S11) without changing the microphone sensitivity. If the sound time is longer than the sound time threshold, the process proceeds to the next step (S18).
  • S18. When the sound time is longer than the sound time threshold, it is determined that there is constant noise in the surroundings. The microphone signal input unit 31 lowers the sensitivity of the microphone 10 by an arbitrary ratio, and the process returns to the start of the sound adjustment processing (S11).
  • The present invention is characterized in that the lengths of the silent state and the sound state are measured from the sound signal, and the signal is adjusted to one suitable for voice recognition. That is, the sound adjustment device 21 of the first embodiment distinguishes ambient noise from human conversation by how long the silent time or the sound time continues, so the optimum microphone sensitivity can be set according to the usage environment.
  • FIG. 6 shows the configuration of the audio adjustment device 22 according to the second embodiment.
  • The sound adjustment device 22 according to the second embodiment includes a speaker volume adjusting unit 42 and a speaker signal output unit 43 as the sound adjusting unit 40. It adjusts the volume (output gain) of the speaker 50 so that it is easy to hear in the usage environment, whether that is a quiet environment or a noisy one.
  • The processing up to the measurement of the silent time and the sound time is the same as in the sound adjustment device 21 of the first embodiment: a sound signal is input through the microphone 10 and the microphone signal input unit 31, its intensity is measured by the sound signal intensity measurement unit 32, and the silent time and the sound time are measured by the silent time measurement unit 33 and the sound time measurement unit 34.
  • The speaker volume adjustment unit 42 controls the volume of the speaker 50.
  • The silent time threshold is preferably about 10 to 60 seconds, and the volume decrease rate is preferably about 1 to 5%.
  • By setting the silent time threshold and the volume decrease rate in this way, the speaker volume adjustment unit 42 lowers the volume of the speaker 50 gradually in a quiet environment.
  • When the sound time is longer than an arbitrary preset sound time threshold, the speaker volume adjustment unit 42 increases the volume (output gain) of the speaker 50.
  • The sound time threshold is specifically about 5 to 20 seconds and is usually set based on the time for which a person can speak in one breath; the rate of increase in volume is preferably about 10 to 50%.
  • By setting the sound time threshold and the volume increase rate in this way, the speaker volume adjustment unit 42 raises the volume of the speaker 50 quickly in a noisy environment.
  • FIG. 7 shows a processing flow for adjusting the speaker volume (output gain) in the sound adjustment device 22 of the second embodiment.
  • The processing flow will be described with reference to FIG. 7.
  • S11. The sound adjustment processing starts.
  • S12. The sound signal intensity measurement unit 32 compares the intensity of the sound signal with an arbitrary sound intensity threshold (or judges based on whether voice is being recognized). If the intensity of the sound signal is greater than the sound intensity threshold, it is determined that there is sound; if it is smaller, it is determined that there is no sound.
  • S13. When it is determined that there is no sound, the silent time measuring unit 33 measures the silent time (the duration of the section without a sound signal).
  • S14. If the silent time is shorter than the silent time threshold, the process returns to the start of the sound adjustment processing (S11) without changing the volume of the speaker. If the silent time is longer than the silent time threshold, the process proceeds to the next step (S21).
  • S21. When the silent time is longer than the silent time threshold, it is determined that the environment is quiet, with no noise. The volume (output gain) of the speaker 50 is lowered at an arbitrary rate, and the process returns to the start of the sound adjustment processing (S11).
  • S16. When the sound signal intensity measurement unit 32 determines that there is sound (a sound signal is present), the sound time measuring unit 34 measures the sound time (the duration of the section in which the sound signal is present).
  • S17. If the sound time is shorter than the sound time threshold, the process returns to the start of the sound adjustment processing (S11) without changing the volume of the speaker. If the sound time is longer than the sound time threshold, the process proceeds to the next step (S22).
  • S22. When the sound time is longer than the sound time threshold, it is determined that the sound is ambient noise. The volume (output gain) of the speaker 50 is increased by an arbitrary ratio, and the process returns to the start of the sound adjustment processing (S11).
  • Because the sound adjustment device 22 of the second embodiment adjusts the volume (output gain) of the speaker 50 according to the intensity of ambient noise, the volume can be adjusted to a level at which conversation with a person is easy in the usage environment.
  • FIG. 8 shows the configuration of the sound adjustment device 23 according to the third embodiment.
  • the sound adjustment device 23 according to the third embodiment is a combination of the sound adjustment device 21 according to the first embodiment and the sound adjustment device 22 according to the second embodiment.
  • The sound adjustment device 23 according to the third embodiment adjusts the sensitivity (input gain) of the microphone so that voice recognition works well in either a quiet environment or a noisy one, and also adjusts the volume (output gain) of the speaker so that it can be heard easily.
  • the voice adjustment device 23 of the third embodiment has the configuration of the voice adjustment device 21 of the first embodiment and the configuration of the voice adjustment device 22 of the second embodiment.
  • the details of each component of the sound adjustment device 23 are the same as those in the first and second embodiments, and thus the description thereof is omitted.
  • FIG. 9 shows a processing flow of sound adjustment of the microphone and the speaker by the sound adjustment device 23 of the third embodiment.
  • The processing flow will be described with reference to FIG. 9.
  • S11. The sound adjustment processing starts.
  • S12. The sound signal intensity measurement unit 32 compares the intensity of the sound signal with an arbitrary sound intensity threshold (or judges based on whether voice is being recognized). If the intensity of the sound signal is greater than the sound intensity threshold, it is determined that there is sound; if it is smaller, it is determined that there is no sound.
  • S13. When it is determined that there is no sound, the silent time measuring unit 33 measures the silent time (the duration of the section without a sound signal).
  • S14. If the silent time is shorter than the silent time threshold, the process returns to the start of the sound adjustment processing (S11) without changing the microphone sensitivity or the speaker volume. If the silent time is longer than the silent time threshold, the process proceeds to the next step (S15).
  • S15. When the silent time is longer than the silent time threshold, it is determined that the environment is quiet, with no surrounding noise, and that there is headroom in the microphone sensitivity. The microphone sensitivity (input gain) is increased by an arbitrary ratio and the speaker volume (output gain) is lowered at an arbitrary rate, and the process returns to the start of the sound adjustment processing (S11).
  • S16. When it is determined that there is sound, the sound time measuring unit 34 measures the sound time (the duration of the section in which the sound signal is present).
  • S17. If the sound time is shorter than an arbitrary sound time threshold, it is determined that the sound is normal conversation, and the process returns to the start of the sound adjustment processing (S11) without changing the microphone sensitivity or the speaker volume. If the sound time is longer than the sound time threshold, the process proceeds to the next step (S18).
  • S18. When the sound time is longer than the sound time threshold, it is determined that there is constant noise in the surroundings. The microphone sensitivity (input gain) is reduced by an arbitrary ratio and the speaker volume (output gain) is raised at an arbitrary rate, and the process returns to the start of the sound adjustment processing (S11).
  • The sound adjustment device 23 of the third embodiment distinguishes ambient noise from human conversation by how long the silent time or the sound time continues and sets the microphone sensitivity and the speaker volume accordingly, so the volume can be adjusted to a level at which conversation with a person is easy in the usage environment. In addition, providing the sound adjustment device 23 of the present invention makes it possible to improve the voice recognition rate of a device that has a voice recognition function.
  • the present invention can be used in any device having a voice recognition function and a voice response function.
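
The flows of FIG. 5, FIG. 7 and FIG. 9 can be read as one loop over incoming signal levels. The following Python sketch is only an illustration under several assumptions that are not in the source: the function name `run_adjustment`, the modelling of the measured level as raw level multiplied by the microphone gain, the multiplicative interpretation of the adjustment "rate", and the concrete default values (picked from inside the ranges quoted above). Resetting the timer after each adjustment follows the T11/T12 and T21/T22 description.

```python
def run_adjustment(raw_levels, dt=1.0, intensity_threshold=0.2,
                   silent_threshold=30.0, sound_threshold=10.0,
                   small_rate=0.03, large_rate=0.30):
    """Return (mic_gain, speaker_volume) after processing raw_levels.

    Hypothetical sketch: each element of raw_levels is one measurement
    taken dt seconds after the previous one.
    """
    mic_gain, speaker_volume = 1.0, 1.0
    silent_time = sound_time = 0.0
    for raw in raw_levels:                     # S11: start of one pass
        level = raw * mic_gain                 # assumed effect of the input gain
        if level > intensity_threshold:        # S12: sound is present
            sound_time += dt                   # S16: measure the sound time
            silent_time = 0.0
            if sound_time > sound_threshold:   # S17: longer than the threshold?
                mic_gain *= 1.0 - large_rate   # S18: lower the microphone gain
                speaker_volume *= 1.0 + large_rate  # and raise the speaker volume
                sound_time = 0.0               # next step counted from here (T22)
        else:                                  # S12: no sound
            silent_time += dt                  # S13: measure the silent time
            sound_time = 0.0
            if silent_time > silent_threshold:  # threshold check before S15
                mic_gain *= 1.0 + small_rate   # S15: raise the microphone gain
                speaker_volume *= 1.0 - small_rate  # and lower the speaker volume
                silent_time = 0.0              # next step counted from here (T12)
    return mic_gain, speaker_volume
```

In this sketch, short alternating bursts of sound and silence (ordinary conversation) never reach either time threshold, so both gains stay unchanged, while a long uninterrupted sound is treated as noise.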

Abstract

Provided is a speech adjustment device capable of improving a speech recognition rate by suitably adjusting the input gain and the output gain of a speech signal responsive to usage environment. This speech adjustment device (20) is provided with a speech adjustment unit (40) for adjusting a speech signal by comparing a soundless period or a sound period with a predetermined time threshold, the soundless period being the continuation of a soundless state where the intensity of the speech signal is less than a speech intensity threshold, and the sound period being the continuation of a sound state where the intensity of the speech signal is greater than the speech intensity threshold.

Description

Speech adjustment device
The present invention relates to a speech adjustment device.
In recent years, voice recognition functions have been built into a variety of devices such as robots and smartphones, and a growing number of devices carry out instructions through spoken dialogue with the user. In such voice-interactive devices, it is desirable to adjust the sensitivity (input gain) of the microphone automatically and continuously according to the surrounding noise in order to improve the voice recognition rate.
For example, in voice recognition, if the microphone sensitivity is too low, the necessary speech signal cannot be obtained sufficiently and the speech cannot be recognized correctly. If the microphone sensitivity is too high, distortion of the speech signal and ambient noise are picked up, and the speech again cannot be recognized correctly.
For this reason, speech adjustment devices that adjust the microphone sensitivity and keep it in an appropriate state have been put into practical use.
For example, the voice recognition device of Patent Document 1 first detects, as the utterance interval, the time interval in which the input voice data contains the user's utterance. It then calculates the average level of the utterance contained in the input voice signal during the utterance interval and the level of the noise contained in the input voice signal outside that interval, estimates the level of the input voice signal from the noise level and the average utterance level, and sets the gain of the input amplifier so that the amplified input voice signal reaches a level suitable for voice recognition.
Patent Document 1: Japanese Patent No. 5457293 (registered January 17, 2014)
However, the voice recognition device of Patent Document 1 must accurately distinguish the intervals in which the input voice data contains the user's utterance from all other intervals, and distinguishing utterances in this way is generally very difficult.
The voice recognition device of Patent Document 1 includes a talk switch and treats the period from when the user presses the talk switch to the end of the utterance interval as the interval containing the user's utterance. Requiring the user to press the talk switch for every voice input, however, complicates operation.
A general voice recognition device without a talk switch must determine the start of the utterance interval by some other means. Furthermore, voice input data generally contains a mixture of the user's speech signal and a noise signal, and it is difficult to determine the point at which only the user's speech signal has ended.
If the voice recognition device of Patent Document 1 fails to distinguish accurately between the intervals that contain the user's utterance and the other intervals, it may miscalculate both the average level of the utterance in the utterance interval and the level of the noise outside that interval. As a result, the level of the input voice signal cannot be adjusted appropriately for the usage environment.
The present invention has been made in view of the above problems, and its object is to provide a speech adjustment device that can appropriately adjust the input gain and output gain of a speech signal to suit the usage environment and thereby improve the voice recognition rate.
The speech adjustment device of the present invention includes: a microphone signal input unit that converts an electrical signal input from a microphone into a speech signal; a speech signal intensity measurement unit that measures the intensity of the speech signal and compares it with a preset speech intensity threshold; a silent time measurement unit that measures the silent time, during which a silent state (speech signal smaller than the speech intensity threshold) continues; a sound time measurement unit that measures the sound time, during which a sound state (speech signal greater than the speech intensity threshold) continues; and a speech adjustment unit that adjusts the speech signal by comparing the silent time or the sound time with a preset time threshold.
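The two time measurement units described above can be pictured as a small state tracker. The Python sketch below is a minimal illustration only: the class name `DurationTracker`, its fields, and the choice to treat a level exactly equal to the threshold as silent are assumptions, since the source only defines the "smaller than" and "greater than" cases.

```python
from dataclasses import dataclass


@dataclass
class DurationTracker:
    """Hypothetical tracker for the continuous silent time and sound time."""
    intensity_threshold: float
    silent_time: float = 0.0   # seconds the silent state has continued
    sound_time: float = 0.0    # seconds the sound state has continued

    def update(self, intensity: float, dt: float) -> str:
        """Account for dt seconds at the given intensity and return the state."""
        if intensity > self.intensity_threshold:
            self.sound_time += dt
            self.silent_time = 0.0   # the silent run is interrupted
            return "sound"
        # Assumption: a level equal to the threshold counts as silent.
        self.silent_time += dt
        self.sound_time = 0.0        # the sound run is interrupted
        return "silent"
```

A caller would feed one measured intensity per frame and compare `silent_time` or `sound_time` with the corresponding time threshold after each update.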
In the speech adjustment device of the present invention, the speech adjustment unit may include a microphone sensitivity adjustment unit for adjusting the input gain of the microphone signal input unit. It raises the input gain when the silent time is longer than a preset silent time threshold and lowers the input gain when the sound time is longer than a preset sound time threshold.
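The input gain rule in the paragraph above can be written as a single function. This Python sketch is illustrative: the name `adjust_input_gain` and the multiplicative reading of the adjustment rate are assumptions, and the default values are simply picked from inside the ranges the document quotes elsewhere (silent time threshold of about 10 to 60 seconds with a 1 to 5% increase, sound time threshold of about 5 to 20 seconds with a 10 to 50% decrease).

```python
def adjust_input_gain(gain, silent_time, sound_time,
                      silent_threshold=30.0, sound_threshold=10.0,
                      raise_rate=0.03, lower_rate=0.30):
    """Return the new microphone input gain (hypothetical helper)."""
    if silent_time > silent_threshold:
        # Long silence: quiet surroundings, so raise the gain slightly.
        return gain * (1.0 + raise_rate)
    if sound_time > sound_threshold:
        # Sound lasting longer than one breath: treat as noise, cut the gain.
        return gain * (1.0 - lower_rate)
    # Neither threshold exceeded: ordinary conversation, leave the gain alone.
    return gain
```

The asymmetry between the small increase and the large decrease mirrors the document's stated intent of raising sensitivity gradually in quiet surroundings and lowering it quickly in noisy ones.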
In the speech adjustment device of the present invention, the speech adjustment unit may include a speaker signal output unit that outputs a speech signal to a speaker and a speaker volume adjustment unit for adjusting the output gain of the speaker signal output unit. It lowers the output gain when the silent time is longer than a preset silent time threshold and raises the output gain when the sound time is longer than a preset sound time threshold.
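The output gain rule runs in the opposite direction to the input gain rule. The Python sketch below is a hedged illustration: `adjust_output_gain` is a hypothetical name, the default thresholds and rates are picked from the ranges quoted elsewhere in the document, and the clamping of the volume to a fixed range is an added assumption that the source does not mention.

```python
def adjust_output_gain(volume, silent_time, sound_time,
                       silent_threshold=30.0, sound_threshold=10.0,
                       lower_rate=0.03, raise_rate=0.30,
                       min_volume=0.0, max_volume=1.0):
    """Return the new speaker volume (hypothetical helper)."""
    if silent_time > silent_threshold:
        # Quiet surroundings: the speaker can be turned down a little.
        volume *= 1.0 - lower_rate
    elif sound_time > sound_threshold:
        # Persistent noise: turn the speaker up so it can still be heard.
        volume *= 1.0 + raise_rate
    # Assumption: keep the result inside the range the hardware accepts.
    return min(max(volume, min_volume), max_volume)
```

Without some bound of this kind, repeated increases in a persistently noisy environment would grow without limit, which is why the sketch clamps the result.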
In the speech adjustment device of the present invention, the speech adjustment unit may include a microphone sensitivity adjustment unit for adjusting the input gain of the microphone signal input unit, a speaker signal output unit that outputs a speech signal to a speaker, and a speaker volume adjustment unit for adjusting the output gain of the speaker signal output unit. When the silent time is longer than a preset silent time threshold, it raises the input gain and lowers the output gain; when the sound time is longer than a preset sound time threshold, it lowers the input gain and raises the output gain.
According to the present invention, adjusting the input gain and output gain on the basis of the sound time and the silent time makes it possible to provide a speech adjustment device that appropriately adjusts the input gain and output gain of the speech signal to suit the usage environment and improves the voice recognition rate.
FIG. 1 is a basic configuration diagram of the audio adjustment device of the present invention.
FIG. 2 is a configuration diagram of the audio adjustment device of Example 1.
FIG. 3 is a principle diagram showing how the microphone sensitivity is adjusted using the silence time.
FIG. 4 is a principle diagram showing how the microphone sensitivity is adjusted using the sound time.
FIG. 5 is a flowchart showing the microphone sensitivity adjustment steps of Example 1.
FIG. 6 is a configuration diagram of the audio adjustment device of Example 2.
FIG. 7 is a flowchart showing the speaker volume adjustment steps of Example 2.
FIG. 8 is a configuration diagram of the audio adjustment device of Example 3.
FIG. 9 is a flowchart showing the microphone sensitivity and speaker volume adjustment steps of Example 3.
FIG. 1 shows the basic configuration of the audio adjustment device 20 of the present invention. The audio adjustment device 20 includes a microphone signal input unit 31 that converts an electrical signal input from the microphone 10 into an audio signal, and an audio signal intensity measurement unit 32 that measures the intensity of the audio signal and compares it with a preset audio intensity threshold.
The audio adjustment device 20 also includes a silence time measurement unit 33 that measures the silence time, during which the silent state (audio signal below the audio intensity threshold) continues; a sound time measurement unit 34 that measures the sound time, during which the sound state (audio signal above the audio intensity threshold) continues; and an audio adjustment unit 40 that adjusts the audio signal by comparing the silence time or the sound time with a preset time threshold.
Normally, the time for which a person can speak continuously in conversation is thought to be at most about 5 seconds because of the need to breathe, and speaking for about 15 seconds without taking a breath is hardly conceivable. Therefore, in the audio adjustment device 20, when the audio signal stays above the predetermined audio intensity threshold (the sound state) for longer than a chosen fixed time (5 to 10 seconds), the signal is regarded as ambient noise rather than human speech, and the audio signal is adjusted to reduce the effect of that noise on speech recognition.
Likewise, when the audio signal stays below the predetermined audio intensity threshold (the silent state) for longer than a chosen fixed time (20 to 30 seconds), the signal is regarded as containing no ambient noise, and the audio signal is adjusted so that the speech recognition rate improves. Through this adjustment, the audio adjustment device 20 can, for example, improve the speech recognition rate of a device that has a speech recognition function.
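The measurement side of this scheme (threshold comparison, then timing how long each state persists) can be sketched as follows. This is only an illustration under stated assumptions, not the implementation described in the patent: the class name `DurationTracker`, the use of RMS as the intensity measure, and the frame-based timing are all hypothetical choices introduced here.

```python
import math


class DurationTracker:
    """Tracks how long the signal has stayed in the sound or silent state.

    Hypothetical helper: the source only says that intensity is compared
    with a threshold and that the duration of each state is measured.
    """

    def __init__(self, intensity_threshold, frame_seconds):
        self.intensity_threshold = intensity_threshold
        self.frame_seconds = frame_seconds
        self.state = None      # "voiced" or "silent"; None before the first frame
        self.duration = 0.0    # seconds spent continuously in the current state

    def update(self, frame):
        """Classify one frame of samples and return (state, duration)."""
        # RMS is an assumed intensity measure, not one named in the source.
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        new_state = "voiced" if rms > self.intensity_threshold else "silent"
        if new_state == self.state:
            self.duration += self.frame_seconds
        else:
            # The state changed, so the count restarts from this frame.
            self.state = new_state
            self.duration = self.frame_seconds
        return self.state, self.duration
```

Each call to `update` plays the role of units 32 to 34 for one frame; the returned duration is what the audio adjustment unit 40 would compare with its time threshold.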
(Example 1)
FIG. 2 shows the configuration of the audio adjustment device 21 of Example 1. The audio adjustment device 21 is mounted on a robot or information terminal that has a speech recognition function and an audio output function such as speech synthesis. It adjusts the microphone sensitivity (input gain) according to the level of ambient noise in the usage environment in order to improve speech recognition.
As shown in FIG. 2, the audio adjustment device 21 includes a microphone signal input unit 31 that receives an electrical signal from the microphone 10 and converts it into an audio signal, and an audio signal intensity measurement unit 32 that measures the intensity of the audio signal and compares it with a preset audio intensity threshold.
The audio adjustment device 21 also includes a silence time measurement unit 33 that measures the silence time, during which the silent state (audio signal below the audio intensity threshold) continues, and a sound time measurement unit 34 that measures the sound time, during which the sound state (audio signal above the audio intensity threshold) continues.
Furthermore, the audio adjustment device 21 includes, as the audio adjustment unit 40, a microphone sensitivity adjustment unit 41 that compares the silence time or the sound time with a preset time threshold and adjusts the sensitivity (input gain) of the microphone signal input unit 31 according to the result.
The microphone 10, which converts external sound into an electrical signal, is connected to the audio adjustment device 21. The audio signal, digitized by the microphone signal input unit 31, is passed to the audio signal intensity measurement unit 32.
The audio signal intensity measurement unit 32 holds, for example, a preset audio intensity threshold and determines whether sound is present by comparing the current audio signal value with that threshold.
Note that the audio signal intensity measurement unit 32 may be built into the speech recognition processing unit of a commercial product or the like, which can make the audio intensity threshold difficult to set. In that case, the presence or absence of an audio signal may be determined from the processing status of the speech recognition processing unit.
FIG. 3 illustrates how the microphone sensitivity (input gain) is adjusted when the audio signal intensity measurement unit 32 judges the audio signal to be silent. When the audio signal intensity measurement unit 32 determines that the current audio signal can be regarded as silent, the silence time measurement unit 33 measures the silence time (the elapsed time in the silent state).
When the silence time continues beyond the preset silence time threshold, the microphone sensitivity adjustment unit 41 raises the sensitivity of the microphone 10. Specifically, the silence time threshold is about 10 to 60 seconds, and the sensitivity is preferably raised by about 1 to 5% each time. With these settings, the microphone sensitivity adjustment unit 41 raises the microphone sensitivity gradually in a quiet environment.
Specifically, as shown in FIG. 3, once the silence has lasted for a chosen silence time threshold (T11), the microphone sensitivity is raised by a few percent. If the silence then lasts for a further silence time threshold (T12), the microphone sensitivity is raised by a few percent again. This is repeated for as long as the silence continues.
The silence time thresholds (T11) and (T12) need not be equal, and neither do the respective rates of increase in microphone sensitivity.
As the microphone sensitivity rises, the silence eventually stops lasting as long as before, and the microphone sensitivity settles at a level suited to the usage environment.
FIG. 4 illustrates how the microphone sensitivity (input gain) is adjusted when the audio signal intensity measurement unit 32 judges the audio signal to be sound. When the audio signal intensity measurement unit 32 determines that the current audio signal can be regarded as sound, the sound time measurement unit 34 measures the sound time (the elapsed time in the sound state).
When the sound time continues beyond a chosen preset sound time threshold, the microphone sensitivity adjustment unit 41 lowers the sensitivity of the microphone 10. Specifically, the sound time threshold is about 5 to 20 seconds, normally set with reference to how long a person can speak on one breath, and the sensitivity is preferably lowered by about 10 to 50% each time. With these settings of the sound time threshold and the rate of decrease, the microphone sensitivity adjustment unit 41 lowers the microphone sensitivity quickly in a noisy environment.
Specifically, once the sound has lasted for a chosen sound time threshold (T21), the microphone sensitivity is lowered by a few percent to a few tens of percent. If the sound then lasts for a further sound time threshold (T22), the microphone sensitivity is again lowered by a few percent to a few tens of percent. This is repeated for as long as the sound continues.
The sound time thresholds (T21) and (T22) need not be equal, and neither do the respective rates of decrease in input gain.
As the microphone sensitivity falls, the sound eventually stops lasting as long as before, and the microphone sensitivity settles at a level suited to the usage environment.
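The asymmetry between the slow rise in a quiet environment and the fast drop in a noisy one can be checked with a little arithmetic. The sketch below assumes, for simplicity, that every threshold interval and every step ratio is the same (the text allows them to differ) and that the state is never interrupted; the function name `gain_after` and the specific values, picked from inside the stated ranges, are illustrative only.

```python
def gain_after(duration_s, threshold_s, step_ratio, gain=1.0):
    """Gain after `duration_s` of an unbroken state.

    One multiplicative step of (1 + step_ratio) is applied each time a
    full `threshold_s` interval elapses.
    """
    steps = int(duration_s // threshold_s)
    return gain * (1.0 + step_ratio) ** steps


# Quiet room: +2% every 30 s for 5 minutes gives 10 steps, about 1.219x.
quiet = gain_after(300, threshold_s=30, step_ratio=0.02)

# Noisy room: -20% every 10 s for 30 s gives 3 steps, about 0.512x.
noisy = gain_after(30, threshold_s=10, step_ratio=-0.20)
```

With these example values the gain needs five minutes of silence to rise by roughly a fifth, but only thirty seconds of continuous sound to fall by about half. In practice the sequence stops earlier, because the change in sensitivity itself interrupts the silent or sound state.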
FIG. 5 shows the processing flow for adjusting the microphone sensitivity (input gain) in the audio adjustment device 21 of Example 1. The flow is described below with reference to FIG. 5.
S11. The audio adjustment process starts.
S12. The audio signal intensity measurement unit 32 compares the intensity of the audio signal with a chosen audio intensity threshold (or checks whether speech recognition is in progress). If the intensity is above the threshold, the signal is judged to be sound; if it is below the threshold, the signal is judged to be silent.
S13. If the audio signal intensity measurement unit 32 judges the signal to be silent (no audio signal), the silence time measurement unit 33 measures the silence time (the duration of the interval with no audio signal).
S14. If the silence time is shorter than the chosen silence time threshold, the process returns to the start (S11) without changing the microphone sensitivity. If the silence time is longer than the silence time threshold, the process proceeds to S15.
S15. Because the silence time is longer than the silence time threshold, the environment is judged to be quiet, with no ambient noise and headroom in the microphone sensitivity. The microphone sensitivity of the microphone signal input unit 31 is raised by a chosen ratio, and the process returns to the start (S11).
S16. If the audio signal intensity measurement unit 32 judges the signal to be sound (audio signal present), the sound time measurement unit 34 measures the sound time (the duration of the interval with an audio signal).
S17. If the sound time is shorter than the chosen sound time threshold, the sound is judged to be normal conversation, and the process returns to the start (S11) without changing the microphone sensitivity. If the sound time is longer than the sound time threshold, the process proceeds to S18.
S18. Because the sound time is longer than the sound time threshold, it is judged that there is steady ambient noise. The microphone signal input unit 31 lowers the microphone sensitivity of the microphone 10 by a chosen ratio, and the process returns to the start (S11).
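Steps S12 to S18 above can be sketched as a single decision function. This is a hedged illustration rather than the patented implementation: the function name `adjust_mic_gain`, the state labels, the default thresholds and ratios (taken from inside the ranges given in the text), and the returned reset flag are all assumptions made for the example.

```python
def adjust_mic_gain(state, duration_s, gain,
                    silence_threshold_s=30.0, sound_threshold_s=10.0,
                    raise_ratio=0.02, lower_ratio=0.20):
    """One pass of S12-S18.

    `state` is "silent" or "voiced" and `duration_s` is how long that
    state has lasted. Returns (new_gain, timer_reset); the flag tells the
    caller to restart its duration count, so that the next step needs a
    further full threshold interval (an assumed reading of T11/T12).
    """
    if state == "silent":                             # S12 -> S13
        if duration_s > silence_threshold_s:          # S14
            return gain * (1.0 + raise_ratio), True   # S15: quiet, raise gain
    elif state == "voiced":                           # S12 -> S16
        if duration_s > sound_threshold_s:            # S17
            return gain * (1.0 - lower_ratio), True   # S18: noise, lower gain
    return gain, False                                # no change, back to S11
```

A caller would invoke this once per loop iteration with the current state and duration, apply the returned gain to the microphone input, and reset its timer whenever the flag is true.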
As the above flow shows, the present invention is characterized by measuring, from the audio signal, how long the silent state and the sound state continue, and adjusting the audio signal to suit speech recognition. That is, the audio adjustment device 21 of Example 1 distinguishes ambient noise from human conversation by how long the silence time or the sound time continues, and can therefore set the microphone sensitivity best suited to the usage environment.
(Example 2)
FIG. 6 shows the configuration of the audio adjustment device 22 of Example 2. The audio adjustment device 22 includes a speaker volume adjustment unit 42 and a speaker signal output unit 43 as the audio adjustment unit 40, and adjusts the volume (output gain) of the speaker 50 so that it is easy to hear in the usage environment, whether that environment is quiet or noisy.
In the audio adjustment device 22, the processing up to the measurement of the silence time and the sound time is the same as in the audio adjustment device 21 of Example 1: the audio signal is input through the microphone 10 and the microphone signal input unit 31, its intensity is measured by the audio signal intensity measurement unit 32, and the silence time and the sound time are measured by the silence time measurement unit 33 and the sound time measurement unit 34.
After that, in the audio adjustment device 22 of Example 2, when the silence time measured by the silence time measurement unit 33 is longer than a chosen preset silence time threshold, the speaker volume adjustment unit 42 lowers the volume (output gain) of the speaker 50. Specifically, the silence time threshold is about 10 to 60 seconds, and the volume is preferably lowered by about 1 to 5% each time. With these settings, the speaker volume adjustment unit 42 lowers the volume of the speaker 50 gradually in a quiet environment.
When the sound time is longer than a chosen preset sound time threshold, the speaker volume adjustment unit 42 raises the volume (output gain) of the speaker 50. Specifically, the sound time threshold is about 5 to 20 seconds, normally set with reference to how long a person can speak on one breath, and the volume is preferably raised by about 10 to 50% each time. With these settings of the sound time threshold and the rate of increase, the speaker volume adjustment unit 42 raises the volume of the speaker 50 quickly in a noisy environment.
FIG. 7 shows the processing flow for adjusting the speaker volume (output gain) in the audio adjustment device 22 of Example 2. The flow is described below with reference to FIG. 7.
S11. The audio adjustment process starts.
S12. The audio signal intensity measurement unit 32 compares the intensity of the audio signal with a chosen audio intensity threshold (or checks whether speech recognition is in progress). If the intensity is above the threshold, the signal is judged to be sound; if it is below the threshold, the signal is judged to be silent.
S13. If the audio signal intensity measurement unit 32 judges the signal to be silent (no audio signal), the silence time measurement unit 33 measures the silence time (the duration of the interval with no audio signal).
S14. If the silence time is shorter than the chosen silence time threshold, the process returns to the start (S11) without changing the speaker volume. If the silence time is longer than the silence time threshold, the process proceeds to S21.
S21. Because the silence time is longer than the silence time threshold, the environment is judged to be quiet, with no ambient noise. The volume (output gain) of the speaker 50 is lowered by a chosen ratio, and the process returns to the start (S11).
S16. If the audio signal intensity measurement unit 32 judges the signal to be sound (audio signal present), the sound time measurement unit 34 measures the sound time (the duration of the interval with an audio signal).
S17. If the sound time is shorter than the chosen sound time threshold, the sound is judged to be normal conversation, and the process returns to the start (S11) without changing the output gain. If the sound time is longer than the sound time threshold, the process proceeds to S22.
S22. Because the sound time is longer than the sound time threshold, the sound is judged to be ambient noise. The volume (output gain) of the speaker 50 is raised by a chosen ratio, and the process returns to the start (S11).
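The speaker-side flow moves the gain in the opposite direction to the microphone-side flow of Example 1. A minimal sketch follows, with the caveat that the function name `adjust_speaker_volume`, the default values, and in particular the clamping to `min_volume` and `max_volume` are assumptions added here; the source does not mention volume limits.

```python
def adjust_speaker_volume(state, duration_s, volume,
                          silence_threshold_s=30.0, sound_threshold_s=10.0,
                          lower_ratio=0.02, raise_ratio=0.20,
                          min_volume=0.1, max_volume=1.0):
    """One pass of S12-S22 for the speaker output gain."""
    if state == "silent" and duration_s > silence_threshold_s:
        volume *= 1.0 - lower_ratio    # S21: quiet room, turn down slightly
    elif state == "voiced" and duration_s > sound_threshold_s:
        volume *= 1.0 + raise_ratio    # S22: steady noise, turn up quickly
    # Assumed safeguard: keep the result inside a usable volume range.
    return max(min_volume, min(max_volume, volume))
```

The clamp is a practical design choice for the sketch: without it, a long noisy period would keep multiplying the volume by the raise ratio with no upper bound.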
According to the audio adjustment device 22 of Example 2, the volume (output gain) of the speaker 50 is adjusted according to the intensity of the ambient noise, so the output volume can be made easy to hear in the usage environment.
(Example 3)
FIG. 8 shows the configuration of the audio adjustment device 23 of Example 3. The audio adjustment device 23 combines the audio adjustment device 21 of Example 1 with the audio adjustment device 22 of Example 2. In a quiet environment or a noisy environment, it adjusts the microphone sensitivity (input gain) so that speech recognition works well, and at the same time adjusts the speaker volume (output gain) so that the output is easy to hear over the ambient noise.
As shown in FIG. 8, the audio adjustment device 23 of Example 3 has both the configuration of the audio adjustment device 21 of Example 1 and that of the audio adjustment device 22 of Example 2. The details of each component are the same as in Examples 1 and 2, so their description is omitted.
FIG. 9 shows the processing flow for adjusting the microphone and the speaker in the audio adjustment device 23 of Example 3. The flow is described below with reference to FIG. 9.
S11. The audio adjustment process starts.
S12. The audio signal intensity measurement unit 32 compares the intensity of the audio signal with a chosen audio intensity threshold (or checks whether speech recognition is in progress). If the intensity is above the threshold, the signal is judged to be sound; if it is below the threshold, the signal is judged to be silent.
S13. If the audio signal intensity measurement unit 32 judges the signal to be silent (no audio signal), the silence time measurement unit 33 measures the silence time (the duration of the interval with no audio signal).
S14. If the silence time is shorter than the chosen silence time threshold, the process returns to the start (S11) without changing the microphone sensitivity or the speaker volume. If the silence time is longer than the silence time threshold, the process proceeds to S15.
S15. Because the silence time is longer than the silence time threshold, the environment is judged to be quiet, with no ambient noise and headroom in the microphone sensitivity. The microphone sensitivity (input gain) is raised by a chosen ratio; then, in S21, the speaker volume (output gain) is lowered by a chosen ratio, and the process returns to the start (S11).
S16. If the audio signal intensity measurement unit 32 judges the signal to be sound (audio signal present), the sound time measurement unit 34 measures the sound time (the duration of the interval with an audio signal).
S17. If the sound time is shorter than the chosen sound time threshold, the sound is judged to be normal conversation, and the process returns to the start (S11) without changing the microphone sensitivity or the speaker volume. If the sound time is longer than the sound time threshold, the process proceeds to S18.
S18. Because the sound time is longer than the sound time threshold, it is judged that there is steady ambient noise. The microphone sensitivity (input gain) is lowered by a chosen ratio; then, in S22, the speaker volume (output gain) is raised by a chosen ratio, and the process returns to the start (S11).
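In the combined flow, a single decision drives both gains in opposite directions. The sketch below is one possible way to express that pairing; the `Gains` dataclass, the function name `adjust_both`, and the shared `small_step` and `large_step` ratios are hypothetical simplifications (the text lets each ratio be chosen independently).

```python
from dataclasses import dataclass


@dataclass
class Gains:
    mic: float = 1.0       # microphone sensitivity (input gain)
    speaker: float = 1.0   # speaker volume (output gain)


def adjust_both(state, duration_s, gains,
                silence_threshold_s=30.0, sound_threshold_s=10.0,
                small_step=0.02, large_step=0.20):
    """One pass of the Example 3 flow; returns the updated Gains."""
    if state == "silent" and duration_s > silence_threshold_s:
        # S15 then S21: quiet room, raise the mic gain and lower the speaker.
        return Gains(gains.mic * (1.0 + small_step),
                     gains.speaker * (1.0 - small_step))
    if state == "voiced" and duration_s > sound_threshold_s:
        # S18 then S22: steady noise, lower the mic gain and raise the speaker.
        return Gains(gains.mic * (1.0 - large_step),
                     gains.speaker * (1.0 + large_step))
    return gains  # S14 / S17: below threshold, nothing changes
```

For example, `adjust_both("voiced", 12, Gains())` yields a microphone gain of about 0.8 and a speaker gain of about 1.2 under these assumed defaults.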
According to the audio adjustment device 23 of Example 3, ambient noise and human conversation are distinguished by how long the silence time or the sound time continues, and the microphone sensitivity and the speaker volume are set according to the intensity of the ambient noise, so the output volume can be made easy to hear in the usage environment. In addition, equipping a device that has a speech recognition function with the audio adjustment device 23 improves its speech recognition rate.
The present invention can be used in any device that has a speech recognition function or a voice response function.
DESCRIPTION OF SYMBOLS
10 Microphone
20, 21, 22, 23 Audio adjustment device
31 Microphone signal input unit
32 Audio signal intensity measurement unit
33 Silence time measurement unit
34 Sound time measurement unit
40 Audio adjustment unit
41 Microphone sensitivity adjustment unit
42 Speaker volume adjustment unit
43 Speaker signal output unit
50 Speaker

Claims (4)

  1.  An audio adjustment device comprising:
     a microphone signal input unit that converts an electrical signal input from a microphone into an audio signal;
     an audio signal intensity measurement unit that measures the intensity of the audio signal and compares it with a preset audio intensity threshold;
     a silence time measurement unit that measures a silence time during which a silent state, in which the audio signal is below the audio intensity threshold, continues;
     a sound time measurement unit that measures a sound time during which a sound state, in which the audio signal is above the audio intensity threshold, continues; and
     an audio adjustment unit that adjusts the audio signal by comparing the silence time or the sound time with a preset time threshold.
  2.  The audio adjustment device according to claim 1, wherein the audio adjustment unit includes a microphone sensitivity adjustment unit for adjusting an input gain of the microphone signal input unit, and
     the input gain is raised when the silence time is longer than a preset silence time threshold and lowered when the sound time is longer than a preset sound time threshold.
  3.  The audio adjustment device according to claim 1, wherein the audio adjustment unit includes a speaker signal output unit that outputs an audio signal to a speaker, and a speaker volume adjustment unit for adjusting an output gain of the speaker signal output unit, and
     the output gain is lowered when the silence time is longer than a preset silence time threshold and raised when the sound time is longer than a preset sound time threshold.
  4.  The audio adjustment device according to claim 1, wherein the audio adjustment unit includes a microphone sensitivity adjustment unit for adjusting an input gain of the microphone signal input unit, a speaker signal output unit that outputs an audio signal to a speaker, and a speaker volume adjustment unit for adjusting an output gain of the speaker signal output unit,
     the input gain is raised and the output gain is lowered when the silence time is longer than a preset silence time threshold, and
     the input gain is lowered and the output gain is raised when the sound time is longer than a preset sound time threshold.
PCT/JP2015/055093 2014-10-29 2015-02-23 Speech adjustment device WO2016067644A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2014219802A JP5863928B1 (en) 2014-10-29 2014-10-29 Audio adjustment device
JP2014-219802 2014-10-29

Publications (1)

Publication Number Publication Date
WO2016067644A1 (en) 2016-05-06

Family

ID=55346919

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2015/055093 WO2016067644A1 (en) 2014-10-29 2015-02-23 Speech adjustment device

Country Status (2)

Country Link
JP (1) JP5863928B1 (en)
WO (1) WO2016067644A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6887315B2 (en) * 2017-06-05 2021-06-16 キヤノン株式会社 Speech processing device and its control method, program and storage medium
JP7404664B2 (en) 2019-06-07 2023-12-26 ヤマハ株式会社 Audio processing device and audio processing method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS58190993A (en) * 1982-05-01 1983-11-08 日産自動車株式会社 Voice detector for vehicle
JPH1091184A (en) * 1996-09-12 1998-04-10 Oki Electric Ind Co Ltd Sound detection device
JP2006209069A (en) * 2004-12-28 2006-08-10 Advanced Telecommunication Research Institute International Voice section detection device and program
WO2008114448A1 (en) * 2007-03-20 2008-09-25 Fujitsu Limited Speech recognition system, speech recognition program, and speech recognition method
JP2009175179A (en) * 2008-01-21 2009-08-06 Denso Corp Speech recognition device, program and utterance signal extraction method
JP2014075674A (en) * 2012-10-03 2014-04-24 Oki Electric Ind Co Ltd Audio signal processing device, method, and program

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108735207A (en) * 2017-04-25 2018-11-02 丰田自动车株式会社 Sound conversational system, sound dialogue method and computer readable storage medium
CN108735207B (en) * 2017-04-25 2023-05-02 丰田自动车株式会社 Voice conversation system, voice conversation method, and computer-readable storage medium
WO2021040834A1 (en) * 2019-08-29 2021-03-04 Microsoft Technology Licensing, Llc Automatic speech sensitivity adjustment feature

Also Published As

Publication number Publication date
JP2016085420A (en) 2016-05-19
JP5863928B1 (en) 2016-02-17

Similar Documents

Publication Publication Date Title
US10631087B2 (en) Method and device for voice operated control
US10579327B2 (en) Speech recognition device, speech recognition method and storage medium using recognition results to adjust volume level threshold
US7171357B2 (en) Voice-activity detection using energy ratios and periodicity
EP2860730A1 (en) Speech processing
US20110004468A1 (en) Hearing aid and hearing-aid processing method
JP5863928B1 (en) Audio adjustment device
US10320967B2 (en) Signal processing device, non-transitory computer-readable storage medium, signal processing method, and telephone apparatus
JP2009178783A (en) Communication robot and its control method
EP2743923B1 (en) Voice processing device, voice processing method
US9749741B1 (en) Systems and methods for reducing intermodulation distortion
JP4876245B2 (en) Consonant processing device, voice information transmission device, and consonant processing method
KR20200026896A (en) Voice signal leveling
US8935168B2 (en) State detecting device and storage medium storing a state detecting program
WO2016017229A1 (en) Speech segment detection device, voice processing system, speech segment detection method, and program
US20200152185A1 (en) Method and Device for Voice Operated Control
JPH0635497A (en) Speech input device
JPS6257040B2 (en)
KR101602298B1 (en) Audio system using sound level meter
JP3284968B2 (en) Hearing aid with speech speed conversion function
WO2020217605A1 (en) Audio processing device
US7664635B2 (en) Adaptive voice detection method and system
Kyriakides et al. Isolated word endpoint detection using time-frequency variance kernels
JP2020161884A (en) Speech processing device, speech processing method, and speech processing system
JP2870421B2 (en) Hearing aid with speech speed conversion function
JP3257379B2 (en) Hearing aid with speech speed conversion function

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 15853653; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 EP: PCT application non-entry in European phase. Ref document number: 15853653; Country of ref document: EP; Kind code of ref document: A1