KR101961341B1

KR101961341B1 - Signal processing apparatus and method for barge-in speech recognition

Info

Publication number: KR101961341B1
Application number: KR1020170062037A
Authority: KR
Inventors: 윤동운
Original assignee: (주)오즈디에스피
Priority date: 2017-05-19
Filing date: 2017-05-19
Publication date: 2019-03-22
Also published as: KR20180126926A

Abstract

The present invention relates to a signal processing apparatus and method for barge-in speech recognition. For each speaker output volume value at which the echo input through the microphone would clip, an analog gain value attenuated so that clipping does not occur is stored. The analog gain value corresponding to the output volume value of the speaker outputting the guidance voice is looked up, the analog signal is amplified with that gain and converted into a digital signal, and the echo is removed. The echo-removed digital signal is then amplified so as to restore it by the amount the analog signal was attenuated. As a result, the echo mixed with the user's uttered voice can be removed effectively and the accuracy of speech recognition is improved.

Description

Signal processing apparatus and method for barge-in speech recognition

The present invention relates to signal processing technology for speech recognition, and more particularly to a signal processing apparatus and method for barge-in speech recognition that process a signal input through a microphone for barge-in speech recognition, in which the user is allowed to speak while a guidance voice is still being output.

Speech recognition technology converts the acoustic signal of a user's uttered speech into words or sentences. In a system that provides a speech recognition function, the user typically listens to a guidance voice (prompt) output from a speaker and then speaks a corresponding utterance into a microphone. Because the user is allowed to speak only after the guidance voice has been output in full, this is inconvenient for users of the speech recognition function.

To reduce this inconvenience, speech recognition technology using a barge-in function is being developed. Barge-in is a function that, in a speech recognition system in which a guidance voice for speech recognition is output through a speaker, allows the user to speak before output of the guidance voice is complete.

With the barge-in function, the user does not need to wait for the guidance voice to end. Having understood its meaning from the part already output, the user can immediately speak the intended input, which shortens the time required for speech recognition.

However, when the user barges in while the guidance voice is being output, the guidance voice output from the speaker enters the microphone as an echo together with the user's uttered voice. This echo must be removed to isolate the user's uttered voice before the accuracy of speech recognition can be improved.

Moreover, when the level of the guidance voice output from the speaker is high enough that the echo input through the microphone is clipped, it is difficult to remove the echo effectively, so a way to address this is needed.

Korean Laid-open Patent Publication No. 10-2003-0073886 (published September 19, 2003)

An object of the present invention, devised to solve the above problem, is to provide a signal processing apparatus and method for barge-in speech recognition that, during speech recognition using the barge-in scheme that allows utterance while a guidance voice is being output, prevent clipping of the guidance voice that enters the microphone as an echo, thereby removing the echo effectively and improving speech recognition efficiency.

To achieve the above object, a signal processing apparatus for barge-in speech recognition according to the present invention includes: a storage unit that stores analog gain value information of a microphone corresponding to output volume values of a speaker that outputs a guidance voice; a first amplifier that amplifies an analog signal input through the microphone according to the analog gain value stored in the storage unit; an analog-to-digital converter that converts the analog signal amplified by the first amplifier into a digital signal; an echo canceller that removes the echo corresponding to the guidance voice from the digital signal converted by the analog-to-digital converter; a second amplifier that amplifies the echo-removed digital signal; and a control unit. The control unit stores, in the storage unit, the analog gain value information of the microphone corresponding to the output volume values of the speaker, storing, for each output volume value of the speaker at which the echo input through the microphone clips, an analog gain value attenuated so that clipping does not occur. The control unit looks up in the storage unit the analog gain value corresponding to the output volume value of the speaker outputting the guidance voice, controls the first amplifier to amplify the analog signal input through the microphone according to the looked-up analog gain value, and controls the second amplifier to restore the digital signal by the amount the analog signal was attenuated.

In the signal processing apparatus for barge-in speech recognition of the present invention, when the guidance voice is output from the speaker, the control unit identifies the output volume values at which the echo input through the microphone clips, calculates an attenuated analog gain value by reducing the analog gain value corresponding to each such output volume value so that clipping does not occur, and stores each output volume value at which clipping was identified, together with its attenuated analog gain value, in the storage unit.

In the signal processing apparatus for barge-in speech recognition of the present invention, the control unit calculates the analog gain value corresponding to the output volume value of the speaker using the following equation and stores it in the storage unit.

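$$
\begin{cases}
AG = A, & \text{if } v \le T \\
AG \le A - [\,S(v) - S(T)\,]\ \text{dB}, & \text{if } v > T
\end{cases}
$$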

Here, AG is the analog gain value, v is the output volume value, T is the maximum volume value at which clipping does not occur, A is the unattenuated analog gain value, and S is the output level of the speaker.

In the signal processing apparatus for barge-in speech recognition of the present invention, the control unit uses the following equation to calculate a digital gain value for restoring the digital signal by the amount the analog signal was attenuated in the first amplifier, and controls the second amplifier to amplify the digital signal according to the calculated digital gain value.

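$$
\begin{cases}
DG = 0\ \text{dB}, & \text{if } v \le T \\
DG = A - AG\ \text{dB}, & \text{if } v > T
\end{cases}
$$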

Here, DG is the digital gain value, AG is the analog gain value, v is the output volume value, T is the maximum volume value at which clipping does not occur, and A is the unattenuated analog gain value.
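The restoring digital gain can be illustrated with a minimal Python sketch. It assumes that gains are expressed in dB and that the digital gain simply equals the attenuation applied in the analog stage (A minus AG); the function names are hypothetical and are not taken from the specification.

```python
def db_to_linear(gain_db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10 ** (gain_db / 20)


def digital_gain_db(nominal_gain_db, analog_gain_db):
    """Digital gain (dB) that undoes the attenuation of the analog stage.

    Assumption: the restoring gain is A - AG, and it is 0 dB when the
    analog gain was not attenuated (AG == A).
    """
    return max(0.0, nominal_gain_db - analog_gain_db)


# Hypothetical values: nominal gain A = 20 dB, attenuated gain AG = 14 dB.
restore_db = digital_gain_db(20.0, 14.0)
restore_factor = db_to_linear(restore_db)
```

Each echo-removed sample would then be multiplied by `restore_factor` in the second amplifier.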

The signal processing apparatus for barge-in speech recognition of the present invention further includes a speech recognizer that uses the digital signal amplified by the second amplifier to recognize the uttered voice input through the microphone together with the echo.

To achieve the above object, a signal processing method for barge-in speech recognition according to the present invention includes the steps of: a signal processing apparatus storing analog gain value information of a microphone corresponding to output volume values of a speaker that outputs a guidance voice, storing, for each output volume value of the speaker at which the echo input through the microphone clips, an analog gain value attenuated so that clipping does not occur; the signal processing apparatus looking up the analog gain value corresponding to the output volume value of the speaker outputting the guidance voice; the signal processing apparatus amplifying the analog signal input through the microphone according to the looked-up analog gain value; the signal processing apparatus converting the amplified analog signal into a digital signal; the signal processing apparatus removing the echo corresponding to the guidance voice from the converted digital signal; and the signal processing apparatus amplifying the echo-removed digital signal so as to restore the digital signal by the amount the analog signal was attenuated.
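The sequence of steps in the method can be sketched in Python as follows. This is only an illustration under stated assumptions: samples are normalized so that 1.0 is the converter's full scale, gains are in dB, the echo estimate is supplied by the caller rather than computed, and the digital gain is taken to be A minus AG. All names are hypothetical.

```python
def db_to_linear(gain_db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10 ** (gain_db / 20)


def quantize16(x):
    """Model of a 16-bit converter: round and saturate at the code limits."""
    return max(-32768, min(32767, round(x * 32767)))


def process_block(mic_analog, echo_estimate, volume, gain_table_db, nominal_gain_db):
    """Run one block of microphone samples through the described steps."""
    analog_gain = gain_table_db[volume]                        # look up stored gain
    amplified = [s * db_to_linear(analog_gain) for s in mic_analog]  # first amplifier
    digital = [quantize16(s) for s in amplified]               # A/D conversion
    echo_free = [d - e for d, e in zip(digital, echo_estimate)]  # echo removal
    digital_gain = nominal_gain_db - analog_gain               # assumed: DG = A - AG
    return [x * db_to_linear(digital_gain) for x in echo_free]   # second amplifier


# Hypothetical table entry: at volume 8 the stored gain is 14 dB instead of 20 dB.
out = process_block([0.01], [0], volume=8,
                    gain_table_db={8: 14.0}, nominal_gain_db=20.0)
```

In this sketch the restored sample ends up close to the value it would have had with the unattenuated 20 dB gain, differing only by quantization error.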

In the signal processing method for barge-in speech recognition of the present invention, the storing step includes the steps of: the signal processing apparatus identifying, when the guidance voice is output from the speaker, the output volume values at which the echo input through the microphone clips; the signal processing apparatus calculating an attenuated analog gain value by reducing the analog gain value corresponding to each output volume value at which clipping was identified so that clipping does not occur; and the signal processing apparatus storing each output volume value at which clipping was identified together with its attenuated analog gain value.

In the signal processing method for barge-in speech recognition of the present invention, in the storing step, the signal processing apparatus calculates and stores the analog gain value corresponding to the output volume value of the speaker using the following equation.

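$$
\begin{cases}
AG = A, & \text{if } v \le T \\
AG \le A - [\,S(v) - S(T)\,]\ \text{dB}, & \text{if } v > T
\end{cases}
$$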

In the signal processing method for barge-in speech recognition of the present invention, in the amplifying step, the signal processing apparatus uses the following equation to calculate a digital gain value for restoring the digital signal by the amount the analog signal was attenuated, and amplifies the echo-removed digital signal accordingly.

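$$
\begin{cases}
DG = 0\ \text{dB}, & \text{if } v \le T \\
DG = A - AG\ \text{dB}, & \text{if } v > T
\end{cases}
$$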

The signal processing method for barge-in speech recognition of the present invention further includes the step of the signal processing apparatus using the amplified digital signal to recognize the uttered voice input through the microphone together with the echo.

To achieve the above object, the present invention also provides a computer-readable recording medium on which a program for performing the above signal processing method for barge-in speech recognition is recorded.

According to the signal processing apparatus and method for barge-in speech recognition of the present invention, when the user speaks while the guidance voice for speech recognition is being output, so that the uttered voice and the guidance voice in the form of an echo enter the microphone together, only the signal corresponding to the echo can be removed effectively.

In particular, even when the level of the guidance voice output from the speaker is high enough that the echo input through the microphone could clip, clipping can be prevented by appropriately adjusting the gain applied to the signal input through the microphone. The echo can therefore be removed effectively, which prevents distortion of the uttered voice and improves speech recognition efficiency.

In this way, the present invention improves barge-in speech recognition performance by using different analog gains for the microphone depending on the speaker output level. In the range of low speaker output levels, a single optimal microphone gain value is used to keep speech recognition performance at its best. At high speaker output levels, a lower analog gain value is used to prevent the echo entering the microphone from clipping, so that the echo is removed efficiently and distortion of the utterance is prevented.

This avoids the loss of digital representation values of the user's speech signal that would occur in the low speaker volume range, where the echo does not clip, if a small analog gain value were used regardless of the speaker output volume value and the magnitude of the user's speech signal were restored afterwards by amplifying the echo-removed signal with a digital gain value.
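The loss of digital representation values mentioned above can be illustrated with a small Python sketch. It compares capturing a quiet utterance with a normal analog gain against capturing it with a low analog gain and restoring the level digitally. The signal, the gain values (20 dB and 0 dB), and all names are hypothetical and chosen only for illustration.

```python
import math

FULL_SCALE = 32767  # largest positive 16-bit sample value


def quantize(x):
    """Round a full-scale-normalized sample to a 16-bit code, saturating."""
    return max(-32768, min(FULL_SCALE, round(x * FULL_SCALE)))


def db_to_linear(gain_db):
    return 10 ** (gain_db / 20)


# Hypothetical quiet utterance: a sine wave at 0.1% of the converter's full scale.
speech = [0.001 * math.sin(2 * math.pi * n / 200) for n in range(200)]


def capture(analog_gain_db, digital_gain_db):
    """Amplify in the analog domain, quantize, then amplify digitally."""
    out = []
    for s in speech:
        code = quantize(s * db_to_linear(analog_gain_db))
        out.append(code * db_to_linear(digital_gain_db))
    return out


normal = capture(analog_gain_db=20, digital_gain_db=0)   # gain applied before the converter
low = capture(analog_gain_db=0, digital_gain_db=20)      # gain applied after the converter

# Number of different sample values that survive quantization in each case.
distinct_normal = len(set(normal))
distinct_low = len(set(low))
```

Both captures reach a similar peak level, but the low-gain capture passes through far fewer converter codes, so the digitally restored signal is coarser. This is the reason for keeping the normal gain wherever the echo does not clip.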

FIG. 1 is a diagram showing the configuration of a signal processing apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram showing an output signal of a speaker according to an embodiment of the present invention.
FIG. 3 is a diagram showing an echo input through a microphone according to an embodiment of the present invention.
FIG. 4 is a diagram showing clipping occurring in an echo input through a microphone according to an embodiment of the present invention.
FIG. 5 is a diagram showing analog gain value information of a microphone corresponding to output volume values of a speaker according to an embodiment of the present invention.
FIG. 6 is a diagram showing a process of calculating gain values for signal processing according to an embodiment of the present invention.
FIG. 7 is a diagram showing a process of processing a signal according to the embodiment of FIG. 6.

Note that the following description covers only the parts needed to understand the embodiments of the present invention, and that description of other parts is omitted so as not to obscure the gist of the present invention.

The terms and words used in this specification and the claims should not be construed as limited to their ordinary or dictionary meanings. On the principle that an inventor may appropriately define the concept of a term in order to describe his or her invention in the best way, they should be interpreted with meanings and concepts consistent with the technical idea of the present invention. Accordingly, the embodiments described in this specification and the configurations shown in the drawings are merely preferred embodiments of the present invention and do not represent all of its technical ideas, and it should be understood that various equivalents and modifications that could replace them may exist at the time of filing.

The present invention relates to the field of signal processing for speech recognition. Hereinafter, embodiments of the present invention are described in more detail with reference to the accompanying drawings.

FIG. 1 shows the configuration of a signal processing apparatus according to an embodiment of the present invention, FIG. 2 shows an output signal of a speaker according to an embodiment of the present invention, FIG. 3 shows an echo input through a microphone according to an embodiment of the present invention, FIG. 4 shows clipping occurring in an echo input through a microphone according to an embodiment of the present invention, and FIG. 5 shows analog gain value information of a microphone corresponding to output volume values of a speaker according to an embodiment of the present invention.

Referring to FIGS. 1 to 5, when the voice that the user (2) utters in response to the guidance voice (prompt) output from the speaker (1) is input through the microphone (3), the signal processing apparatus (100) of this embodiment processes the input signal for speech recognition.

First, when the guidance voice for speech recognition is output from the speaker (1), the user (2) understands its content and then speaks a corresponding utterance into the microphone (3). Using the barge-in function, which allows utterance while the guidance voice is being output, the user (2) can input the uttered voice into the microphone (3) while the guidance voice is still being output through the speaker (1).

In this case the guidance voice output from the speaker (1) enters the microphone (3) as an echo, mixed with the voice uttered by the user (2). The signal processing apparatus (100) uses echo cancellation to remove the echo caused by the guidance voice from the signal input through the microphone (3), extracts only the uttered voice of the user (2), and passes it on for speech recognition, thereby raising the speech recognition success rate.

The signal processing apparatus (100) includes a storage unit (10), a first amplifier (20), an analog-to-digital converter (30), an echo canceller (40), a second amplifier (50), a speech recognizer (60), and a control unit (70).

The storage unit (10) is a store that holds the analog gain value information of the microphone (3) corresponding to the output volume values of the speaker (1) that outputs the guidance voice, and includes various kinds of memory for storing information. The control unit (70) refers to the information stored in the storage unit (10) when controlling the amplification operation of the first amplifier (20). The analog gain values stored in the storage unit (10) may differ depending on the output volume value of the speaker (1).

Under the control of the control unit (70), the first amplifier (20) amplifies the analog signal input through the microphone (3) according to the analog gain value stored in the storage unit (10). With the barge-in function, the signal input through the microphone (3) may be a mixture of the guidance voice output through the speaker (1) and the uttered voice of the user (2).

The analog-to-digital converter (ADC) (30) converts the analog signal amplified by the first amplifier (20) into a digital signal.

The echo canceller (40) removes the echo signal corresponding to the guidance voice from the digital signal converted by the analog-to-digital converter (30).

The echo canceller (40) removes the echo contained in the signal input through the microphone (3) using the signal output from the speaker (1) as a reference. Using an adaptive filter, it estimates the time delay and the change in amplitude between the guidance voice being output from the speaker (1) and being input to the microphone (3), and then uses these estimates and the reference signal to generate an echo estimate signal that matches the echo. The echo canceller (40) then subtracts the echo estimate signal from the signal input through the microphone (3) to remove the signal corresponding to the echo.
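The adaptive-filter operation described above can be sketched in Python. The specification does not name the adaptation algorithm; the sketch below uses the normalized least-mean-squares (NLMS) update as one common choice, and the filter length, step size, and test signals are assumptions made only for illustration.

```python
import random


def nlms_echo_cancel(reference, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of `reference` from `mic`.

    Returns the residual signal and the final filter weights.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    residual = []
    for x, d in zip(reference, mic):
        history = [x] + history[:-1]
        # Echo estimate: reference filtered by the current weights.
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate          # microphone minus echo estimate
        norm = sum(h * h for h in history) + eps
        # NLMS update: step proportional to the error, normalized by input power.
        weights = [w + mu * error * h / norm for w, h in zip(weights, history)]
        residual.append(error)
    return residual, weights


# Hypothetical test: the echo is the prompt delayed by 3 samples and scaled by 0.6.
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [0.0] * 3 + [0.6 * r for r in reference[:-3]]

residual, weights = nlms_echo_cancel(reference, mic)
```

After adaptation, the weight at index 3 approximates the echo amplitude and the residual approaches zero, because this test echo is an undistorted, linearly scaled copy of the reference. A clipped echo is no longer a linear function of the reference, which is why such a filter cannot model it.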

Under the control of the control unit (70), the second amplifier (50) amplifies the echo-removed digital signal.

The speech recognizer (60) uses the digital signal amplified by the second amplifier (50) to recognize the uttered voice of the user (2) that was input through the microphone (3) together with the echo. Depending on the embodiment, the signal processing apparatus (100) may process the signal input through the microphone (3) and pass it to an externally located speech recognizer (60), or it may process the signal and then perform speech recognition directly using the speech recognizer (60).

The control unit (70) controls the overall operation of the signal processing apparatus (100), which includes the storage unit (10), the first amplifier (20), the analog-to-digital converter (30), the echo canceller (40), the second amplifier (50), and the speech recognizer (60). For this purpose it includes an arithmetic unit, memory, program storage, and the like.

First, the control unit (70) stores the analog gain value information of the microphone (3) corresponding to the output volume values of the speaker (1) in the storage unit (10).

The output volume value of the speaker (1) is proportional to the output level of the guidance voice output from the speaker (1). If the output volume value of the speaker (1) is too high, the echo level also becomes very high, and the echo signal that enters the microphone (3) and passes through the analog-to-digital converter (30) on its way to speech recognition exceeds the representable range, saturates, and is clipped to the maximum or minimum value.

FIG. 2 shows the guidance voice output from the speaker (1) when the resolution of the analog-to-digital converter (30) is 16 bits, and FIG. 3 shows the guidance voice entering the microphone (3) as an echo. In this case the output level of the speaker (1) is not high, so clipping does not occur.

By contrast, FIG. 4 shows a state in which the output level of the speaker (1) is high and clipping has occurred. When the resolution of the analog-to-digital converter (30) is 16 bits, the maximum value is 32767 and the minimum value is -32768. If the output volume value of the speaker (1) is too high, the signal saturates beyond this range and is clipped at the maximum or minimum value, producing nonlinear distortion. As a result, much of the signal's information is lost and it becomes difficult for the echo canceller (40) to remove the echo. If the echo is instead removed by post-processing, the uttered voice of the user (2) that overlaps the echo is distorted.
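Saturation at the 16-bit limits can be illustrated with a short Python sketch. The sine-wave echo and the helper names are hypothetical; samples are normalized so that an amplitude of 1.0 corresponds to the converter's full scale.

```python
import math

INT16_MAX = 32767
INT16_MIN = -32768


def to_int16(x):
    """Round a normalized sample to a 16-bit code, saturating at the limits."""
    return max(INT16_MIN, min(INT16_MAX, round(x * INT16_MAX)))


def clipped_fraction(codes):
    """Fraction of samples that sit exactly on the upper or lower limit."""
    hits = sum(1 for c in codes if c in (INT16_MIN, INT16_MAX))
    return hits / len(codes)


def echo(amplitude, n=400):
    """Hypothetical echo: a sine wave with the given peak amplitude."""
    return [to_int16(amplitude * math.sin(2 * math.pi * k / 100)) for k in range(n)]


quiet = echo(0.5)  # peak at half of full scale: no sample reaches a limit
loud = echo(2.0)   # peak at twice full scale: tops and bottoms are flattened

quiet_clipped = clipped_fraction(quiet)
loud_clipped = clipped_fraction(loud)
```

In the loud case a large share of the samples is pinned to 32767 or -32768, so the waveform shape above the limit is lost, which corresponds to the flattened peaks described for FIG. 4.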

It is therefore necessary to prevent clipping of the echo caused by the guidance voice. To this end, the control unit (70) stores the analog gain value information of the microphone (3) corresponding to the output volume values of the speaker (1) in the storage unit (10), storing, for each output volume value of the speaker (1) at which the echo input through the microphone (3) clips, an analog gain value attenuated so that clipping does not occur.

Here, when the guidance voice is output from the speaker (1) before the user (2) speaks, the control unit (70) checks whether the echo input to the microphone (3) clips and, if it does, identifies the corresponding output volume value. It then reduces the analog gain value corresponding to the output volume value at which clipping was identified, calculating an attenuated analog gain value at which clipping does not occur. The attenuated analog gain value is smaller than the analog gain value corresponding to output volume values at which clipping does not occur. The control unit (70) stores each output volume value at which clipping was identified, together with its attenuated analog gain value, in the storage unit (10) and refers to them when controlling the amplification operation of the first amplifier (20).
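The calibration described above can be sketched in Python. The specification computes the attenuated gain from the speaker output levels; the sketch below instead lowers the gain in fixed steps until the measured echo peak no longer reaches full scale, which is an assumption made only for illustration. The echo model, the 1 dB step, and all names are hypothetical.

```python
def calibrate(volumes, nominal_gain_db, echo_peak, step_db=1.0, min_gain_db=-40.0):
    """Build a volume -> analog gain table that avoids clipping.

    `echo_peak(volume, gain_db)` must return the echo peak as a fraction of
    the converter's full scale (values >= 1.0 mean the echo clips).
    """
    table = {}
    for v in volumes:
        gain = nominal_gain_db
        # Reduce the gain only for volumes at which the echo clips.
        while echo_peak(v, gain) >= 1.0 and gain > min_gain_db:
            gain -= step_db
        table[v] = gain
    return table


def simulated_echo_peak(volume, gain_db):
    """Hypothetical acoustic model: echo level rises 3 dB per volume step."""
    echo_level_db = 3.0 * volume - 41.5 + gain_db
    return 10 ** (echo_level_db / 20)


gain_table = calibrate(range(1, 11), nominal_gain_db=20.0,
                       echo_peak=simulated_echo_peak)
```

With this simulated model, volumes 1 to 7 keep the nominal gain and volumes 8 to 10 receive progressively lower gains, mirroring the structure of the table in FIG. 5.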

In this case, the control unit (70) calculates the analog gain value corresponding to the output volume value of the speaker (1) using the following equation and stores it in the storage unit (10).

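$$
\begin{cases}
AG = A, & \text{if } v \le T \\
AG \le A - [\,S(v) - S(T)\,]\ \text{dB}, & \text{if } v > T
\end{cases}
\tag{1}
$$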

Here, AG is the analog gain value, v is the output volume value, T is the maximum volume value at which clipping does not occur, A is the unattenuated analog gain value, and S is the output level of the speaker (1).

The table shown in FIG. 5 represents the analog gain value information stored in the storage unit (10) by the control unit (70) according to the result of this calculation. The left column contains the output volume values of the speaker (1), the center column contains the output power of the speaker (1), and the right column contains the analog gain value of the microphone (3) corresponding to each output volume value.

In FIG. 5, when the output volume value of the speaker (1) is 1 to 7, the echo input through the microphone (3) is not saturated, and the table shows the normal, unattenuated analog gain value (A). When the output volume value of the speaker (1) is 8 to 10, the echo input through the microphone (3) saturates and clipping occurs, so the table shows attenuated analog gain values (B, C, D) that are smaller than the normal analog gain value (A).

According to Equation 1, for example, the analog gain value corresponding to output volume value 8 of the speaker (1) in FIG. 5 is B. The maximum volume value at which clipping does not occur is the largest volume value that still has the analog gain value A, so T is 7. Since the output level of the speaker (1) is S(7) at output volume value 7 and S(8) at output volume value 8, the analog gain value B corresponding to output volume value 8 is determined as follows.

B ≤ A - [ S(8) - S(7) ] dB

Here, the term [ S(8) - S(7) ] is the amount by which the output level of the speaker (1) at output volume value 8 exceeds its output level at output volume value 7. By reducing the normal analog gain value A by this increase in speaker output, [ S(8) - S(7) ], the echo level at output volume value 8 can be kept at a level at which clipping does not occur.

In the same way, the analog gain value C corresponding to output volume value 9 of the speaker (1) is determined as follows.

C ≤ A - [ S(9) - S(7) ] dB

Likewise, the analog gain value D corresponding to output volume value 10 of the speaker (1) is determined as follows.

D ≤ A - [ S(10) - S(7) ] dB
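The worked examples for B, C and D can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the speaker levels in `S` and the gain `A` are hypothetical numbers (the values of FIG. 5 are not given in the text), the function name `analog_gain_db` is invented here, and the inequality of Equation 1 is taken at equality, which is the largest gain it permits.

```python
def analog_gain_db(v, T, A, S):
    """Equation 1 taken at equality, in dB.

    v: output volume value, T: largest volume value without clipping,
    A: unattenuated analog gain, S: mapping of volume value to speaker level.
    """
    if v <= T:
        return A
    # Reduce the gain by the amount the speaker level rose above S(T).
    return A - (S[v] - S[T])


# Hypothetical speaker output levels in dB per volume step (not from FIG. 5).
S = {v: 60.0 + 3.0 * v for v in range(1, 11)}
A = 20.0  # hypothetical unattenuated analog gain in dB
T = 7     # largest volume value without clipping, as in the example above

table = {v: analog_gain_db(v, T, A, S) for v in range(1, 11)}
B, C, D = table[8], table[9], table[10]
```

With these assumed levels each volume step adds 3 dB, so the gains for volume values 8, 9 and 10 drop by 3, 6 and 9 dB relative to A, while volume values 1 to 7 keep A.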

After the control unit (70) has stored in the storage unit (10) the analog gain value information of the microphone (3) corresponding to the output volume values of the speaker (1), and the guidance voice is then output from the speaker (1), the control unit (70) identifies the output volume value of the speaker (1) outputting the guidance voice and looks up the corresponding analog gain value in the information stored in the storage unit (10). The control unit (70) then controls the first amplifier (20) according to that analog gain value to amplify the analog signal input through the microphone (3). As a result, no clipping occurs in the signal amplified by the first amplifier (20).

The analog signal, amplified by the first amplifier (20) so that clipping does not occur, is converted into a digital signal by the analog-to-digital converter (30), and the echo corresponding to the guidance voice is removed by the echo canceller (40).

Thereafter, the control unit (70) controls the second amplifier (50) to amplify the echo-cancelled digital signal. In doing so, it controls the amplification operation of the second amplifier (50) so that the digital signal is restored by the amount by which the analog signal was attenuated to prevent clipping.

In this case, the control unit (70) uses the following equation to calculate a digital gain value for restoring the digital signal by the amount by which the analog signal was attenuated in the first amplifier (20), and controls the second amplifier (50) to amplify the digital signal according to the calculated digital gain value.

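[Equation 2]

$$DG(v) = 0 \qquad \text{if } v \le T$$

$$DG(v) = A - AG(v)\ \text{dB} \qquad \text{if } v > T$$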

Here, DG is the digital gain value, AG is the analog gain value, v is the output volume value, T is the maximum volume value at which clipping does not occur, and A is the unattenuated analog gain value.

That is, if the current output volume value of the speaker (1) outputting the guidance voice is less than or equal to T, the maximum volume value at which clipping does not occur, the analog gain value used to control the first amplifier (20) contains no attenuation. The control unit (70) therefore calculates the digital gain value as 0 and controls the second amplifier (50) so that it does not amplify the digital signal.

If, on the other hand, the current output volume value of the speaker (1) outputting the guidance voice exceeds T, the analog gain value used to control the first amplifier (20) contains attenuation. The control unit (70) therefore calculates the digital gain value so that the second amplifier (50) amplifies the digital signal by the amount by which the analog signal was attenuated, and controls the second amplifier (50) according to the calculated digital gain value to amplify the digital signal.
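The two cases described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are invented here, the gain values are hypothetical, and the conversion from dB to a linear factor assumes the gains are amplitude gains (20·log10 convention), which the text does not state.

```python
def digital_gain_db(v, T, A, analog_gain):
    """Digital make-up gain in dB: 0 when v <= T, otherwise A - AG(v)."""
    if v <= T:
        return 0.0
    return A - analog_gain[v]


def db_to_linear(gain_db):
    """Convert an amplitude gain in dB to a linear multiplier (assumed convention)."""
    return 10.0 ** (gain_db / 20.0)


# Hypothetical stored analog gains in dB for volume values 7 and 8.
analog_gain = {7: 20.0, 8: 17.0}
dg7 = digital_gain_db(7, T=7, A=20.0, analog_gain=analog_gain)
dg8 = digital_gain_db(8, T=7, A=20.0, analog_gain=analog_gain)
```

In this sketch the analog and digital gains always sum to A, so the speech level reaching the recognizer is the same at every volume value.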

The signal amplified by the second amplifier (50) is an echo-cancelled signal; it is passed to the speech recognizer (60) and used for speech recognition.

The speech recognizer (60) may be included in the signal processing apparatus (100) or, depending on the embodiment, may exist separately outside it. If the speech recognizer (60) is outside the signal processing apparatus (100), the control unit (70) may pass the digital signal amplified by the second amplifier (50) to the external speech recognizer (60) so that speech recognition can proceed.

The process of processing a signal for barge-in speech recognition according to the present invention will now be described in detail with reference to FIGS. 6 and 7.

FIG. 6 is a diagram illustrating a process of calculating gain values for signal processing according to an embodiment of the present invention.

Referring to FIG. 6, when the guidance voice is output from the speaker (S1), the signal processing apparatus for barge-in speech recognition identifies the output volume values at which clipping occurs in the echo input through the microphone (S2).

Then, for each output volume value of the speaker at which clipping was confirmed, the signal processing apparatus reduces the corresponding analog gain value to calculate an attenuated analog gain value at which clipping does not occur (S3), and stores the output volume value at which clipping was confirmed, the corresponding attenuated analog gain value, and the digital gain value for signal restoration (S4).

In step S3, the signal processing apparatus may calculate the analog gain value corresponding to each output volume value of the speaker using Equation 1 above, and in step S4 it may store, in its internal storage, the analog gain value matched to each output volume value together with the corresponding digital gain value.
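Step S2 is described only as identifying the volume values at which the echo clips; how clipping is detected is not specified. One possible sketch, under assumptions made here for illustration, is to record the echo at each volume value as signed 16-bit samples and to treat a short run of consecutive full-scale samples as clipping. The names `is_clipped` and `max_unclipped_volume`, the 16-bit format and the run length of 3 are all hypothetical.

```python
def is_clipped(samples, full_scale=32767, min_run=3):
    """Return True if at least `min_run` consecutive samples sit at full scale."""
    run = 0
    for s in samples:
        if abs(s) >= full_scale:
            run += 1
            if run >= min_run:
                return True
        else:
            run = 0
    return False


def max_unclipped_volume(echo_by_volume, full_scale=32767):
    """Return T, the largest volume value below the first one whose echo clips.

    Returns None if even the lowest volume value clips.
    """
    T = None
    for v in sorted(echo_by_volume):
        if is_clipped(echo_by_volume[v], full_scale):
            break
        T = v
    return T


# Hypothetical echo recordings captured at the unattenuated gain A.
recordings = {
    6: [0, 12000, -12000, 0],
    7: [0, 30000, -30000, 0],
    8: [0, 32767, 32767, 32767, 0],
    9: [32767] * 5,
}
T = max_unclipped_volume(recordings)
```

The resulting T is the value that Equation 1 uses as its threshold; the volume values above it are the ones that need an attenuated analog gain.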

FIG. 7 is a diagram illustrating a process of processing a signal according to the embodiment of FIG. 6.

Referring to FIG. 7, the signal processing apparatus, having stored the analog gain values and digital gain values for processing the microphone input signal through the process of FIG. 6, identifies the output volume value of the speaker when the guidance voice is output from it (S5), and looks up the analog gain value and digital gain value corresponding to that output volume value in the information stored in step S4 (S6).

The signal processing apparatus then amplifies the analog signal input through the microphone according to the identified analog gain value (S7), and converts the amplified analog signal into a digital signal using the analog-to-digital converter (S8).

Thereafter, the signal processing apparatus removes the guidance voice, present in the form of an echo, from the digital signal converted in step S8 (S9), and amplifies the echo-cancelled digital signal according to the digital gain value identified in step S6 (S10).

In step S10, the signal processing apparatus amplifies the digital signal so as to restore it by the amount by which the analog gain value was attenuated in step S3. Using Equation 2 above, it may calculate the digital gain value for restoring the digital signal by the amount by which the analog signal was attenuated, and amplify the echo-cancelled digital signal according to the calculated digital gain value.

The amplified digital signal is then used for speech recognition (S11).
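Steps S6 to S10 can be illustrated with a toy simulation. Everything in it is an assumption made for the sake of a runnable sketch: signals are short lists of floats, the converter is modelled as a hard limiter at ±1.0, the gain table holds hypothetical (analog, digital) dB pairs, and the echo canceller is a placeholder that subtracts an already known echo, whereas a real canceller would estimate the echo path adaptively.

```python
def db_to_linear(gain_db):
    """Amplitude gain in dB to a linear multiplier (assumed convention)."""
    return 10.0 ** (gain_db / 20.0)


def adc(samples, full_scale=1.0):
    """S8 stand-in: a saturating converter that hard-limits to +/- full_scale."""
    return [max(-full_scale, min(full_scale, x)) for x in samples]


def process(mic, echo, volume, gains):
    """Run S6 to S10 on one block of microphone samples."""
    ag_db, dg_db = gains[volume]               # S6: look up both gains
    g = db_to_linear(ag_db)
    amplified = [g * x for x in mic]           # S7: first (analog) amplifier
    digital = adc(amplified)                   # S8: conversion, may saturate
    # S9 placeholder: subtract the known echo scaled by the analog gain.
    cleaned = [d - g * e for d, e in zip(digital, echo)]
    d_lin = db_to_linear(dg_db)
    return [d_lin * x for x in cleaned]        # S10: second (digital) amplifier


speech = [0.1, -0.1, 0.05]                     # hypothetical user speech
echo = [1.5, -1.5, 1.2]                        # hypothetical loud echo
mic = [s + e for s, e in zip(speech, echo)]

# Attenuated analog gain with matching digital make-up gain at volume 8.
out_ok = process(mic, echo, 8, {8: (-6.0, 6.0)})
# Same input without attenuation: the converter saturates before S9.
out_bad = process(mic, echo, 8, {8: (0.0, 0.0)})
```

With the attenuated gain the echo stays inside the converter range, so subtracting it leaves the speech, and the digital gain brings it back to its original level. Without attenuation the echo is flattened by the converter, the subtraction no longer matches, and the output is distorted.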

In this way, in the present invention, when the guidance voice output from the speaker is input through the microphone as an echo, the microphone gain is adjusted appropriately to prevent clipping, so the echo mixed with the user's uttered speech can be removed effectively and the accuracy of speech recognition is improved.

The signal processing method for barge-in speech recognition according to an embodiment of the present invention may be implemented as a program readable by various computer means and recorded on a computer-readable recording medium.

The embodiments disclosed in this specification and the drawings merely present specific examples to aid understanding and are not intended to limit the scope of the present invention. It will be apparent to those of ordinary skill in the art to which the present invention pertains that other modifications based on the technical idea of the present invention can be practiced in addition to the embodiments disclosed herein. Likewise, although specific terms are used in this specification and the drawings, they are used in their general sense only to explain the technical content of the invention and to aid understanding, and are not intended to limit the scope of the present invention.

10: storage unit 20: first amplifier
30: analog-to-digital converter 40: echo canceller
50: second amplifier 60: speech recognizer
70: control unit 100: signal processing apparatus

Claims

1. A signal processing apparatus for barge-in speech recognition, comprising:
a storage unit for storing analog gain value information of a microphone corresponding to output volume values of a speaker that outputs a guidance voice;
a first amplifier for amplifying an analog signal input through the microphone according to an analog gain value stored in the storage unit;
an analog-to-digital converter for converting the analog signal amplified by the first amplifier into a digital signal;
an echo canceller for removing an echo corresponding to the guidance voice from the digital signal converted by the analog-to-digital converter;
a second amplifier for amplifying the echo-cancelled digital signal; and
a control unit that stores, in the storage unit, the analog gain value information of the microphone corresponding to the output volume values of the speaker, including, for each output volume value of the speaker at which clipping of the echo input through the microphone occurs, an analog gain value attenuated so that clipping does not occur; that, when the output volume value of the speaker outputting the guidance voice exceeds a preset value, looks up the corresponding attenuated analog gain value in the storage unit and controls the first amplifier to amplify the analog signal input through the microphone accordingly; and that controls the second amplifier to restore the digital signal by the amount by which the analog signal was attenuated.

2. The signal processing apparatus of claim 1, wherein the control unit, when the guidance voice is output from the speaker, identifies the output volume value at which clipping occurs in the echo input through the microphone, reduces the analog gain value corresponding to that output volume value to calculate an attenuated analog gain value at which clipping does not occur, and stores the output volume value at which clipping was confirmed and the corresponding attenuated analog gain value in the storage unit.

3. The signal processing apparatus of claim 1, wherein the control unit calculates the analog gain value corresponding to each output volume value of the speaker using the following equation and stores it in the storage unit.

$$AG(v) = A \qquad \text{if } v \le T$$

$$AG(v) \le A - [\,S(v) - S(T)\,]\ \text{dB} \qquad \text{if } v > T$$

AG: analog gain value
v: output volume value
T: maximum volume value at which clipping does not occur
A: unattenuated analog gain value
S: output level of the speaker

4. The signal processing apparatus of claim 3, wherein the control unit calculates, using the following equation, a digital gain value for restoring the digital signal by the amount by which the analog signal was attenuated in the first amplifier, and controls the second amplifier to amplify the digital signal according to the calculated digital gain value.

$$DG(v) = 0 \qquad \text{if } v \le T$$

$$DG(v) = A - AG(v)\ \text{dB} \qquad \text{if } v > T$$

DG: digital gain value
AG: analog gain value
v: output volume value
T: maximum volume value at which clipping does not occur
A: unattenuated analog gain value

5. The signal processing apparatus of claim 1, further comprising a speech recognizer for recognizing, using the digital signal amplified by the second amplifier, the uttered speech input together with the echo through the microphone.

6. A signal processing method for barge-in speech recognition, comprising:
storing, by a signal processing apparatus, analog gain value information of a microphone corresponding to output volume values of a speaker that outputs a guidance voice, including, for each output volume value of the speaker at which clipping of the echo input through the microphone occurs, an analog gain value attenuated so that clipping does not occur;
identifying, by the signal processing apparatus, the corresponding attenuated analog gain value when the output volume value of the speaker outputting the guidance voice exceeds a preset value;
amplifying an analog signal input through the microphone according to the identified analog gain value;
converting the amplified analog signal into a digital signal;
removing the echo corresponding to the guidance voice from the converted digital signal; and
amplifying, by the signal processing apparatus, the echo-cancelled digital signal so as to restore the digital signal by the amount by which the analog signal was attenuated.

7. The method of claim 6, wherein the storing comprises:
identifying, by the signal processing apparatus, the output volume value at which clipping occurs in the echo input through the microphone when the guidance voice is output from the speaker;
calculating an attenuated analog gain value at which clipping does not occur by reducing the analog gain value corresponding to the output volume value at which clipping was confirmed; and
storing the output volume value at which clipping was confirmed and the corresponding attenuated analog gain value.

8. The method of claim 6, wherein, in the storing, the signal processing apparatus calculates and stores the analog gain value corresponding to each output volume value of the speaker using the following equation.

$$AG(v) = A \qquad \text{if } v \le T$$

$$AG(v) \le A - [\,S(v) - S(T)\,]\ \text{dB} \qquad \text{if } v > T$$

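9. The method of claim 8, wherein, in the amplifying of the echo-cancelled digital signal, the signal processing apparatus calculates, using the following equation, a digital gain value for restoring the digital signal by the amount by which the analog signal was attenuated, and amplifies the echo-cancelled digital signal according to the calculated digital gain value.

$$DG(v) = 0 \qquad \text{if } v \le T$$

$$DG(v) = A - AG(v)\ \text{dB} \qquad \text{if } v > T$$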

10. The method of claim 6, further comprising recognizing, using the amplified digital signal, the uttered speech input together with the echo through the microphone.

11. A computer-readable recording medium having recorded thereon a program for performing the signal processing method for barge-in speech recognition according to any one of claims 6 to 10.