KR20180056284A

KR20180056284A - Apparatus and method for speech recognition

Info

Publication number: KR20180056284A
Application number: KR1020160154375A
Authority: KR
Inventors: 양태영; 박민수
Original assignee: 주식회사 인텔로이드
Priority date: 2016-11-18
Filing date: 2016-11-18
Publication date: 2018-05-28
Also published as: KR101863098B1

Abstract

The present invention relates to an apparatus for recognizing a voice and a method thereof and, more particularly, to an apparatus for recognizing a voice and a method thereof, capable of precisely recognizing the voice by using a noise pattern included in a sound. According to an embodiment of the present invention, provided is the apparatus for recognizing a voice, which includes a sound collecting unit for collecting a sound for voice recognition and a processor for determining a noise pattern matched with the collected sound based on a partial time interval of the noise pattern and a partial time interval of the collected sound and recognizing the voice from the collected sound by using the determined noise pattern.

Description

APPARATUS AND METHOD FOR SPEECH RECOGNITION

The present invention relates to a speech recognition apparatus and method and, more particularly, to a speech recognition apparatus and method that perform accurate speech recognition by using a noise pattern included in a sound.

Speech recognition is one of the key technologies for making interaction between a user and a terminal (or machine) smoother. Through speech recognition, a terminal can listen to the user's voice, understand it, and provide the user with an appropriate service based on what it has understood. Accordingly, the user can intuitively request a desired service from the terminal without any separate manipulation.

When speech recognition is performed, the recognition rate or the accuracy of the recognition may be determined by the noise characteristics of the sound that contains the speech. As a simple example, when the noise level of the background noise is high or the signal-to-noise ratio is very low, speech recognition may be difficult to perform. The success rate of speech recognition may also vary with characteristics such as the energy distribution across frequency bands and whether energy is concentrated in a specific frequency band. If background noise whose energy is concentrated in the frequency band associated with the human voice is mixed with speech, the speech recognition apparatus will not be able to easily recognize speech in the sound that contains both the background noise and the speech. Various other characteristics of background noise, such as the noise pattern, also need to be considered in speech recognition. Research is therefore needed on ways to improve the accuracy of speech recognition in response to the characteristics of background noise.

The present invention has been devised to solve the problems described above, and an object of the present invention is to provide a speech recognition technique that takes the characteristics of background noise into account.

According to an embodiment of the present invention for solving the above problem, there may be provided a speech recognition apparatus including: a sound collection unit that collects a sound for speech recognition; and a processor that determines a noise pattern matching the collected sound based on a partial time interval of the collected sound and a partial time interval of the noise pattern, and recognizes speech from the collected sound by using the determined noise pattern.

Preferably, the processor determines the noise pattern matching the collected sound based on at least one of the time and the place at which the sound was collected.

Preferably, the processor determines the noise pattern matching the collected sound based on at least one of the day of the week and the season in which the sound was collected.

Preferably, the speech recognition apparatus controls at least one external device, and the processor determines the noise pattern matching the collected sound according to the control of the at least one external device.

Preferably, the processor determines the noise pattern matching the collected sound according to which of the at least one external device has been turned on by the speech recognition apparatus.

Preferably, when the processor turns on any one of the at least one external device, the processor collects a noise pattern until the turned-on external device is turned off.

Preferably, the speech recognition apparatus further includes an image collection unit that senses an image, and the processor determines the noise pattern matching the collected sound based on the sensed image.

Preferably, the processor identifies a turned-on external device from the sensed image.

Preferably, the processor identifies, from the sensed image, the position of the speaker of the speech included in the collected sound, and determines the noise pattern matching the collected sound based on the identified position of the speaker.

Preferably, the speech recognition apparatus further includes a transceiver that exchanges information with another speech recognition apparatus, and the processor receives the noise pattern, through the transceiver, from another speech recognition apparatus connected to the same network.

Preferably, the length of the partial time interval of the collected sound and of the partial time interval of the noise pattern is determined according to the ambient noise level.

According to another embodiment of the present invention, there may be provided a speech recognition method including: collecting a sound for speech recognition; determining a noise pattern matching the collected sound based on a partial time interval of the collected sound and a partial time interval of the noise pattern; and recognizing speech from the collected sound by using the determined noise pattern.

According to an embodiment of the present invention, a noise pattern matching the collected sound can be determined, and effective speech recognition can be performed based on that noise pattern.

In addition, according to an embodiment of the present invention, noise patterns can be additionally collected or received from an external device, and the accuracy of speech recognition can be improved as the types and number of noise patterns available for comparison increase.

FIG. 1 is a diagram illustrating a speech recognition apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating a noise pattern matching scheme according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating a noise pattern determination scheme according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating a noise pattern collection scheme according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating a speech recognition method according to an embodiment of the present invention.

The present invention relates to a speech recognition apparatus and method and, more particularly, to a speech recognition apparatus and method that perform accurate speech recognition by using a noise pattern included in a sound. Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the drawings.

FIG. 1 is a diagram illustrating a speech recognition apparatus according to an embodiment of the present invention. Referring to FIG. 1, the speech recognition apparatus 100 according to an embodiment of the present invention may include a sound collection unit 120 and a processor 110. Depending on how the present invention is implemented, the sound collection unit 120 and the processor 110 may be implemented as a single component, or the sound collection unit 120 may be omitted.

The sound collection unit 120 may collect a sound for speech recognition. The sound collection unit 120 may include sound collection means, such as a microphone, that converts sound into a sound signal (or voice signal), which is an electrical signal. The sound collection unit 120 may collect the sound of the environment surrounding the speech recognition apparatus 100 through the sound collection means. The sound may include background noise or speech.

The processor 110 controls the overall operation of the speech recognition apparatus 100. The processor 110 may compute and process various data and signals and may control each component of the speech recognition apparatus 100. The processor 110 may be implemented as hardware in the form of a semiconductor chip or an electronic circuit, as software that controls such hardware, or as a combination of the hardware and the software.

According to a preferred embodiment of the present invention, the processor 110 may collect a sound through the sound collection unit 120, and may determine a noise pattern matching the collected sound based on a partial time interval of the collected sound (or of the sound signal or voice signal obtained by converting the sound into an electrical signal) and a partial time interval of the noise pattern. The processor 110 may then recognize speech from the collected sound by using the determined noise pattern. The noise pattern matching scheme and the speech recognition scheme of the speech recognition apparatus 100 are described in detail with reference to FIGS. 2 to 4.

FIG. 2 is a diagram illustrating a noise pattern matching scheme according to an embodiment of the present invention. In FIG. 2, n_1 to n_N on the left denote noise pattern 1 to noise pattern N, and s on the right denotes the collected sound. Here, s may refer to the sound signal (or voice signal) obtained by converting the sound into an electrical signal, and a noise pattern may refer to the shape of the background noise in the time domain. Alternatively, the noise pattern may be a kind of spectrogram that shows, over time, how the energy distribution of the background noise signal across frequency bands changes. Alternatively, the noise pattern may be a representation over time of coefficients that indicate each frequency component of the background noise signal (for example, short-time Fourier transform (STFT) coefficients). The noise pattern may be a signal that the speech recognition apparatus itself holds, but is not limited thereto; the speech recognition apparatus may also receive a noise pattern from an external device or external storage and use the received noise pattern for speech recognition.
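By way of non-limiting illustration, a noise pattern expressed as STFT coefficient magnitudes over time could be computed roughly as follows. The function name `stft_magnitudes`, the frame length, the hop size, and the Hann window are assumptions made for this sketch and are not specified in the description; a direct DFT is used only to keep the example free of external libraries.

```python
import cmath
import math
from typing import List, Sequence


def stft_magnitudes(signal: Sequence[float], frame_len: int = 8, hop: int = 4) -> List[List[float]]:
    """Return one magnitude spectrum (bins 0..frame_len//2) per frame.

    frame_len and hop are illustrative values, not values from the patent.
    """
    # Symmetric Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_len - 1)) for i in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        windowed = [signal[start + i] * window[i] for i in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            # Direct DFT of the windowed frame at frequency bin k.
            coeff = sum(
                windowed[t] * cmath.exp(-2j * cmath.pi * k * t / frame_len)
                for t in range(frame_len)
            )
            spectrum.append(abs(coeff))
        frames.append(spectrum)
    return frames
```

Each inner list is one time step of the pattern, so the result can be read as a small spectrogram: rows are frames in time order and columns are frequency bins.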

As described above, the speech recognition apparatus may determine a noise pattern matching the collected sound based on a partial time interval of the collected sound and a partial time interval of the noise pattern. The speech recognition apparatus could instead compare the entire collected sound with the entire noise pattern. For more efficient noise pattern matching, however, the apparatus may compare portions of the sound and of the noise pattern that correspond to a preset length L. In doing so, the apparatus may compare the same time interval of the sound and of the noise pattern. Referring to FIG. 2, the speech recognition apparatus may compare each of noise pattern 1 to noise pattern N individually with the collected sound s. According to a preferred embodiment of the present invention, a portion of the collected sound that does not contain the speaker's voice (for example, the interval from 0 to 0.5 seconds at the start of the sound, or frames 1 to 5 of the sound signal converted from the sound) may be compared with the noise pattern. Here, the apparatus may compare the time interval from 0 to L of noise pattern 1 to noise pattern N with the time interval from 0 to L of the collected sound s. The matching scheme shown in FIG. 2 is merely an example, and the partial time interval over which the noise pattern and the collected sound are compared is not limited to that of FIG. 2.

According to an embodiment of the present invention, the speech recognition apparatus may compare the similarity between a partial time interval of each of noise pattern 1 to noise pattern N and a partial time interval of the collected sound s, and may identify the noise pattern that shows the highest similarity to the partial time interval of the collected sound s. Here, the similarity comparison may be implemented by calculating the correlation between the partial time interval of the collected sound and the noise pattern. In this case, the apparatus may calculate the cross-correlation between a partial time interval of each of noise pattern 1 to noise pattern N and a partial time interval of the collected sound s, and may select the noise pattern that shows the highest correlation with the partial time interval of the sound. However, the way in which the apparatus calculates the similarity between the partial time intervals of noise pattern 1 to noise pattern N and the partial time interval of the collected sound s is not limited thereto.
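By way of non-limiting illustration, the selection of the best-matching noise pattern by correlation could be sketched as follows. The sketch assumes a zero-lag, mean-removed, normalized cross-correlation over the first `length` samples; the description only states that a cross-correlation may be calculated, so the normalization and the function names `normalized_correlation` and `match_noise_pattern` are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def normalized_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Zero-lag normalized cross-correlation of two equal-length segments."""
    if len(a) != len(b) or not a:
        raise ValueError("segments must be non-empty and of equal length")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(
        sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    )
    # A constant segment has no variation to correlate with.
    return num / den if den > 0 else 0.0


def match_noise_pattern(
    sound: Sequence[float], patterns: List[Sequence[float]], length: int
) -> Tuple[int, float]:
    """Compare interval [0, length) of the sound with the same interval of
    every stored pattern and return (index, score) of the best match."""
    segment = sound[:length]
    scores = [normalized_correlation(segment, p[:length]) for p in patterns]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```

Normalizing the correlation makes the score insensitive to the overall gain and offset of the recording, which is one possible design choice; an unnormalized cross-correlation or another similarity measure would fit the description equally well.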

Meanwhile, according to a preferred embodiment of the present invention, the length L of the partial time interval of the collected sound and of the partial time interval of the noise pattern may be determined according to the ambient noise level. The ambient noise level may refer to the estimated noise level of the background noise included in the sound collected by the speech recognition apparatus, or to the energy of the background noise signal. Accordingly, the speech recognition apparatus may decrease the length L as the ambient noise level increases and increase the length L as the ambient noise level decreases. That is, when the ambient noise level increases, the background noise contained in a portion of the collected sound can be detected easily (or the signal characteristics of the background noise appear relatively more clearly), so the noise pattern can be determined easily even from a relatively short time interval of the sound. Conversely, when the ambient noise level decreases, the signal characteristics of the background noise in the sound do not appear clearly, so a longer time interval may be required to determine the noise pattern. According to yet another embodiment of the present invention, the speech recognition apparatus may determine the length L of the partial time interval based on the signal-to-noise ratio of the collected sound. That is, the apparatus may increase the length L when the signal-to-noise ratio of the collected sound increases (that is, when the background noise is relatively weak) and decrease the length L when the signal-to-noise ratio decreases (that is, when the background noise is relatively strong).
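By way of non-limiting illustration, the rule that a louder ambient noise allows a shorter comparison length L could be sketched as a clamped linear mapping. The description gives no concrete values, so the decibel thresholds, the sample counts, and the names `noise_level_db` and `comparison_length` are all assumptions for this sketch.

```python
import math
from typing import Sequence


def noise_level_db(noise_segment: Sequence[float]) -> float:
    """RMS level of a noise-only segment in dB relative to full scale 1.0."""
    rms = math.sqrt(sum(x * x for x in noise_segment) / len(noise_segment))
    return 20 * math.log10(max(rms, 1e-12))  # floor avoids log10(0)


def comparison_length(
    level_db: float,
    min_len: int = 800,      # assumed shortest L, in samples
    max_len: int = 8000,     # assumed longest L, in samples
    quiet_db: float = -60.0, # assumed level at or below which L is longest
    loud_db: float = -20.0,  # assumed level at or above which L is shortest
) -> int:
    """Map the ambient noise level to L: the louder the noise, the shorter L."""
    t = (level_db - quiet_db) / (loud_db - quiet_db)
    t = min(1.0, max(0.0, t))  # clamp to [0, 1]
    return round(max_len - t * (max_len - min_len))
```

With these assumed defaults, `comparison_length(-60.0)` gives the longest interval and `comparison_length(-20.0)` the shortest, with a linear transition in between; any monotonically decreasing mapping would satisfy the description.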

The noise pattern matching the collected sound may be regarded as having signal characteristics identical or similar to those of the background noise included in the collected sound (for example, whether a specific shape appears in the time domain, or the energy distribution across frequency bands). The speech recognition apparatus may then recognize speech from the collected sound by using the determined noise pattern. According to a preferred embodiment of the present invention, the apparatus may use the determined noise pattern to remove the background noise included in the voice signal. Here, the apparatus may remove the background noise included in the sound signal by subtracting the determined noise pattern from the electrical signal corresponding to the collected sound, that is, the sound signal (or voice signal). The subtraction may be performed in the time domain or in the frequency domain. Background noise removal by subtraction is only one example, and the background noise may be removed from the sound signal in various ways. The apparatus may extract acoustic features, which are the basic information needed for speech recognition, from the sound signal from which the background noise has been removed. The apparatus may divide the voice signal into frames of a preset length and extract information such as the energy distribution across frequency bands of each frame as the acoustic features. According to a preferred embodiment, the information for each frequency band may be expressed numerically as a vector. Alternatively, the acoustic features may be pitch or formants. Methods such as linear predictive coding (LPC) cepstrum, perceptual linear prediction (PLP) cepstrum, mel-frequency cepstral coefficients (MFCC), and filter bank energy analysis may be used to extract the acoustic features. The apparatus may then determine the basic unit of language that corresponds to the acoustic features. Here, the basic unit of language may be a phoneme, a syllable, a word, or the like. For example, the apparatus may compare whether the acoustic features of the sound signal for the English utterance 'tea' correspond to the acoustic models of /t/ and /i:/, the phonemes of the word 'tea', or how similar the acoustic features are to the acoustic model of each phoneme. Here, the acoustic model may be a mixture model that includes at least one Gaussian distribution. The apparatus may determine the similarity between the acoustic features and at least one acoustic model, and may determine that the acoustic model showing the highest similarity to a specific acoustic feature is the acoustic model corresponding to that feature. Based on the basic unit of language associated with the acoustic model corresponding to the acoustic features, the apparatus may convert the sound signal into text data, which is the result of speech recognition. In the example above, the apparatus may determine the acoustic model corresponding to each phoneme of the English utterance 'tea' included in the sound signal, and may combine the phonemes associated with those acoustic models to generate the text data 'tea' as the result of speech recognition.
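By way of non-limiting illustration, subtraction of the determined noise pattern in the frequency domain could be sketched as magnitude spectral subtraction on a single frame. The description states only that the subtraction may be performed in the time domain or the frequency domain; keeping the phase of the noisy frame, flooring negative magnitudes at zero, and the function names below are assumptions, and a direct DFT is used only to keep the sketch free of external libraries.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[float]) -> List[complex]:
    """Direct discrete Fourier transform (O(n^2), for illustration only)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum: Sequence[complex]) -> List[float]:
    """Inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def spectral_subtract(frame: Sequence[float], noise_frame: Sequence[float]) -> List[float]:
    """Subtract the noise magnitude spectrum from the frame's magnitude
    spectrum, keep the frame's phase, and floor the result at zero."""
    if len(frame) != len(noise_frame):
        raise ValueError("frame and noise_frame must have the same length")
    cleaned = []
    for s, n in zip(dft(frame), dft(noise_frame)):
        magnitude = max(abs(s) - abs(n), 0.0)
        cleaned.append(cmath.rect(magnitude, cmath.phase(s)))
    return idft(cleaned)
```

Subtracting magnitudes rather than raw samples does not require the stored noise pattern to be phase-aligned with the noise in the recording, which is the usual reason for preferring the frequency domain; a time-domain subtraction would need that alignment.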

The speech recognition apparatus may output the speech recognition result to the user or perform subsequent processing that corresponds to the result. For example, the apparatus may provide the user with a service corresponding to a keyword, that is, a preset word or sentence, included in the speech recognition result.

The speech recognition apparatus according to an embodiment of the present invention may perform speech recognition by using a noise pattern. When the environment or conditions (for example, time and place) are identical or similar, identical or similar noise patterns are likely to occur. The apparatus may therefore match noise patterns for various environments or conditions against the noise pattern of the background noise included in the sound, and may remove or suppress that background noise by using the matching noise pattern. As a result, the apparatus can improve speech recognition performance by using noise patterns for various environments or conditions.

FIG. 3 is a diagram illustrating a noise pattern determination scheme according to an embodiment of the present invention.

According to one embodiment of the present invention, the speech recognition apparatus may determine the noise pattern matching the collected sound based on the time at which the sound was collected. FIG. 3(a) shows how the noise pattern changes over time. That is, even at the same place, the signal characteristics or noise pattern of the background noise included in the sound collected by the apparatus may change with time. Here, the apparatus may select the noise pattern that corresponds to the time at which the sound was collected and perform speech recognition using that noise pattern. In FIG. 3(a), t1 denotes 0:00 and t6 denotes 24:00, and each noise pattern may be assumed to have been collected at a specific place over the course of one day. Referring to FIG. 3(a), the apparatus may use noise pattern 2 when performing speech recognition on a sound collected between times t2 and t3. Noise pattern 2 may be a representative value obtained by applying statistical processing to a plurality of noise patterns collected between times t2 and t3 on several past days. For example, noise pattern 2 may be the average of a plurality of noise patterns collected between times t2 and t3 on several past days. The noise pattern at a specific place may also change with the day of the week. In this case, the apparatus may select, from noise patterns collected separately for each day of the week, the noise pattern that corresponds to the day on which the sound to be recognized was collected, and may perform the speech recognition process described above using that noise pattern. The noise pattern at a specific place may likewise change with the season. In this case, the apparatus may select, from noise patterns collected separately for each season, the noise pattern that corresponds to the season in which the sound to be recognized was collected, and may perform the speech recognition process described above using that noise pattern.
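By way of non-limiting illustration, selecting a noise pattern by collection time and forming a representative pattern by averaging could be sketched as follows. The description does not give the actual times of t1 to t6, so the boundary hours used in the usage note, and the names `average_pattern` and `slot_index`, are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


def average_pattern(patterns: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equal-length noise patterns collected in the
    same time slot on different days."""
    return [sum(values) / len(values) for values in zip(*patterns)]


def slot_index(hour: float, boundaries: Sequence[float]) -> int:
    """Return the 0-based index of the slot [boundaries[i], boundaries[i+1])
    that contains `hour`; boundaries must be in ascending order."""
    if not boundaries[0] <= hour < boundaries[-1]:
        raise ValueError("hour is outside the covered range")
    return bisect_right(boundaries, hour) - 1
```

For example, with assumed boundaries `[0, 6, 9, 18, 22, 24]` for t1 to t6, a sound collected at 7:30 falls in the slot between t2 and t3, so `slot_index(7.5, boundaries)` returns index 1, which corresponds to noise pattern 2. The same lookup could be keyed by day of the week or season instead of the hour.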

According to another embodiment of the present invention, the speech recognition apparatus may determine the noise pattern matching the collected sound based on the place at which the sound was collected. That is, the signal characteristics or noise pattern of the background noise may be determined by the place. For example, the background noise of a living room and the background noise of a bathroom have entirely different characteristics. It is therefore necessary to remove the background noise included in the sound by using a different noise pattern depending on where the apparatus is located. The apparatus may select, from the noise patterns collected at each place, the noise pattern that corresponds to the place at which the sound to be recognized was collected, and may perform the speech recognition process described above using that noise pattern.

The signal characteristics or noise pattern of the background noise may also be determined based on the speaker's position relative to the speech recognition apparatus. In FIG. 3(b), assume that the speech recognition apparatus 100 is installed in the middle of the living room, and that position A is adjacent to the bathroom, position B is adjacent to the entrance, position C is adjacent to the kitchen, and position D is adjacent to the bedroom. If the speaker (or user) speaks at position B, the speaker's voice may be mixed with the background noise of the entrance, and the speech recognition apparatus may collect sound containing both the speaker's voice and the background noise of the entrance. Whether the speaker is at position B may be detected through the image collection unit of the speech recognition apparatus 100, described later. In this case, the speech recognition apparatus can perform effective speech recognition by using the noise pattern corresponding to the speaker's position, that is, the noise pattern associated with the signal characteristics of the background noise of the entrance. In the embodiment of FIG. 3(b), the position may also mean the distance between the speaker and the speech recognition apparatus, or the direction of the speaker relative to the speech recognition apparatus. The speech recognition apparatus may perform speech recognition using a noise pattern selected according to the distance to the speaker, or using a noise pattern selected according to the direction of the speaker relative to the apparatus (for example, 45 degrees to the southwest).

According to another embodiment of the present invention, the speech recognition apparatus may further include an image collection unit that senses images. The image collection unit may be, but is not limited to, a camera. The speech recognition apparatus may determine the noise pattern matching the collected sound based on the sensed image. According to FIG. 3(c), the speech recognition apparatus 100 may be located in the same place as an external device (or external apparatus/terminal) such as the vacuum cleaner 200 or the blender 300. When the speech recognition apparatus 100 includes a camera, it may determine through the camera whether the vacuum cleaner 200 or the blender 300 is turned on. Whether an external device is turned on may be confirmed by applying an image signal processing technique, such as pattern recognition, to the image acquired through the camera. If there exists a noise pattern associated with the background noise that operation of the vacuum cleaner 200 may generate, or a noise pattern associated with the background noise that operation of the blender 300 may generate, the speech recognition apparatus 100 may determine whether each external device is turned on, use the noise pattern corresponding to the turned-on external device to remove from the collected sound the background noise attributable to that device, and then perform the subsequent speech recognition process.

Here, a compound situation combining the cases of FIG. 3(b) and FIG. 3(c) may be assumed. For example, suppose that a speaker is at position B (near the entrance) of FIG. 3(b) and that the vacuum cleaner 200 is located near the speech recognition apparatus 100. The speech recognition apparatus 100 may detect through the image collection unit that the speaker is at position B and that the vacuum cleaner 200 is turned on. In this case, the speech recognition apparatus 100 may perform speech recognition on the speaker's voice using both the entrance noise pattern corresponding to position B and the noise pattern corresponding to the vacuum cleaner 200. The speech recognition apparatus 100 may remove the different background noises contained in the sound by subtracting the two noise patterns from the collected sound.
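
Subtracting more than one noise pattern from the collected sound might look like the sketch below. The text does not say in which domain the subtraction is carried out, so this example assumes that both the sound and the patterns are per-frequency magnitude values of equal length, and it clamps negative results to a floor. The function name `subtract_noise` and the `floor` parameter are hypothetical.

```python
def subtract_noise(sound_mag, noise_patterns, floor=0.0):
    """Subtract each noise pattern from the sound, bin by bin.

    sound_mag      -- magnitudes of the collected sound (list of floats)
    noise_patterns -- iterable of patterns with the same length as sound_mag
    floor          -- lower bound applied after each subtraction (assumption)
    """
    cleaned = list(sound_mag)
    for pattern in noise_patterns:
        if len(pattern) != len(cleaned):
            raise ValueError("pattern length must match the sound length")
        cleaned = [max(s - n, floor) for s, n in zip(cleaned, pattern)]
    return cleaned


# Entrance noise and vacuum-cleaner noise removed in sequence.
entrance = [1.0, 1.0, 1.0]
vacuum = [2.0, 1.0, 1.0]
print(subtract_noise([5.0, 3.0, 1.0], [entrance, vacuum]))  # [2.0, 1.0, 0.0]
```

Clamping keeps a bin from going negative when the stored patterns overestimate the noise actually present; other strategies are equally possible.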

FIG. 4 is a diagram illustrating a method of collecting noise patterns according to an embodiment of the present invention.

According to an embodiment of the present invention, the speech recognition apparatus may control at least one external device. According to FIG. 4(a), the speech recognition apparatus 100 may transmit a control signal ctrl_sig to the vacuum cleaner 200, which is an external device. The speech recognition apparatus 100 may include a wired or wireless module for transmitting the control signal to the external device. Accordingly, the control signal may be transmitted to the external device via wired communication over a cable, wireless LAN communication, data communication, Bluetooth communication, or the like. However, the communication method of the speech recognition apparatus according to embodiments of the present invention is not limited thereto. The control signal ctrl_sig may be a signal that controls turn-on, turn-off, or various other operating modes of the external device. According to FIG. 4(b), the speech recognition apparatus 100 may have transmitted a control signal corresponding to turn-on to the vacuum cleaner 200, whereupon the vacuum cleaner 200 is turned on and begins operating. Once the vacuum cleaner 200 is turned on, background noise due to its operation may be generated, and the speech recognition apparatus 100 may acquire sound containing the noise pattern corresponding to that background noise. The speech recognition apparatus 100 may then match the sound containing the noise pattern produced by the turned-on vacuum cleaner 200 against the noise patterns it already holds or the noise patterns previously received from external devices. If the noise pattern of the vacuum cleaner 200 does not match any noise pattern already held or previously received from an external device (or if their mutual similarity is low), the speech recognition apparatus 100 may regard the noise pattern of the vacuum cleaner 200 as a new noise pattern and store it in built-in storage or external storage. The new noise pattern (that is, the noise pattern of the vacuum cleaner 200) may later be used when recognizing a speaker's voice while the vacuum cleaner 200 is turned on. This manner of acquiring new noise patterns may also be applied in the situation of FIG. 3 described above. That is, if the speech recognition apparatus 100 does not hold a noise pattern for each direction (for example, the four directions east, west, south, and north), it may newly acquire a noise pattern for each direction through beamforming with a plurality of microphone arrays (when the sound collected from a given direction contains no noise pattern matching an existing one). Alternatively, the speech recognition apparatus may collect sound containing noise patterns that depend on the time, day of the week, or season, and if the noise pattern contained in the collected sound does not match any existing noise pattern, may regard it as a new noise pattern for the time, day of the week, or season at which the sound was collected.
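
The rule of storing a noise pattern only when it fails to match the existing ones can be sketched as below. The text does not name a similarity measure or a threshold, so cosine similarity and the value 0.8 are assumptions made purely for illustration, as are the names `cosine_similarity` and `register_if_new` and the dictionary used as storage.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equally long vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def register_if_new(store, name, candidate, threshold=0.8):
    """Store `candidate` under `name` unless a similar pattern already exists.

    Returns True when the candidate was treated as a new noise pattern.
    The threshold value is an illustrative assumption.
    """
    for existing in store.values():
        if cosine_similarity(existing, candidate) >= threshold:
            return False  # matches an existing pattern, nothing to add
    store[name] = list(candidate)
    return True
```

With a store that holds only an entrance pattern, a vacuum-cleaner pattern with a different spectral shape would be added, whereas a louder copy of the same shape would be treated as already known, because cosine similarity ignores overall level.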

As the number or variety of noise patterns available for speech recognition increases, the probability that the speech recognition apparatus finds a noise pattern matching the collected sound rises, and the background noise corresponding to the matched noise pattern can accordingly be removed from the collected sound more easily. As a result, the speech recognition apparatus can improve the accuracy of speech recognition.

Meanwhile, the speech recognition apparatus 100 according to an embodiment of the present invention may control at least one external device, and may acquire an individual noise pattern for the turn-on of each external device or a noise pattern for a combination of several turned-on external devices. For example, the speech recognition apparatus may turn on a washing machine or a blender individually and acquire the noise pattern of the washing machine and that of the blender separately, or it may turn on the washing machine and the blender at the same time and acquire the noise pattern produced by their simultaneous operation. When the speech recognition apparatus 100 turns on any one of the at least one external device, it may collect the noise pattern until the turned-on external device is turned off (FIG. 4(c)).

Meanwhile, the speech recognition apparatus 100 according to an embodiment of the present invention may transmit to an external device a control signal that changes the operating state of that device. For example, in the case of FIG. 4(b), the speech recognition apparatus 100 may transmit to the vacuum cleaner 200 a control signal that increases its suction power, and may acquire the noise pattern resulting from the increased suction power.

Noise patterns acquired in the manner described above may be used in the speech recognition process. According to a preferred embodiment of the present invention, the processor may determine the noise pattern matching the collected sound according to which of the at least one external device has been turned on by the speech recognition apparatus. Because the speech recognition apparatus 100 can control the external device, it also knows whether the external device is turned on. When a speaker speaks in the situation of FIG. 4(b), the speech recognition apparatus 100 knows whether the vacuum cleaner 200 is turned on, and can recognize the speaker's voice using the noise pattern of the vacuum cleaner 200 contained in the collected sound.
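
Because the apparatus itself issues the control signals, it can keep a record of which devices are on and look up the stored noise pattern for exactly that set, including combinations of devices. The sketch below illustrates this idea only. `DeviceController`, `send_ctrl_sig`, and `pattern_for_state` are hypothetical names, no real transport (cable, wireless LAN, Bluetooth) is modelled, and keying patterns by the set of active devices is an assumption.

```python
class DeviceController:
    """Tracks which external devices this apparatus has turned on."""

    def __init__(self):
        self._on = set()

    def send_ctrl_sig(self, device, turn_on):
        """Stand-in for transmitting ctrl_sig; only updates local state."""
        if turn_on:
            self._on.add(device)
        else:
            self._on.discard(device)

    def active_devices(self):
        """Return the currently turned-on devices as a hashable set."""
        return frozenset(self._on)


def pattern_for_state(patterns, controller):
    """Look up the noise pattern stored for the exact set of active devices."""
    return patterns.get(controller.active_devices())


# Patterns stored per device and per combination of devices.
patterns = {
    frozenset({"washer"}): [1.0, 0.5],
    frozenset({"washer", "blender"}): [3.0, 2.5],
}
ctl = DeviceController()
ctl.send_ctrl_sig("washer", True)
print(pattern_for_state(patterns, ctl))  # [1.0, 0.5]
```

If no pattern has been stored for the current combination, the lookup returns `None`, which is the situation in which a new noise pattern would have to be collected.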

Meanwhile, according to another embodiment of the present invention, the speech recognition apparatus may further include a transceiver unit (or the wired or wireless module described above) that transmits and receives information to and from another speech recognition apparatus. In this case, the speech recognition apparatus may receive the noise pattern through the transceiver unit from another speech recognition apparatus connected to the same network. Here, "the same network" may collectively refer to external devices that communicate through the same access point, external devices interconnected through the same router/bridge/distributor, or external devices located within a preset distance range of the speech recognition apparatus. External devices in the same network are likely to acquire background noise or noise patterns similar to those of the speech recognition apparatus, so sharing noise patterns among them can further improve speech recognition performance. Alternatively, in the situation of FIG. 3(b), the speech recognition apparatus may receive, from another speech recognition apparatus installed at position B or at position D, the noise patterns that the other apparatus has acquired, and may perform speech recognition using the received noise patterns.

The speech recognition apparatus according to an embodiment of the present invention may also perform accurate speech recognition in cooperation with other speech recognition apparatuses. For example, suppose that an entrance, a living room, and a bathroom are arranged in that order inside a house, with a first speech recognition apparatus installed between the entrance and the living room and a second speech recognition apparatus installed between the living room and the bathroom. If a speaker speaks in the living room, the first and second speech recognition apparatuses may each separately acquire sound containing the speaker's voice, and each may match a partial time interval of a noise pattern it holds (or has received from outside) against a partial time interval of the sound. Each speech recognition apparatus may compute the degree of match between the sound and its own noise pattern (for example, a similarity value between 0 and 1), and the apparatuses may share this matching information with each other. If the similarity between the noise pattern held by the first speech recognition apparatus and the sound (or the background noise contained in the sound) is 0.9, and the similarity between the noise pattern held by the second speech recognition apparatus and the sound (or the background noise contained in the sound) is 0.6, the first speech recognition apparatus may perform the subsequent speech recognition process (such as generating a recognition result using an acoustic model) and provide the user with a service according to the result, while the second speech recognition apparatus may refrain from performing the subsequent speech recognition process.
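
The hand-off between cooperating apparatuses can be sketched as a simple election over the shared similarity scores. The text only states that the apparatus with the higher similarity proceeds; the function name `elect_recognizer`, the dictionary of scores, and the tie-breaking rule (lowest identifier wins) are assumptions added for the example.

```python
def elect_recognizer(scores):
    """Return the id of the apparatus that should continue recognition.

    scores -- dict mapping apparatus id to its similarity value in [0, 1]
    Ties are broken by choosing the smallest id (illustrative assumption).
    """
    if not scores:
        raise ValueError("no similarity scores were shared")
    # sorted() fixes the iteration order so that max() is deterministic on ties.
    return max(sorted(scores), key=lambda device_id: scores[device_id])


shared = {"first": 0.9, "second": 0.6}
print(elect_recognizer(shared))  # first
```

Each apparatus can run the same function on the same shared scores and reach the same decision without further coordination.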

In other words, the speech recognition apparatus according to an embodiment of the present invention can collect noise patterns for various environments or conditions. In particular, a noise pattern that differs from the existing noise patterns is regarded as a new noise pattern and can be used when matching against subsequently collected sounds. By collecting noise patterns in this way, the speech recognition apparatus can increase the accuracy of speech recognition.

FIG. 5 is a diagram illustrating a speech recognition method according to an embodiment of the present invention. According to FIG. 5, the speech recognition apparatus may collect sound for speech recognition (S101). The speech recognition apparatus may then determine the noise pattern matching the collected sound based on a partial time interval of the collected sound and a partial time interval of the noise pattern (S102). The speech recognition apparatus may then recognize speech from the collected sound using the determined noise pattern (S103). The manner in which the speech recognition apparatus determines the noise pattern matching the collected sound and performs speech recognition using the determined noise pattern was covered in the description of FIGS. 2 to 4, so further description is omitted here.
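
Step S102, matching on a partial time interval, can be illustrated with the sketch below. It assumes that the partial interval is the leading `interval` samples of both the sound and each stored pattern, and it uses mean absolute difference as the distance. Both choices, along with the name `match_noise_pattern`, are assumptions, since this section does not fix them.

```python
def match_noise_pattern(sound, patterns, interval):
    """Return the name of the stored pattern closest to the sound.

    Only the first `interval` samples of the sound and of each pattern are
    compared (assumed meaning of "partial time interval").
    """
    if interval <= 0 or interval > len(sound):
        raise ValueError("interval must be between 1 and len(sound)")

    def distance(pattern):
        pairs = zip(sound[:interval], pattern[:interval])
        return sum(abs(s - n) for s, n in pairs) / interval

    usable = {name: p for name, p in patterns.items() if len(p) >= interval}
    if not usable:
        return None
    return min(sorted(usable), key=lambda name: distance(usable[name]))


# The first two samples hold noise only; speech starts at the third sample.
sound = [1.0, 1.1, 5.0, 6.0]
patterns = {"vacuum": [1.0, 1.0, 1.0, 1.0], "blender": [3.0, 3.0, 3.0, 3.0]}
print(match_noise_pattern(sound, patterns, interval=2))  # vacuum
```

Restricting the comparison to an interval dominated by noise keeps the speech itself from distorting the match: in this toy data, comparing all four samples would select the blender pattern instead.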

According to the embodiments of the present invention described above, a noise pattern matching the collected sound can be determined, and effective speech recognition can be performed based on that noise pattern. In addition, according to embodiments of the present invention, noise patterns can be additionally collected or received from external devices, and the accuracy of speech recognition can be increased as the variety and number of noise patterns available for comparison grow.

Although the present invention has been described above through specific embodiments, those skilled in the art may make modifications and changes without departing from the spirit of the present invention. Accordingly, anything that a person of ordinary skill in the technical field to which the present invention belongs could readily infer from the detailed description and the embodiments should be construed as falling within the scope of the present invention.

Claims

1. A speech recognition apparatus comprising:
a sound collecting unit that collects sound for speech recognition; and
a processor that determines a noise pattern matching the collected sound based on a partial time interval of the collected sound and a partial time interval of the noise pattern, and recognizes speech from the collected sound using the determined noise pattern.

2. The speech recognition apparatus of claim 1,
wherein the processor determines the noise pattern matching the collected sound based on at least one of a time and a place at which the collected sound was collected.

3. The speech recognition apparatus of claim 2,
wherein the processor determines the noise pattern matching the collected sound based on at least one of a day of the week and a season in which the collected sound was collected.

4. The speech recognition apparatus of claim 1,
wherein the speech recognition apparatus controls at least one external device, and
wherein the processor determines the noise pattern matching the collected sound according to the control of the at least one external device.

5. The speech recognition apparatus of claim 4,
wherein the processor determines the noise pattern matching the collected sound according to an external device, among the at least one external device, that has been turned on by the speech recognition apparatus.

6. The speech recognition apparatus of claim 4,
wherein, when any one of the at least one external device is turned on, the processor collects a noise pattern until the turned-on external device is turned off.

7. The speech recognition apparatus of claim 1,
further comprising an image collection unit that senses an image,
wherein the processor determines the noise pattern matching the collected sound based on the sensed image.

8. The speech recognition apparatus of claim 7,
wherein the processor identifies a turned-on external device from the sensed image.

9. The speech recognition apparatus of claim 7,
wherein the processor identifies, from the sensed image, the position of the speaker of the speech contained in the collected sound, and
determines the noise pattern matching the collected sound based on the identified position of the speaker.

10. The speech recognition apparatus of claim 1,
further comprising a transceiver unit that transmits and receives information to and from another speech recognition apparatus,
wherein the processor receives the noise pattern through the transceiver unit from another speech recognition apparatus connected to the same network.

11. The speech recognition apparatus of claim 1,
wherein the lengths of the partial time interval of the collected sound and the partial time interval of the noise pattern are determined according to the magnitude of ambient noise.

12. A speech recognition method comprising:
collecting sound for speech recognition;
determining a noise pattern matching the collected sound based on a partial time interval of the collected sound and a partial time interval of the noise pattern; and
recognizing speech from the collected sound using the determined noise pattern.