KR20200046262A

KR20200046262A - Speech recognition processing method for noise generating working device and system thereof

Info

Publication number: KR20200046262A
Application number: KR1020180127145A
Authority: KR
Inventors: 김기현; 김현숙; 정진수; 최연주
Original assignee: 주식회사 케이티
Priority date: 2018-10-24
Filing date: 2018-10-24
Publication date: 2020-05-07
Also published as: KR102333376B1

Abstract

The present invention relates to a speech recognition processing method for a noise-generating working device, and a system thereof. More particularly, it relates to a speech recognition processing method and system that, when performing speech recognition for a working device that causes noise, such as a robot cleaner, process speech recognition in consideration of the various noise environments caused by the operating state of the working device, thereby enabling more accurate speech recognition for such a device. For a working device that causes noise having different characteristics according to its operation mode, the present invention provides a speech recognition processing method including: an operation mode information receiving step in which a speech recognition device receives information on the operation mode of the working device; an adaptive noise reduction filtering step of performing adaptive noise reduction filtering on a user's voice data with settings that vary according to the operation mode of the working device; and a speech recognition processing step of processing a speech recognition function on the filtered voice data of the user.

Description

Speech recognition processing method for noise generating working device and system thereof

The present invention relates to a speech recognition processing method and system for a working device that causes noise. More specifically, when performing speech recognition for a noise-causing working device such as a robot cleaner, the method and system process speech recognition in consideration of the various noise environments that arise from the operating state of the working device, thereby enabling more accurate speech recognition for such a device.

Recently, attempts have been made in various systems to make the user interface more convenient by using speech recognition technology. A specific example is the speech recognition function of a robot cleaner.

In general, a robot cleaner is a device that travels on its own while sucking up dust or foreign matter from the floor. Conventionally, the user entered operation commands for the robot cleaner either with a remote control or the like, or through an input unit provided on the robot cleaner.

However, this command input method is inconvenient because the user must operate a button or remote control directly. Accordingly, robot cleaners that operate on the user's voice input have recently been attempted.

A robot cleaner with a speech recognition function may therefore typically include a microphone for receiving voice input and a control unit that recognizes the input voice signal and controls the robot cleaner accordingly. In a conventional robot cleaner, however, the microphone picks up ambient noise as well as the user's voice, which makes it difficult to recognize the voice accurately. In addition, while the robot cleaner is operating, the noise of its own operation also enters through the microphone, so speech recognition errors become even more likely.

Furthermore, the noise caused by a working device such as a robot cleaner or a washing machine can vary greatly depending on the device's operation mode, its position as it moves, and so on, which makes accurate speech recognition across these varied noise environments even more difficult.

Accordingly, there is a continuing demand for improved speech recognition technology for noise-causing working devices such as robot cleaners: technology that suppresses speech recognition errors caused not only by ambient noise but also by the various noise environments arising while the device operates, and thereby enables more accurate speech recognition. No clear solution has yet been proposed.

Republic of Korea Patent Publication No. 10-2014-0071740 (published June 12, 2014)

The present invention was devised to solve the problems of the prior art described above. Its object is to provide a speech recognition processing method and system that, when performing speech recognition for a noise-causing working device such as a robot cleaner, suppress speech recognition errors caused not only by ambient noise but also by the various noise environments arising during operation of the device, and enable more accurate speech recognition.

Other detailed objects of the present invention will be readily apparent to experts and researchers in this technical field from the specific description below.

A speech recognition processing method according to one aspect of the present invention for solving the above problem is a speech recognition processing method for a working device that causes noise of different characteristics depending on its operation mode, the method comprising: an operation mode information receiving step in which a speech recognition device receives information on the operation mode of the working device; an adaptive noise reduction filtering step of performing adaptive noise reduction filtering on a user's voice data with settings that vary according to the operation mode of the working device; and a speech recognition performing step of processing a speech recognition function on the filtered voice data of the user.

In this case, the operation mode information receiving step may receive information on the type and operation mode of the working device that is currently operating among a plurality of types of working devices, and the adaptive noise reduction filtering step may perform filtering with settings that vary according to the type and operation mode of the operating working device.

In addition, the operation mode information receiving step may receive information on the position of the working device (200) together with its operation mode, and the adaptive noise reduction filtering step may perform filtering with settings that vary according to the operation mode and position of the working device.

In addition, the speech recognition performing step may transmit the filtered voice data of the user to a speech recognition server so that the server performs speech recognition, and the speech recognition server may perform secondary filtering with a second filter trained on noise data collected by the speech recognition device before performing speech recognition.

In this case, the noise data may be noise data collected by the speech recognition device while the working device is actually operating.

In addition, the speech recognition server may map the information on the operation mode of the working device to the noise data, store the mapping, and use it for speech recognition of the voice data.

In addition, the method may further include, prior to speech recognition of the user's voice data, an environmental noise preprocessing step of preprocessing environmental noise in the environment in which the working device operates, as distinct from the noise caused by the operation of the working device itself.

In this case, the environmental noise preprocessing step may divide the voice data into speech sections and non-speech sections to calculate a signal-to-noise ratio (SNR), and may perform preprocessing by classifying the voice data according to its SNR.

In addition, a speech recognition processing system according to another aspect of the present invention is a speech recognition processing system for a working device that causes noise, the system comprising: a working device that causes noise of different characteristics depending on its operation mode; and a speech recognition device having an operation mode information receiving unit that receives information on the operation mode of the working device, an adaptive noise reduction filter unit that performs adaptive noise reduction filtering on a user's voice data with settings that vary according to the operation mode of the working device, and a speech recognition processing unit that processes a speech recognition function on the filtered voice data of the user.

Accordingly, in the speech recognition processing method and system for a noise-causing working device according to an embodiment of the present invention, speech recognition for a noise-causing working device such as a robot cleaner is processed in consideration of the various noise environments that arise from the operating state of the working device. This suppresses speech recognition errors caused not only by ambient noise but also by the various noise environments arising during operation of the device, and enables more accurate speech recognition.

The accompanying drawings, which are included as part of the detailed description to aid understanding of the present invention, provide embodiments of the present invention and, together with the detailed description, explain its technical idea.
FIG. 1 is a block diagram of a speech recognition processing system for a noise-causing working device according to an embodiment of the present invention.
FIG. 2 is a flowchart of a speech recognition processing method for a noise-causing working device according to an embodiment of the present invention.
FIG. 3 is a diagram explaining the operation of the speech recognition processing method and system for a noise-causing working device according to an embodiment of the present invention.
FIG. 4 is a diagram explaining the learning of speech and noise in the speech recognition processing method and system for a noise-causing working device according to an embodiment of the present invention.
FIG. 5 is a flowchart of a speech recognition algorithm in the speech recognition processing method and system for a noise-causing working device according to an embodiment of the present invention.
FIG. 6 is a diagram explaining the preprocessing of environmental noise in the speech recognition processing method and system for a noise-causing working device according to an embodiment of the present invention.
FIG. 7 is a diagram explaining the preprocessing process for environmental noise in the speech recognition processing method and system for a noise-causing working device according to an embodiment of the present invention.

The present invention may be modified in various ways and may have various embodiments. Specific embodiments are described in detail below with reference to the accompanying drawings.

The following embodiments are provided to aid a comprehensive understanding of the methods, devices, and/or systems described in this specification. They are only examples, and the present invention is not limited to them.

In describing the embodiments of the present invention, a detailed description of known technology related to the present invention is omitted where it is judged that it would unnecessarily obscure the gist of the present invention. The terms used below are defined in consideration of their functions in the present invention and may vary according to the intention or custom of users and operators; their definitions should therefore be based on the content of this specification as a whole. The terminology used in the detailed description is intended only to describe embodiments of the present invention and should in no way be limiting. Unless clearly used otherwise, a singular expression includes the plural meaning. In this description, expressions such as "include" or "have" indicate certain characteristics, numbers, steps, operations, elements, or parts or combinations thereof, and should not be interpreted as excluding the presence or possibility of one or more other characteristics, numbers, steps, operations, elements, or parts or combinations thereof beyond those described.

In addition, terms such as "first" and "second" may be used to describe various components, but the components are not limited by these terms; the terms are used only to distinguish one component from another.

Exemplary embodiments of the speech recognition processing method and system for a noise-causing working device according to an embodiment of the present invention are described below in turn with reference to the accompanying drawings.

First, FIG. 1 shows a block diagram of a speech recognition processing system (10) for a noise-causing working device according to an embodiment of the present invention. As shown in FIG. 1, the speech recognition processing system (10) according to an embodiment of the present invention is a speech recognition processing system (10) for a working device (200) that causes noise, and comprises: a working device (200) that causes noise of different characteristics depending on its operation mode; and a speech recognition device (100) having an operation mode information receiving unit (110) that receives information on the operation mode of the working device (200), an adaptive noise reduction filter unit (120) that performs adaptive noise reduction filtering on a user's voice data with settings that vary according to the operation mode of the working device (200), and a speech recognition processing unit (130) that processes a speech recognition function on the filtered voice data of the user.

In this case, in the speech recognition device (100), the operation mode information receiving unit (110) may receive information on the type and operation mode of the working device (200) that is currently operating among a plurality of types of working devices (200), and the adaptive noise reduction filter unit (120) may accordingly perform filtering with settings that vary according to the type and operation mode of the operating working device (200).

In addition, in the speech recognition device (100), the operation mode information receiving unit (110) may receive information on the position of the working device (200) together with its operation mode, and the adaptive noise reduction filter unit (120) may accordingly perform filtering with settings that vary according to the operation mode and position of the working device (200).

Accordingly, in the speech recognition processing system (10) for a noise-causing working device according to an embodiment of the present invention, speech recognition for a noise-causing working device (200) such as a robot cleaner is processed in consideration of the various noise environments that arise from the operating state of the working device (200). This suppresses speech recognition errors caused not only by ambient noise but also by the various noise environments arising during operation of the device, and enables more accurate speech recognition.

FIG. 2 shows a flowchart of a speech recognition processing method for a noise-causing working device according to an embodiment of the present invention. As shown in FIG. 2, the speech recognition processing method according to an embodiment of the present invention is a speech recognition processing method for a working device (200) that causes noise of different characteristics depending on its operation mode, and includes: an operation mode information receiving step (S100) in which the speech recognition device (100) receives information on the operation mode of the working device (200); an adaptive noise reduction filtering step (S200) of performing adaptive noise reduction filtering on a user's voice data with settings that vary according to the operation mode of the working device (200); and a speech recognition performing step (S300) of processing a speech recognition function on the filtered voice data of the user.
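The three steps of FIG. 2 can be sketched as a small pipeline. The following Python is only an illustration under stated assumptions: the names `ModeInfo`, `process_utterance`, `select_filter`, and `recognize` are hypothetical and do not appear in the patent, and the filter and recognizer are passed in as plain callables because the patent does not fix their implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class ModeInfo:
    """Operation mode information received in step S100 (hypothetical structure)."""
    device_type: str
    mode: str


# A noise filter maps raw samples to filtered samples.
NoiseFilter = Callable[[Sequence[float]], List[float]]


def process_utterance(
    mode_info: ModeInfo,
    samples: Sequence[float],
    select_filter: Callable[[ModeInfo], NoiseFilter],
    recognize: Callable[[Sequence[float]], str],
) -> str:
    # S100: mode_info is assumed to have already been received
    # from the working device.
    # S200: choose filter settings according to the operation mode,
    # then filter the user's voice data.
    noise_filter = select_filter(mode_info)
    filtered = noise_filter(samples)
    # S300: run speech recognition on the filtered voice data.
    return recognize(filtered)
```

Keeping the filter selection and the recognizer as injected callables mirrors the text's point that recognition may run either on the device or on a separate server.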

Hereinafter, with reference to FIGS. 1 and 2, the speech recognition processing method and system (10) for a noise-causing working device according to an embodiment of the present invention are examined in more detail, component by component.

First, in the operation mode information receiving step (S100), the speech recognition device (100) receives information on the operation mode of the working device (200).

In this case, the speech recognition device (100) may be an artificial intelligence (AI) speaker with a speech recognition function, a speech recognition hub device that provides voice control for multiple devices, or a maintenance station that provides management functions such as charging together with a voice command function for a robot cleaner. However, the present invention is not necessarily limited to these, and various other devices that process a speech recognition function for the working device (200) may also be used.

In addition, the working device (200) may be any of various devices that cause noise during operation, such as a robot cleaner or a washing machine. In the present invention, the working device (200) causes noise of different characteristics depending on its operation mode. As a more specific example, the frequency band and level of the noise caused by a robot cleaner may differ between the normal cleaning mode and the turbo cleaning mode. Furthermore, when the user speaks from a distance rather than close to the speech recognition device (100), the influence of the noise may become even greater, making speech recognition errors more likely.

Accordingly, in the operation mode information receiving step (S100), the speech recognition device (100) receives information on the operation mode of the working device (200) so that speech recognition can be performed in consideration of the characteristics of the noise that may result, which effectively suppresses speech recognition errors caused by noise.

Next, in the adaptive noise reduction filtering step (S200), adaptive noise reduction filtering is performed on the user's voice data with settings that vary according to the operation mode of the working device (200).

As a more specific example, when the robot cleaner operates in the turbo cleaning mode, the noise may be louder than when it operates in the normal cleaning mode, and the frequency band in which the noise occurs may also differ.

Accordingly, in the adaptive noise reduction filtering step (S200), adaptive noise reduction filtering is performed on the user's voice data with settings that vary according to the operation mode of the working device (200), which effectively attenuates the noise in the user's voice data and improves the accuracy of speech recognition.
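The idea of changing the filter settings with the operation mode can be illustrated with a very small example. The patent does not specify the filter, so the sketch below substitutes a fixed first-order high-pass filter whose cutoff frequency is looked up by mode. This is not an adaptive filter in the LMS sense and is not presented as the patent's method; the mode names, cutoff values, sample rate, and function names are all hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical cutoff frequencies per operation mode; real values would
# have to be measured for the actual device.
CUTOFF_HZ_BY_MODE = {"normal": 150.0, "turbo": 400.0}
DEFAULT_CUTOFF_HZ = 150.0  # used when the mode is unknown


def high_pass(samples: Sequence[float], cutoff_hz: float,
              sample_rate_hz: float) -> List[float]:
    """First-order RC high-pass filter (discrete-time approximation)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = rc / (rc + dt)
    out: List[float] = []
    for i, x in enumerate(samples):
        if i == 0:
            out.append(x)
        else:
            # y[i] = alpha * (y[i-1] + x[i] - x[i-1])
            out.append(alpha * (out[-1] + x - samples[i - 1]))
    return out


def filter_for_mode(mode: str, samples: Sequence[float],
                    sample_rate_hz: float = 16000.0) -> List[float]:
    """Apply the high-pass filter with the cutoff configured for this mode."""
    cutoff = CUTOFF_HZ_BY_MODE.get(mode, DEFAULT_CUTOFF_HZ)
    return high_pass(samples, cutoff, sample_rate_hz)
```

With these assumed values, the turbo setting removes low-frequency content more aggressively than the normal setting; a real system would choose the settings from measured noise spectra for each mode.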

Furthermore, in the operation mode information receiving step (S100), information on the type and operation mode of the working device (200) that is currently operating among a plurality of types of working devices (200) may be received, and in the adaptive noise reduction filtering step (S200), filtering may accordingly be performed with settings that vary according to the type and operation mode of the operating working device (200).

In this case, the speech recognition device (100) may itself be equipped with a voice input means such as a microphone to receive the user's voice, but the present invention is not necessarily limited to this.

In addition, in the operation mode information receiving step (S100), information on the position of the working device (200) may be received together with its operation mode, and in the adaptive noise reduction filtering step (S200), filtering may accordingly be performed with settings that vary according to the operation mode and position of the working device (200).

That is, the characteristics of the noise may vary not only with the operation mode of the working device (200) but also with the type of the working device (200) and with its position. Taking these into account when filtering makes the filtering more effective.

As a more specific example, the frequency band and level of the noise caused during operation may differ depending on whether the working device (200) is a robot cleaner or a washing machine. Furthermore, the characteristics of the noise as perceived at the speech recognition device (100) may vary greatly with the position to which a working device (200) such as a robot cleaner has moved. Filtering the voice data with the type or position of the working device (200) also taken into account can therefore greatly improve the accuracy of speech recognition.
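One simple way to see why the position matters is the free-field inverse distance law, under which the sound level of a point source drops by about 6 dB for each doubling of distance. The sketch below uses that law to estimate the noise level at the speech recognition device from the device type, operation mode, and distance. It is an illustration only: the reference levels and names are hypothetical, the patent does not describe this calculation, and real rooms with reflections will deviate from the free-field assumption.

```python
import math

# Hypothetical noise levels in dB measured at 1 m, keyed by
# (device type, operation mode).
REFERENCE_LEVEL_DB = {
    ("robot_cleaner", "normal"): 60.0,
    ("robot_cleaner", "turbo"): 70.0,
    ("washing_machine", "spin"): 72.0,
}


def expected_noise_db(device_type: str, mode: str, distance_m: float,
                      ref_distance_m: float = 1.0) -> float:
    """Estimate the noise level at the recognizer (free-field assumption)."""
    if distance_m <= 0.0:
        raise ValueError("distance_m must be positive")
    base = REFERENCE_LEVEL_DB[(device_type, mode)]
    # Inverse distance law: level falls by 20*log10(d / d_ref) dB.
    return base - 20.0 * math.log10(distance_m / ref_distance_m)
```

An estimate like this could be used to pick stronger or weaker filter settings as the robot cleaner moves toward or away from the speech recognition device.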

Finally, in the speech recognition performing step (S300), a speech recognition function is processed on the filtered voice data of the user. The speech recognition device (100) may perform the speech recognition processing itself, but the present invention is not necessarily limited to this; the device may instead transmit the voice data to another device, such as a speech recognition server (400), and have that device perform the speech recognition function.

More specifically, in the speech recognition performing step (S300), the filtered voice data of the user may be transmitted to the speech recognition server (400) so that the server performs speech recognition.

Furthermore, the speech recognition server (400) may perform secondary filtering with a second filter trained on noise data collected by the speech recognition device (100) before performing speech recognition. Using noise data collected by the speech recognition device (100) while the working device (200) is actually operating allows the real noise to be modeled and learned more accurately, which suppresses speech recognition errors more effectively.

In addition, the speech recognition server (400) may map the information on the operation mode of the working device (200) to the noise data, store the mapping, and use it, through learning or the like, for speech recognition of the voice data.
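The mapping between operation mode information and collected noise data can be pictured as a keyed store. The sketch below is a minimal in-memory illustration with hypothetical names (`NoiseStore`, `add`, `mean_power`); the patent does not describe the storage format or the learning procedure, and the mean power computed here merely stands in for whatever statistics or model the server would actually learn.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Key = Tuple[str, str]  # (device type, operation mode)


class NoiseStore:
    """Maps operation mode information to noise clips collected in the field."""

    def __init__(self) -> None:
        self._clips: Dict[Key, List[List[float]]] = defaultdict(list)

    def add(self, device_type: str, mode: str, clip: Sequence[float]) -> None:
        """Store one noise clip under its device type and operation mode."""
        if len(clip) == 0:
            raise ValueError("clip must not be empty")
        self._clips[(device_type, mode)].append(list(clip))

    def mean_power(self, device_type: str, mode: str) -> float:
        """Average power over all clips stored for this type and mode."""
        clips = self._clips.get((device_type, mode))
        if not clips:
            raise KeyError((device_type, mode))
        total = sum(x * x for clip in clips for x in clip)
        count = sum(len(clip) for clip in clips)
        return total / count
```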

Furthermore, the speech recognition processing method for a noise-generating work device according to an embodiment of the present invention may include, prior to speech recognition of the user's voice data, an environmental noise preprocessing step (not shown) of preprocessing environmental noise in the environment in which the work device 200 operates, as distinct from the noise caused by the operation of the work device 200.

More specifically, in the environmental noise preprocessing step, the voice data may be divided into speech and non-speech sections to calculate a signal-to-noise ratio (SNR), and the voice data may be classified according to its SNR and preprocessed accordingly. Preprocessing the environmental noise ahead of the speech recognition process in this way prevents recognition errors caused by environmental noise and enables more accurate speech recognition.

FIG. 3 describes in more detail the operation of the speech recognition processing method and system 10 for a noise-generating work device according to an embodiment of the present invention. As shown in FIG. 3, in the speech recognition processing system 10 according to an embodiment of the present invention, the speech recognition device 100, such as an artificial intelligence (AI) speaker, may be equipped with a wireless communication module or the like capable of receiving information on the operation mode, such as the state, of a robot cleaner, washing machine, or other IoT terminal, which constitutes the operation mode information receiving unit 110; with adaptive noise reduction software, which constitutes the adaptive noise reduction filter unit 120; and with a speech recognition app or the like, which constitutes the speech recognition processing unit 130.

Here, the adaptive noise reduction software may have a function of modeling and learning noise data collected from various IoT terminals and filtering adaptively.

In addition, as the operation mode of the IoT terminal, information ranging from power on/off of the IoT terminal to the various operation modes being executed may be collected. To give a more specific example, in the case of a robot cleaner, the frequency band of the noise generated may differ between the normal cleaning mode and the turbo cleaning mode, and the signal-to-noise ratio (SNR) may also vary with the distance between the work device 200, such as the robot cleaner, and the speech recognition device 100, such as an AI speaker. Accordingly, as shown in FIG. 3, the work device control server 300, such as an IoT control server, transmits the operation mode, such as the state, of the IoT terminal (for example, the robot cleaner) to the adaptive noise reduction software (Adaptive NR SW), so that the filtering is adjusted adaptively. In addition, when the maintenance station of the robot cleaner operates as the speech recognition device 100, the Adaptive NR SW installed on the maintenance station obtains data on the distance to the robot cleaner, determines how much noise is reaching the station, and adjusts the settings of the adaptive noise reduction filter (Adaptive NR Filter) accordingly.

The voice data filtered by the Adaptive NR Filter may, as needed, be passed to the speech recognition server 400 for speech recognition, and the speech recognition server 400 may further improve recognition accuracy by performing secondary filtering that reflects previously learned noise data before performing speech recognition. Here, the noise data may be noise data collected in the actual environment in which the speech recognition device 100 is used, and in the present invention more accurate speech recognition performance may be obtained by additionally collecting and learning the noise caused by IoT terminals used nearby. The speech recognition server 400 may map the collected noise to the operation mode information of the work device 200, such as the IoT terminal, store it, and use it for training.

To describe the mapping information with a more specific example, the noise characteristics of the robot cleaner, including the noise level and frequency band, vary with the cleaning mode. Accordingly, when the robot cleaner runs in the turbo cleaning mode, the resulting noise data is mapped to the turbo cleaning mode, stored, and learned, and when the robot cleaner runs in the normal cleaning mode, the resulting noise data is mapped to the normal cleaning mode, stored, and learned.
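The mode-dependent and distance-dependent adjustment of the filter settings can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the `FilterSettings` fields, the `MODE_PRESETS` values, the 5 m range, and the rule that attenuation grows as the cleaner approaches are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FilterSettings:
    band_hz: Tuple[float, float]  # frequency band to attenuate (hypothetical field)
    attenuation_db: float         # attenuation depth in dB (hypothetical field)


# Hypothetical presets per operation mode; the numbers are illustrative only.
MODE_PRESETS = {
    "off": FilterSettings((0.0, 0.0), 0.0),
    "normal": FilterSettings((200.0, 2000.0), 12.0),
    "turbo": FilterSettings((400.0, 4000.0), 18.0),
}


def select_filter_settings(mode: str, distance_m: float) -> FilterSettings:
    """Pick filter settings from the operation mode, then adjust for distance."""
    preset = MODE_PRESETS[mode]
    if mode == "off":
        return preset
    # Assumed rule: add up to 6 dB of attenuation as the device moves
    # from 5 m away (no extra attenuation) to 0 m (full extra attenuation).
    proximity = max(0.0, min(1.0, 1.0 - distance_m / 5.0))
    return FilterSettings(preset.band_hz, preset.attenuation_db + 6.0 * proximity)
```

For example, `select_filter_settings("turbo", 0.0)` yields a deeper attenuation than `select_filter_settings("turbo", 5.0)`, mirroring the statement that the noise reaching the station grows as the cleaner approaches.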

FIG. 4 illustrates the learning of speech and noise in the speech recognition processing method and system 10 for a noise-generating work device according to an embodiment of the present invention.

First, FIG. 4 shows the process of training the voice database 410 and the noise database 420 so that the speech recognition processing method and system 10 according to an embodiment of the present invention can adaptively recognize and filter noise. Here, the voice database 410 models and learns the user's voice commands. In addition, as shown in FIG. 4, when a noise source such as the work device 200 is nearby, voice commands are recorded and learned under various conditions in which the distance to the noise source and the user's position change. When noise data from the work device 200 is collected, information such as the operation mode of the work device 200 and the distance to the work device 200 is mapped to it and stored in the noise database 420. The operation mode of the work device 200 may be obtained from the status information of the work device 200, such as an IoT terminal, that is collected and managed by the IoT control server 300 or the like. The learned noise data may then, after a modeling process, be reflected in the filter settings of the adaptive noise reduction software (Adaptive NR SW), and may further be used for noise filtering in the speech recognition server 400. In this way, the speech recognition processing method and system 10 according to an embodiment of the present invention can secure excellent speech recognition performance even in an environment containing work devices 200 with various noise characteristics.
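The mapping of collected noise to operation mode and distance can be sketched as a small in-memory store. This is only an illustration under stated assumptions: the `NoiseDatabase` class, the bucketing of distance into 1 m bins, and the use of a single RMS level per sample are hypothetical simplifications, not the structure of the noise database 420.

```python
from collections import defaultdict


class NoiseDatabase:
    """Toy store that maps (operation mode, distance bucket) to noise levels."""

    def __init__(self, bucket_m: float = 1.0):
        self.bucket_m = bucket_m            # width of one distance bin (assumed)
        self._samples = defaultdict(list)   # key -> list of RMS noise levels

    def _key(self, mode: str, distance_m: float):
        return (mode, int(distance_m // self.bucket_m))

    def add(self, mode: str, distance_m: float, rms_level: float) -> None:
        """Store one noise measurement under its mode and distance bin."""
        self._samples[self._key(mode, distance_m)].append(rms_level)

    def mean_level(self, mode: str, distance_m: float):
        """Return the mean stored level for this condition, or None if unseen."""
        samples = self._samples.get(self._key(mode, distance_m))
        if not samples:
            return None
        return sum(samples) / len(samples)
```

A lookup such as `db.mean_level("turbo", 1.5)` could then feed a filter-setting step; a real system would store spectra or trained models rather than a single level.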

More specifically, FIG. 5 illustrates a flowchart of the speech recognition algorithm in the speech recognition processing method and system for a noise-generating work device according to an embodiment of the present invention.

As shown in FIG. 5, the speech recognition engine according to an embodiment of the present invention has a preprocessing structure. First, when the user's speech signal is input, the end point of the speech signal is detected through EPD (End Point Detection) (S1100). Next, in the preprocessing step (Pre Processing), the noise level is reflected according to the operation mode or location of the work device 200, such as a robot cleaner (S1200). Here, the signal-to-noise ratio (SNR) is defined as a measure of the quality level of a signal. A speech signal generally does not exist alone but is usually mixed with noise, and the SNR expresses the ratio between the two. More specifically, the closer an operating work device 200, such as a robot cleaner, comes to the speech recognition device 100, such as a maintenance station, the louder the noise becomes. The preprocessing step therefore takes the location information of the work device 200 into account to attenuate the noise more precisely. Feature extraction is then performed on the preprocessed voice data (S1300), and the recognition process (S1400) yields result text, which undergoes further processing, such as command interpretation, in a dialogue server (not shown) or the like. If the text is a control command for the robot cleaner, the dialogue server passes it to the work device control server 300 for the robot cleaner, and the work device control server 300, after steps such as collecting the robot cleaner's status information, transmits the control command to the robot cleaner.
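The end point detection of step S1100 is not specified further in the text. As a minimal sketch, assuming a simple frame-energy threshold (the function names, the frame length, and the threshold are hypothetical and are not the patent's EPD), the start and end of the speech region could be located as follows:

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_endpoints(samples, frame_len, threshold):
    """Return (start, end) sample indices of the active region, or None.

    A frame counts as active when its energy exceeds the threshold; the
    region runs from the first active frame to the end of the last one.
    """
    energies = frame_energies(samples, frame_len)
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return None
    return active[0] * frame_len, (active[-1] + 1) * frame_len
```

In practice the threshold would itself depend on the noise level, which is why the preprocessing step S1200 takes the operation mode and location of the work device into account.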

In addition, as shown in FIG. 6, the present invention can further improve speech recognition performance by preprocessing the environmental noise of the environment in which the work device 200 operates, as distinct from the noise caused by the operation of the work device 200.

Accordingly, in the speech recognition processing method and system 10 according to an embodiment of the present invention, the voice data is divided into speech and non-speech sections to calculate a signal-to-noise ratio (SNR), and the voice data is classified according to its SNR and preprocessed accordingly.

More specifically, the preprocessing algorithm for environmental noise according to an embodiment of the present invention applies decision rules based on the noise signal, such as Smoothed PM (Peak Magnitude), MCR (Mean Crossing Ratio), SNR Maximum, and Tilda MCR, to classify signals with relatively little noise as Class A, signals with a moderate amount of noise as Class B, and signals with relatively heavy noise as Class C, which further improves speech recognition performance.
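The text does not give the decision rules or their thresholds. Purely as an illustration of the three-way split, the following sketch classifies on SNR alone; the function name and the 20 dB and 10 dB boundaries are hypothetical, and the rules named above (Smoothed PM, MCR, SNR Maximum, Tilda MCR) are not modeled.

```python
def classify_noise(snr_db, clean_db=20.0, noisy_db=10.0):
    """Map an SNR in dB to Class A (little noise), B (moderate), or C (heavy).

    The two thresholds are illustrative assumptions, not values from the patent.
    """
    if snr_db >= clean_db:
        return "A"
    if snr_db >= noisy_db:
        return "B"
    return "C"
```

A downstream preprocessing stage could then choose a different noise treatment per class, for instance lighter filtering for Class A than for Class C.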

More specifically, FIG. 7 shows the process for classifying the voice data into Class A, Class B, and Class C according to the degree of noise. As shown in FIG. 7, when a speech signal is input (1510), the NACF (Normalized Autocorrelation Function) is used (1520) to obtain the peak magnitude of the NACF (1530), and the signal-to-noise ratio maximum (SNR Maximum) over 8 frames of the input speech signal is obtained (1560). This process can be used to distinguish speech and non-speech sections in the input speech signal.

Here, the peak index and peak magnitude of the NACF may be defined as in Equation 1 below.

[Equation 1]

Accordingly, the PIS (Peak Index Series) and PMS (Peak Magnitude Series) can be measured over all frames using Equation 1. The PIS and PMS each vary smoothly in speech sections and, conversely, fluctuate strongly in non-speech sections, which makes it possible to distinguish the speech and non-speech sections of the speech signal.
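The formula of Equation 1 is not reproduced in this text. As a minimal sketch, assuming the common definition of the normalized autocorrelation (the correlation of a frame with a lagged copy of itself, divided by the geometric mean of the two energies), the peak index and peak magnitude of one frame could be computed as follows. The function names and the lag range are hypothetical.

```python
import math


def nacf(frame, lag):
    """Normalized autocorrelation of a frame at the given lag (assumed form)."""
    a = frame[:len(frame) - lag]
    b = frame[lag:]
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0 else 0.0


def nacf_peak(frame, min_lag, max_lag):
    """Return (peak index, peak magnitude) over the lag range."""
    index = max(range(min_lag, max_lag + 1), key=lambda k: nacf(frame, k))
    return index, nacf(frame, index)
```

Collecting `nacf_peak` over successive frames gives series that play the role of the PIS and PMS: for a periodic (voiced) frame the peak sits at the period with a magnitude near 1, while for noise the peak wanders from frame to frame.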

Next, long-term smoothing is applied to the PM (Peak Magnitude) value (1540), and then the crossing ratio is obtained (1550). The smoothed PM (Smoothed PM) may be defined as in Equation 2 below.

[Equation 2]

Here, α denotes the long-term smoothing parameter.
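The formula of Equation 2 is not reproduced in this text. One common form of long-term smoothing with a single parameter α is first-order exponential smoothing; the sketch below assumes that form, which may differ from the patent's actual equation. The function name and the default α are hypothetical.

```python
def smooth_pm(pm_series, alpha=0.9):
    """Exponentially smooth a peak-magnitude series.

    Assumed recursion: s[t] = alpha * s[t-1] + (1 - alpha) * pm[t],
    with s[0] = pm[0]. A larger alpha gives heavier smoothing.
    """
    smoothed = []
    prev = None
    for pm in pm_series:
        prev = pm if prev is None else alpha * prev + (1.0 - alpha) * pm
        smoothed.append(prev)
    return smoothed
```

With α close to 1 the smoothed series reacts slowly, so brief fluctuations of the peak magnitude are suppressed while sustained changes remain visible.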

In addition, Equation 3 below is used to obtain the new crossing ratio (New Crossing Ratio) of the smoothed PM (1580).

[Equation 3]
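The formula of Equation 3 is not reproduced in this text. As a rough illustration of what a mean crossing ratio can measure, the sketch below counts how often a series crosses its own mean, relative to the number of adjacent pairs. This definition and the function name are assumptions and may differ from the patent's equation.

```python
def mean_crossing_ratio(series):
    """Fraction of adjacent pairs that lie on opposite sides of the mean."""
    if len(series) < 2:
        return 0.0
    mean = sum(series) / len(series)
    crossings = sum(
        1 for a, b in zip(series, series[1:]) if (a - mean) * (b - mean) < 0
    )
    return crossings / (len(series) - 1)
```

Under this reading, a series that varies smoothly yields a low ratio and a strongly fluctuating one yields a high ratio, which matches the described use of these quantities to separate speech from non-speech sections.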

In addition, the present invention uses the signal-to-noise ratio (SNR) as a measure of the quality level of a speech signal. A speech signal does not exist alone but is usually mixed with noise. The SNR is therefore used as a measure of the ratio between the speech signal and the noise, and may be defined as in Equation 4 below.

[Equation 4]

Here, En(t) denotes the average value of the initial 8 frames.
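The formula of Equation 4 is not reproduced in this text. A minimal sketch, assuming the usual decibel form of the SNR and taking the noise energy as the mean energy of the initial 8 frames as stated above, could look like this. The function name, the `eps` floor, and the use of per-frame energies as input are assumptions.

```python
import math


def snr_db_per_frame(energies, noise_frames=8, eps=1e-12):
    """Per-frame SNR in dB relative to the mean energy of the initial frames.

    Assumed form: SNR(t) = 10 * log10(E(t) / En), where En is the mean
    energy of the first `noise_frames` frames (taken to be speech-free).
    """
    head = energies[:noise_frames]
    if not head:
        raise ValueError("at least one frame is required")
    noise = max(sum(head) / len(head), eps)  # floor avoids division by zero
    return [10.0 * math.log10(max(e, eps) / noise) for e in energies]
```

This relies on the first 8 frames containing only noise; if the user starts speaking immediately, the noise estimate is inflated and the SNR is underestimated.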

In addition, Equation 5 below may be used to obtain the signal-to-noise ratio maximum (SNR Maximum) value (1560).

[Equation 5]

Furthermore, a sigmoid function may be applied, as defined in Equation 6 below (1570).

[Equation 6]
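The formula of Equation 6 is not reproduced in this text, and the text does not say which quantity the sigmoid is applied to or with what parameters. The sketch below shows only the standard logistic sigmoid, with a hypothetical `center` and `slope` so that a measure such as an SNR value could be mapped into the range 0 to 1.

```python
import math


def sigmoid(x, center=0.0, slope=1.0):
    """Logistic sigmoid 1 / (1 + exp(-slope * (x - center))).

    Written in a numerically stable form so that large negative inputs
    do not overflow math.exp. `center` and `slope` are illustrative.
    """
    z = slope * (x - center)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)
```

For example, `sigmoid(snr, center=15.0, slope=0.5)` would return 0.5 at 15 dB and approach 1 for much higher values, giving a soft rather than hard decision boundary.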

Accordingly, the speech recognition processing method and system 10 according to an embodiment of the present invention can greatly improve the signal-to-noise ratio (SNR) compared with conventional methods, and, by classifying environmental noise into classes according to its level and preprocessing each class separately, can improve the success rate of speech recognition.

The above description merely illustrates the technical idea of the present invention, and those of ordinary skill in the art to which the present invention pertains will be able to make various modifications and variations without departing from its essential characteristics. The embodiments described herein are therefore intended to explain, not to limit, the technical idea of the present invention, and the invention is not limited to these embodiments. The scope of protection of the present invention should be interpreted according to the claims below, and all technical ideas within a scope equivalent thereto should be interpreted as falling within the scope of rights of the present invention.

10: speech recognition processing system
100: speech recognition device
110: operation mode information receiving unit
120: adaptive noise reduction filter unit
130: speech recognition processing unit
200: work device
300: work device control server
310: work device control unit
320: communication unit
400: speech recognition server
410: voice database
420: noise database
500: user

Claims

A speech recognition processing method for a work device that generates noise having different characteristics depending on its operation mode, the method comprising:
an operation mode information receiving step in which a speech recognition device receives information on the operation mode of the work device;
an adaptive noise reduction filtering step of performing adaptive noise reduction filtering on a user's voice data with settings that differ according to the operation mode of the work device; and
a speech recognition performing step of performing a speech recognition function on the filtered voice data of the user.

The method of claim 1, wherein,
in the operation mode information receiving step,
information on the type and operation mode of the work device, from among a plurality of types of work devices, is received, and,
in the adaptive noise reduction filtering step,
the filtering is performed with settings that differ according to the type and operation mode of the work device.

The method of claim 1, wherein,
in the operation mode information receiving step,
information on the location of the work device is received together with the operation mode of the work device, and,
in the adaptive noise reduction filtering step,
the filtering is performed with settings that differ according to the operation mode and location of the work device.

The method of claim 1, wherein,
in the speech recognition performing step,
the filtered voice data of the user is transmitted to a speech recognition server so that speech recognition is performed, and
the speech recognition server performs secondary filtering using a second filter trained on noise data collected by the speech recognition device and then performs speech recognition.

The method of claim 4, wherein
the noise data
is noise data collected by the speech recognition device during actual operation of the work device.

The method of claim 4, wherein
the speech recognition server
maps information on the operation mode of the work device to the noise data, stores the mapping, and uses it for speech recognition of the voice data.

The method of claim 1, further comprising,
prior to speech recognition of the user's voice data,
an environmental noise preprocessing step of preprocessing environmental noise in the environment in which the work device operates, as distinct from noise caused by the operation of the work device.

The method of claim 7, wherein,
in the environmental noise preprocessing step,
a signal-to-noise ratio (SNR) is calculated by dividing the voice data into speech and non-speech sections, and
preprocessing is performed by classifying the voice data according to the signal-to-noise ratio (SNR) of the voice data.

A speech recognition processing system for a noise-generating work device, the system comprising:
a work device that generates noise having different characteristics depending on its operation mode; and
a speech recognition device comprising an operation mode information receiving unit that receives information on the operation mode of the work device,
an adaptive noise reduction filter unit that performs adaptive noise reduction filtering on a user's voice data with settings that differ according to the operation mode of the work device, and
a speech recognition processing unit that performs a speech recognition function on the filtered voice data of the user.