KR102333376B1

KR102333376B1 - Speech recognition processing method for noise generating working device and system thereof

Info

Publication number: KR102333376B1
Application number: KR1020180127145A
Authority: KR
Inventors: 김기현; 김현숙; 정진수; 최연주
Original assignee: 주식회사 케이티
Priority date: 2018-10-24
Filing date: 2018-10-24
Publication date: 2021-12-02
Also published as: KR20200046262A

Abstract

The present invention relates to a voice recognition processing method and system for a work device that generates noise. More specifically, when voice recognition is performed for a noise-generating work device such as a robot cleaner, the voice recognition is processed in consideration of the various noise environments that arise from the operating state of the work device, which enables more accurate voice recognition for such a device.
The invention discloses a voice recognition processing method for a work device that generates noise of different characteristics depending on its operation mode, the method comprising: an operation mode information receiving step in which a voice recognition device receives information on the operation mode of the work device; an adaptive noise reduction filtering step of performing adaptive noise reduction filtering on a user's voice data with settings that vary according to the operation mode of the work device; and a voice recognition performing step of processing a voice recognition function on the filtered voice data of the user.

Description

Speech recognition processing method for noise generating working device and system thereof

The present invention relates to a voice recognition processing method and system for a work device that generates noise. More specifically, when voice recognition is performed for a noise-generating work device such as a robot cleaner, the voice recognition is processed in consideration of the various noise environments that arise from the operating state of the work device, which enables more accurate voice recognition for such a device.

Recently, attempts have been made in various systems to make user interfaces more convenient by using voice recognition technology. A specific example is the voice recognition function of a robot cleaner.

In general, a robot cleaner is a device that travels on its own while sucking up dust or foreign matter from the floor. Conventionally, a user entered an operation command for the robot cleaner using a remote control or the like, or through an input unit provided on the robot cleaner.

However, this command input method is inconvenient because the user must operate a button or a remote control directly. Accordingly, robot cleaners that operate on the user's voice input have recently been attempted.

A robot cleaner having a voice recognition function may typically include a microphone for receiving voice input and a control unit that recognizes the input voice signal and controls the robot cleaner accordingly. In a conventional robot cleaner, however, the microphone provided on the robot cleaner picks up not only the user's voice but also ambient noise, which makes it difficult to recognize the voice accurately. In addition, while the robot cleaner is operating, the noise it generates is also input through the microphone, so voice recognition errors become even more likely.

Furthermore, the noise generated by a work device such as a robot cleaner or a washing machine can vary greatly depending on the operation mode of the device, the position to which it has moved, and so on. Accurate voice recognition across such varied noise environments is therefore an even harder problem.

Accordingly, there is a continuing demand for improved voice recognition technology for noise-generating work devices such as robot cleaners: technology that suppresses voice recognition errors caused not only by ambient noise but also by the various noise environments that arise while the device operates, and that thereby enables more accurate voice recognition. A clear solution has not yet been presented.

Korean Laid-open Patent Publication No. 10-2014-0071740 (published June 12, 2014)

The present invention has been devised to solve the problems of the prior art described above. Its object is to provide a voice recognition processing method and system that, when voice recognition is performed for a noise-generating work device such as a robot cleaner, suppress voice recognition errors caused not only by ambient noise but also by the various noise environments that arise while the device operates, and thereby enable more accurate voice recognition.

Other detailed objects of the present invention will be clearly understood by experts and researchers in this technical field from the specific content described below.

A voice recognition processing method according to one aspect of the present invention for solving the above problem is a voice recognition processing method for a work device that generates noise of different characteristics depending on its operation mode, the method comprising: an operation mode information receiving step in which a voice recognition device receives information on the operation mode of the work device; an adaptive noise reduction filtering step of performing adaptive noise reduction filtering on a user's voice data with settings that vary according to the operation mode of the work device; and a voice recognition performing step of processing a voice recognition function on the filtered voice data of the user.

In this case, in the operation mode information receiving step, information on the type and operation mode of the work device currently in operation, among a plurality of types of work devices, may be received, and in the adaptive noise reduction filtering step, filtering may be performed with settings that vary according to the type and operation mode of the work device in operation.

In addition, in the operation mode information receiving step, information on the location of the work device may be received together with its operation mode, and in the adaptive noise reduction filtering step, filtering may be performed with settings that vary according to the operation mode and location of the work device.

In addition, in the voice recognition performing step, the filtered voice data of the user may be transmitted to a voice recognition server so that the server performs voice recognition, and the voice recognition server may perform secondary filtering with a second filter trained on noise data collected by the voice recognition device before performing voice recognition.

In this case, the noise data may be noise data collected by the voice recognition device while the work device is actually operating.

In addition, the voice recognition server may map information on the operation mode of the work device to the noise data, store the mapping, and use it for voice recognition of the voice data.

In addition, the method may further include, prior to voice recognition of the user's voice data, an environmental noise preprocessing step of preprocessing environmental noise, that is, noise present in the environment in which the work device operates rather than noise caused by the operation of the work device itself.

In this case, in the environmental noise preprocessing step, the voice data may be divided into speech sections and non-speech sections to calculate a signal-to-noise ratio (SNR), and the voice data may be classified according to its SNR for preprocessing.

A voice recognition processing system according to another aspect of the present invention is a voice recognition processing system for a noise-generating work device, the system comprising: a work device that generates noise of different characteristics depending on its operation mode; and a voice recognition device having an operation mode information receiving unit that receives information on the operation mode of the work device, an adaptive noise reduction filter unit that performs adaptive noise reduction filtering on a user's voice data with settings that vary according to the operation mode of the work device, and a voice recognition processing unit that processes a voice recognition function on the filtered voice data of the user.

Accordingly, in the voice recognition processing method and system for a noise-generating work device according to an embodiment of the present invention, voice recognition for a noise-generating work device such as a robot cleaner is processed in consideration of the various noise environments that arise from the operating state of the work device. This suppresses voice recognition errors caused not only by ambient noise but also by the various noise environments that arise while the device operates, and enables more accurate voice recognition.

The accompanying drawings, which are included as part of the detailed description to aid understanding of the present invention, provide embodiments of the invention and, together with the detailed description, explain its technical idea.
FIG. 1 is a block diagram of a voice recognition processing system for a noise-generating work device according to an embodiment of the present invention.
FIG. 2 is a flowchart of a voice recognition processing method for a noise-generating work device according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating the operation of the voice recognition processing method and system for a noise-generating work device according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating the learning of voice and noise in the voice recognition processing method and system for a noise-generating work device according to an embodiment of the present invention.
FIG. 5 is a flowchart of a voice recognition algorithm in the voice recognition processing method and system for a noise-generating work device according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating the preprocessing of environmental noise in the voice recognition processing method and system for a noise-generating work device according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating the preprocessing process for environmental noise in the voice recognition processing method and system for a noise-generating work device according to an embodiment of the present invention.

The present invention may be modified in various ways and may have various embodiments. Specific embodiments are described in detail below with reference to the accompanying drawings.

The following embodiments are provided to aid a comprehensive understanding of the methods, devices, and/or systems described in this specification. They are examples only, and the present invention is not limited to them.

In describing the embodiments of the present invention, detailed descriptions of related known art are omitted where they would unnecessarily obscure the gist of the invention. The terms used below are defined in consideration of their functions in the present invention and may vary according to the intention or practice of users and operators; their definitions should therefore be based on the content of this specification as a whole. The terminology used in the detailed description is only for describing embodiments of the present invention and should in no way be limiting. Unless clearly used otherwise, singular expressions include the plural. In this description, expressions such as "include" or "have" indicate certain features, numbers, steps, operations, elements, or parts or combinations thereof, and should not be construed to exclude the presence or possibility of one or more other features, numbers, steps, operations, elements, or parts or combinations thereof beyond those described.

In addition, terms such as "first" and "second" may be used to describe various components, but the components are not limited by these terms; the terms are used only to distinguish one component from another.

Exemplary embodiments of the voice recognition processing method and system for a noise-generating work device according to an embodiment of the present invention are described below in turn with reference to the accompanying drawings.

First, FIG. 1 shows a block diagram of a voice recognition processing system 10 for a noise-generating work device according to an embodiment of the present invention. As shown in FIG. 1, the voice recognition processing system 10 according to an embodiment of the present invention is a voice recognition processing system 10 for a noise-generating work device 200 and includes: a work device 200 that generates noise of different characteristics depending on its operation mode; and a voice recognition device 100 having an operation mode information receiving unit 110 that receives information on the operation mode of the work device 200, an adaptive noise reduction filter unit 120 that performs adaptive noise reduction filtering on a user's voice data with settings that vary according to the operation mode of the work device 200, and a voice recognition processing unit 130 that processes a voice recognition function on the filtered voice data of the user.

In the voice recognition device 100, the operation mode information receiving unit 110 may receive information on the type and operation mode of the work device 200 currently in operation, among a plurality of types of work devices 200, and the adaptive noise reduction filter unit 120 may accordingly perform filtering with settings that vary according to the type and operation mode of the work device 200 in operation.

In addition, the operation mode information receiving unit 110 of the voice recognition device 100 may receive information on the location of the work device 200 together with its operation mode, and the adaptive noise reduction filter unit 120 may accordingly perform filtering with settings that vary according to the operation mode and location of the work device 200.

Accordingly, in the voice recognition processing system 10 for a noise-generating work device according to an embodiment of the present invention, voice recognition for a noise-generating work device 200 such as a robot cleaner is processed in consideration of the various noise environments that arise from the operating state of the work device 200. This suppresses voice recognition errors caused not only by ambient noise but also by the various noise environments that arise while the device operates, and enables more accurate voice recognition.

FIG. 2 shows a flowchart of a voice recognition processing method for a noise-generating work device according to an embodiment of the present invention. As shown in FIG. 2, the voice recognition processing method according to an embodiment of the present invention is a voice recognition processing method for a work device 200 that generates noise of different characteristics depending on its operation mode, and includes: an operation mode information receiving step (S100) in which the voice recognition device 100 receives information on the operation mode of the work device 200; an adaptive noise reduction filtering step (S200) of performing adaptive noise reduction filtering on the user's voice data with settings that vary according to the operation mode of the work device 200; and a voice recognition performing step (S300) of processing a voice recognition function on the filtered voice data of the user.
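The three steps S100 to S300 can be pictured as a small pipeline. The following Python sketch is illustrative only and is not taken from the patent: the names `ModeInfo`, `make_noise_gate`, and `process_voice`, the mode labels, and the use of a simple noise gate as the stand-in filter are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Samples = List[float]
NoiseFilter = Callable[[Samples], Samples]


@dataclass(frozen=True)
class ModeInfo:
    """Operation mode information received in S100 (field names are hypothetical)."""
    device_type: str  # e.g. "robot_cleaner"
    mode: str         # e.g. "normal" or "turbo"


def make_noise_gate(threshold: float) -> NoiseFilter:
    """Build a crude stand-in noise filter: samples quieter than threshold become 0."""
    def gate(samples: Samples) -> Samples:
        return [s if abs(s) >= threshold else 0.0 for s in samples]
    return gate


def process_voice(
    mode_info: ModeInfo,
    voice: Samples,
    filters: Dict[Tuple[str, str], NoiseFilter],
    recognize: Callable[[Samples], object],
) -> object:
    # S200: choose the filter setting that matches the reported mode, then filter.
    noise_filter = filters[(mode_info.device_type, mode_info.mode)]
    filtered = noise_filter(voice)
    # S300: hand the filtered voice data to the recognizer (local or remote).
    return recognize(filtered)
```

The recognizer is passed in as a callable so that the same flow covers both on-device recognition and forwarding to a server.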

The voice recognition processing method and system 10 for a noise-generating work device according to an embodiment of the present invention are examined below in more detail, component by component, with reference to FIGS. 1 and 2.

First, in the operation mode information receiving step (S100), the voice recognition device 100 receives information on the operation mode of the work device 200.

The voice recognition device 100 may be an artificial intelligence (AI) speaker having a voice recognition function, a voice recognition hub device that provides voice recognition control for multiple devices, or a maintenance station that provides voice commands for a robot cleaner together with management functions such as charging. However, the present invention is not necessarily limited to these, and various other devices that process a voice recognition function for the work device 200 may also be used.

The work device 200 may be any of various devices that generate noise during operation, such as a robot cleaner or a washing machine. In the present invention, the work device 200 generates noise of different characteristics depending on its operation mode. For example, the frequency band and level of the noise generated by a robot cleaner may differ between the normal cleaning mode and the turbo cleaning mode. Furthermore, when the user speaks from a distance rather than close to the voice recognition device 100, the effect of the noise can be greater, so voice recognition errors become more likely.

Accordingly, in the operation mode information receiving step (S100), the voice recognition device 100 receives information on the operation mode of the work device 200 and performs voice recognition in consideration of the characteristics of the noise that may result, which effectively suppresses voice recognition errors caused by noise.

Next, in the adaptive noise reduction filtering step (S200), adaptive noise reduction filtering is performed on the user's voice data with settings that vary according to the operation mode of the work device 200.

For example, when the robot cleaner operates in the turbo cleaning mode, the noise may be louder than in the normal cleaning mode, and the frequency band in which the noise occurs may also differ.

Accordingly, in the adaptive noise reduction filtering step (S200), adaptive noise reduction filtering is performed on the user's voice data with settings that vary according to the operation mode of the work device 200. This effectively attenuates the noise in the user's voice data and improves the accuracy of voice recognition.
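The text does not specify the filter itself, only that its settings depend on the operation mode. As one way to picture "different band and level per mode", the sketch below attenuates a mode-specific frequency band using a naive DFT from the Python standard library. The band edges, gains, mode labels, and function names are hypothetical, and a real adaptive filter would be considerably more elaborate.

```python
import cmath
import math
from typing import Dict, List, Tuple


def dft(x: List[float]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform, adequate for short frames."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum: List[complex]) -> List[complex]:
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def attenuate_band(samples: List[float], sample_rate: float,
                   low_hz: float, high_hz: float, gain: float) -> List[float]:
    """Scale every frequency bin inside [low_hz, high_hz] by gain."""
    n = len(samples)
    spectrum = dft(samples)
    for k in range(n):
        # Bins above n/2 mirror the negative frequencies of a real signal.
        freq = min(k, n - k) * sample_rate / n
        if low_hz <= freq <= high_hz:
            spectrum[k] *= gain
    return [v.real for v in idft(spectrum)]


# Hypothetical per-mode settings: (low_hz, high_hz, gain).
MODE_BANDS: Dict[str, Tuple[float, float, float]] = {
    "normal": (900.0, 1100.0, 0.0),
    "turbo": (1900.0, 2100.0, 0.0),
}


def filter_for_mode(samples: List[float], sample_rate: float, mode: str) -> List[float]:
    low_hz, high_hz, gain = MODE_BANDS[mode]
    return attenuate_band(samples, sample_rate, low_hz, high_hz, gain)
```

With these assumed settings, a tone inside the band registered for the current mode is removed while content outside that band passes through unchanged.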

Furthermore, in the operation mode information receiving step (S100), information on the type and operation mode of the work device 200 currently in operation, among a plurality of types of work devices 200, may be received, and in the adaptive noise reduction filtering step (S200), filtering may accordingly be performed with settings that vary according to the type and operation mode of the work device 200 in operation.

The voice recognition device 100 may have its own voice input means, such as a microphone, to receive the user's voice, but the present invention is not necessarily limited to this.

In addition, in the operation mode information receiving step (S100), information on the location of the work device 200 may be received together with its operation mode, and in the adaptive noise reduction filtering step (S200), filtering may accordingly be performed with settings that vary according to the operation mode and location of the work device 200.

That is, the characteristics of the noise may vary not only with the operation mode of the work device 200 but also with the type of the work device 200 and with its location. Taking these factors into account when filtering makes the filtering more effective.

For example, the frequency band and level of the noise generated during operation may differ depending on whether the work device 200 is a robot cleaner or a washing machine. Moreover, the characteristics of the noise perceived at the voice recognition device 100 may vary greatly depending on where a work device 200 such as a robot cleaner has moved. Filtering the voice data in consideration of the type or position of the work device 200 can therefore greatly improve the accuracy of voice recognition.
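Selecting a setting by device type, operation mode, and location can be sketched as a table lookup with fallbacks. This is a hypothetical illustration: the table contents, the `noise_gain` field, the location labels, and the fallback order are assumptions and are not described in the source.

```python
from typing import Dict, Optional, Tuple

FilterSetting = Dict[str, float]
SettingKey = Tuple[str, str, Optional[str]]

# Hypothetical settings table keyed by (device type, operation mode, location).
# A location of None means "any location".
SETTINGS: Dict[SettingKey, FilterSetting] = {
    ("robot_cleaner", "turbo", "living_room"): {"noise_gain": 0.1},
    ("robot_cleaner", "turbo", None): {"noise_gain": 0.2},
    ("robot_cleaner", "normal", None): {"noise_gain": 0.5},
    ("washing_machine", "spin", None): {"noise_gain": 0.3},
}

# Used when nothing is known about the device: leave the signal untouched.
DEFAULT_SETTING: FilterSetting = {"noise_gain": 1.0}


def select_setting(device_type: str, mode: str,
                   location: Optional[str] = None) -> FilterSetting:
    """Prefer an exact (type, mode, location) match, then (type, mode), then the default."""
    for key in ((device_type, mode, location), (device_type, mode, None)):
        if key in SETTINGS:
            return SETTINGS[key]
    return DEFAULT_SETTING
```

Falling back from the most specific key to the least specific one lets a device that reports no location still receive a mode-appropriate setting.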

Finally, in the voice recognition performing step (S300), a voice recognition function is processed on the filtered voice data of the user. The voice recognition device 100 may perform the voice recognition processing itself, but the present invention is not necessarily limited to this; the voice data may instead be transmitted to another device, such as a voice recognition server 400, which performs the voice recognition function.

More specifically, in the voice recognition performing step (S300), the filtered voice data of the user may be transmitted to the voice recognition server 400 so that the server performs voice recognition.

Furthermore, the voice recognition server 400 may perform secondary filtering with a second filter trained on noise data collected by the voice recognition device 100 before performing voice recognition. Using noise data collected by the voice recognition device 100 while the work device 200 is actually operating allows the actual noise to be modeled and learned more accurately, which suppresses voice recognition errors more effectively.

In addition, the voice recognition server 400 may map the information on the operation mode of the work device 200 to the noise data, store the mapping, and use it, through learning or the like, for voice recognition of the voice data.

Furthermore, the voice recognition processing method for a noise-generating work device according to an embodiment of the present invention may include, prior to voice recognition of the user's voice data, an environmental noise preprocessing step (not shown) of preprocessing environmental noise in the environment in which the work device 200 operates, as distinct from the noise caused by the operation of the work device 200 itself.

More specifically, in the environmental noise preprocessing step, the voice data is divided into voice sections and non-voice sections to calculate a signal-to-noise ratio (SNR), and the voice data is classified according to its SNR and preprocessed accordingly. Preprocessing the environmental noise before the voice recognition process in this way helps prevent recognition errors caused by environmental noise and enables more accurate voice recognition.
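The voice/non-voice split and SNR calculation described above can be illustrated with a minimal Python sketch. The patent does not specify how the sections are separated at this point; the fixed energy threshold, the function names, and the frame length are all assumptions made for illustration.

```python
import math

def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def snr_db(samples, frame_len, energy_threshold):
    """Rough SNR in dB from a voice / non-voice split.

    Assumption: a frame counts as "voice" when its energy reaches
    energy_threshold. Voice frames still contain noise, so this is
    only a coarse estimate.
    """
    energies = frame_energies(samples, frame_len)
    voice = [e for e in energies if e >= energy_threshold]
    non_voice = [e for e in energies if e < energy_threshold]
    if not voice or not non_voice:
        return None  # cannot estimate without both kinds of section
    noise_power = sum(non_voice) / len(non_voice)
    if noise_power == 0:
        return math.inf
    signal_power = sum(voice) / len(voice)
    return 10.0 * math.log10(signal_power / noise_power)
```

For instance, a quiet frame of amplitude 0.1 followed by a loud frame of amplitude 1.0 gives a power ratio of about 100, or roughly 20 dB.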

In addition, FIG. 3 describes in more detail the operation of the voice recognition processing method and system 10 for a noise-generating work device according to an embodiment of the present invention. As can be seen in FIG. 3, in the voice recognition processing system 10 according to an embodiment of the present invention, the voice recognition device 100, such as an artificial intelligence (AI) speaker, may be equipped with a wireless communication module or the like capable of receiving information on the operation mode, such as the state, of a robot cleaner, a washing machine, or another IoT terminal, which constitutes the operation mode information receiving unit 110. It may also be equipped with adaptive noise reduction software, which constitutes the adaptive noise reduction filter unit 120, and a voice recognition app or the like may be installed to constitute the voice recognition processing unit 130.

Here, the adaptive noise reduction software may be capable of modeling and learning the noise data collected from various IoT terminals and of filtering adaptively on that basis.

In addition, as the operation mode of the IoT terminal, information ranging from the power on/off state of the IoT terminal to the various operation modes being executed may be collected. As a more specific example, in the case of a robot cleaner, the frequency band of the noise generated in the normal cleaning mode may differ from that in the turbo cleaning mode, and the signal-to-noise ratio (SNR) may also vary with the distance between the work device 200, such as the robot cleaner, and the voice recognition device 100, such as an AI speaker. Accordingly, as can be seen in FIG. 3, the work device control server 300, such as an IoT control server, transmits the operation mode, such as the state, of an IoT terminal such as the robot cleaner to the adaptive noise reduction software (Adaptive NR SW), so that the filtering is adjusted adaptively. In addition, when the maintenance station of the robot cleaner operates as the voice recognition device 100, the Adaptive NR SW installed on the maintenance station obtains data on the distance to the robot cleaner, determines the degree of noise inflow, and adjusts the settings of the adaptive noise reduction filter (Adaptive NR Filter) accordingly. The voice data filtered by the Adaptive NR Filter may be transferred to the voice recognition server 400 as needed for voice recognition, and the voice recognition server 400 may further improve recognition accuracy by performing secondary filtering that reflects previously learned noise data before performing voice recognition. The noise data may be noise data collected in the actual environment in which the voice recognition device 100 is used, and in the present invention, noise generated by IoT terminals used nearby may additionally be collected and learned to obtain more accurate voice recognition performance. The voice recognition server 400 may map the collected noise to the operation mode information of the work device 200, such as the IoT terminal, store it, and use it for learning. As a more specific example of the mapping information, the noise characteristics of the robot cleaner, including noise level and frequency band, vary with the cleaning mode. Accordingly, when the robot cleaner runs in the turbo cleaning mode, the resulting noise data is mapped to the turbo cleaning mode, stored, and learned; when the robot cleaner runs in the normal cleaning mode, the resulting noise data is mapped to the normal cleaning mode, stored, and learned.
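One way to picture the mode- and distance-dependent adjustment of the Adaptive NR Filter settings is the Python sketch below. Everything concrete in it is hypothetical: the `FilterSetting` fields, the preset bands and attenuation values, and the distance rule are illustrative placeholders, not values from the patent, where the settings would come from learned noise models.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class FilterSetting:
    band_hz: Tuple[int, int]   # noise band to attenuate (hypothetical)
    attenuation_db: float      # suppression depth (hypothetical)

NO_FILTER = FilterSetting((0, 0), 0.0)

# Hypothetical presets keyed by (device type, operation mode).
PRESETS = {
    ("robot_cleaner", "normal"): FilterSetting((200, 2000), 6.0),
    ("robot_cleaner", "turbo"): FilterSetting((300, 4000), 12.0),
}

def select_setting(device_type, mode, distance_m):
    """Pick a filter setting from the reported mode and distance."""
    base = PRESETS.get((device_type, mode), NO_FILTER)
    if base.attenuation_db == 0.0:
        return base  # device off or unknown: leave the signal untouched
    # Assumed rule: a closer device is louder, so suppress more
    # (up to 6 dB extra, fading out at 3 m).
    extra_db = max(0.0, 6.0 - 2.0 * distance_m)
    return FilterSetting(base.band_hz, base.attenuation_db + extra_db)
```

A lookup table keeps the mode-to-setting mapping easy to replace with learned values; the distance term is applied on top so that the same preset serves any position of the cleaner.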

In addition, FIG. 4 illustrates the learning of voice and noise in the voice recognition processing method and system 10 for a noise-generating work device according to an embodiment of the present invention.

First, FIG. 4 shows the process of training the voice database 410 and the noise database 420 so that the voice recognition processing method and system 10 according to an embodiment of the present invention can adaptively recognize and filter noise. The voice database 410 models and learns the user's voice commands. In addition, as shown in FIG. 4, when a noise source such as the work device 200 is nearby, voice commands are recorded under various conditions, varying the distance to the noise source and the user's position, and used for learning. When noise data from the work device 200 is collected, information such as the operation mode of the work device 200 and the distance to the work device 200 is mapped to it and stored in the noise database 420. The operation mode of the work device 200 may be obtained from the state information of the work device 200, such as an IoT terminal, that is collected and managed by the IoT control server 300 or the like. After a modeling process, the learned noise data can be reflected in the filter settings of the adaptive noise reduction software (Adaptive NR SW), and can also be used for noise filtering in the voice recognition server 400. As described above, the voice recognition processing method and system 10 according to an embodiment of the present invention can thereby secure excellent voice recognition performance even in an environment containing work devices 200 with various noise characteristics.
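The mapping of noise data to operation mode and distance in the noise database 420 could be organized as in the following Python sketch. The class name, the one-metre distance buckets, and the list-based storage are assumptions for illustration; the patent does not describe the storage layout.

```python
from collections import defaultdict

class NoiseDatabase:
    """Toy store that files noise clips under (mode, distance bucket)."""

    def __init__(self, bucket_m=1.0):
        self.bucket_m = bucket_m          # assumed bucket width in metres
        self._store = defaultdict(list)

    def _key(self, mode, distance_m):
        # Distances in the same bucket share one key.
        return (mode, int(distance_m // self.bucket_m))

    def add(self, mode, distance_m, noise_clip):
        self._store[self._key(mode, distance_m)].append(noise_clip)

    def lookup(self, mode, distance_m):
        """Return the clips recorded for this mode at a similar distance."""
        return list(self._store.get(self._key(mode, distance_m), []))
```

Clips recorded in turbo mode at 1.4 m and 1.9 m would land in the same bucket and be returned together when the cleaner is later reported at about 1 m in turbo mode.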

More specifically, FIG. 5 illustrates a flowchart of the voice recognition algorithm in the voice recognition processing method and system for a noise-generating work device according to an embodiment of the present invention.

As shown in FIG. 5, the voice recognition engine according to an embodiment of the present invention has a preprocessing structure. First, when a user's voice signal is input, the end point of the voice signal is detected through end point detection (EPD) (S1100). Next, in the preprocessing step, the noise level corresponding to the operation mode or position of the work device 200, such as a robot cleaner, is reflected (S1200). Here, the signal-to-noise ratio (SNR) is defined as a measure of signal quality. Since a voice signal generally does not exist alone but is mixed with noise, the SNR expresses the ratio between the two. More specifically, the closer an operating work device 200, such as a robot cleaner, comes to the voice recognition device 100, such as a maintenance station, the louder the noise becomes. The preprocessing step therefore reflects the position information of the work device 200 to attenuate the noise more precisely. Feature extraction is then performed on the preprocessed voice data (S1300), and a voice recognition process (S1400) yields the resulting text, which undergoes further processing, such as command interpretation, in a dialog server (not shown) or the like. If the text is a control command for the robot cleaner, the dialog server passes it to the work device control server 300 for the robot cleaner, and the work device control server 300, after steps such as collecting the state information of the robot cleaner, transmits the control command to the robot cleaner.
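The patent does not say how the end point detection in step S1100 is implemented. As a rough illustration only, the Python sketch below uses a simple frame-energy threshold, which is one common textbook approach; the function name, frame length, and threshold are assumptions.

```python
def detect_endpoints(samples, frame_len=160, energy_threshold=0.01):
    """Return (first, last) indices of frames judged to contain speech.

    Assumption: a frame is speech when its mean squared amplitude
    reaches energy_threshold. Returns None if no frame qualifies.
    """
    energies = [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    active = [i for i, e in enumerate(energies) if e >= energy_threshold]
    if not active:
        return None
    return active[0], active[-1]
```

A practical detector would typically add hangover frames and noise-adaptive thresholds, which are omitted here to keep the sketch short.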

In addition, as can be seen in FIG. 6, the present invention can improve voice recognition performance more effectively by preprocessing the environmental noise in the environment in which the work device 200 operates, as distinct from the noise caused by the operation of the work device 200 itself.

Accordingly, in the voice recognition processing method and system 10 according to an embodiment of the present invention, the voice data is divided into voice sections and non-voice sections to calculate the signal-to-noise ratio (SNR), and the voice data is classified according to its SNR and preprocessed accordingly.

More specifically, the preprocessing algorithm for environmental noise according to an embodiment of the present invention applies decision rules based on the noise signal, such as smoothed peak magnitude (Smoothed PM), mean crossing ratio (MCR), SNR Maximum, and Tilda MCR, to classify a signal with relatively little noise as Class A, a signal with a moderate amount of noise as Class B, and a signal with relatively heavy noise as Class C, which can further improve voice recognition performance.
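The three-way classification can be pictured with the deliberately simplified Python sketch below. It uses only an SNR value and two hypothetical thresholds, whereas the patent combines several decision rules (Smoothed PM, MCR, SNR Maximum, Tilda MCR) whose exact form is not given in this text.

```python
def classify_by_snr(snr_db, high_db=20.0, low_db=10.0):
    """Map an SNR in dB to a noise class.

    The thresholds are hypothetical placeholders, not patent values.
    Class A: relatively clean, Class B: moderate noise, Class C: noisy.
    """
    if snr_db >= high_db:
        return "A"
    if snr_db >= low_db:
        return "B"
    return "C"
```

Each class could then select a different preprocessing strength, with Class C receiving the most aggressive noise handling.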

More specifically, FIG. 7 shows a process for classifying the voice data into Class A, Class B, and Class C according to the degree of noise. As can be seen in FIG. 7, when a voice signal is input (1510), a normalized autocorrelation function (NACF) is used (1520) to obtain the peak magnitude value of the NACF (1530), and the signal-to-noise ratio maximum (SNR Maximum) value over 8 frames of the input voice signal is obtained (1560). This process may be used to distinguish voice sections from non-voice sections in the input voice signal.

Here, the peak index and peak magnitude values of the NACF may be defined as in Equation 1 below.

[Equation 1]

Accordingly, the peak index series (PIS) and the peak magnitude series (PMS) can be measured over all frames using Equation 1. The PIS and the PMS each vary smoothly in voice sections and, conversely, oscillate strongly in non-voice sections, so they can be used to distinguish the voice sections of the voice signal from its non-voice sections.
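Equation 1 is not reproduced in this text, so the Python sketch below uses one common definition of a normalized autocorrelation function (autocorrelation at each lag divided by the zero-lag energy) as an assumption. The function name and the lag search range are illustrative.

```python
def nacf_peak(frame, min_lag, max_lag):
    """Return (peak_index, peak_magnitude) of the frame's NACF.

    Assumed definition: r(lag) = sum(x[n] * x[n + lag]) / sum(x[n] ** 2),
    searched over min_lag..max_lag inclusive.
    """
    energy = sum(x * x for x in frame)
    if energy == 0:
        return min_lag, 0.0  # silent frame: no meaningful peak
    best_lag, best_val = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag)) / energy
        if r > best_val:
            best_lag, best_val = lag, r
    return best_lag, best_val
```

Collecting the two returned values frame by frame gives a peak index series and a peak magnitude series: a periodic, voiced frame yields a stable lag with a magnitude near 1, while noise-like frames yield erratic values.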

Next, long-term smoothing is applied to the peak magnitude (PM) value (1540), and the crossing ratio value is then obtained (1550). The smoothed PM may be defined as in Equation 2 below.

[Equation 2]

Here, α denotes the long-term smoothing parameter.
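Since Equation 2 is not reproduced in this text, the Python sketch below assumes the usual first-order recursive form of long-term smoothing, in which each output is α times the previous output plus (1 − α) times the new peak magnitude. The function name and the default α are illustrative.

```python
def smooth_pm(pm_series, alpha=0.9):
    """Exponentially smooth a peak magnitude series.

    Assumed recursion: s[t] = alpha * s[t-1] + (1 - alpha) * pm[t],
    with s[0] = pm[0]. A larger alpha means heavier smoothing.
    """
    smoothed = []
    prev = None
    for pm in pm_series:
        prev = pm if prev is None else alpha * prev + (1.0 - alpha) * pm
        smoothed.append(prev)
    return smoothed
```

With α = 0.5, the series 0, 1, 1 becomes 0, 0.5, 0.75, showing how the smoothed value approaches a sustained level only gradually.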

In addition, a new crossing ratio value of the smoothed PM is obtained using Equation 3 below (1580).

[Equation 3]
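Equation 3 is not reproduced in this text. As an assumed reading of a crossing ratio, the Python sketch below counts how often a series crosses its own mean and divides by the number of transitions; the patent's actual definition may differ.

```python
def mean_crossing_ratio(series):
    """Fraction of adjacent sample pairs that straddle the series mean.

    Assumed definition for illustration: a crossing occurs when two
    consecutive values lie strictly on opposite sides of the mean.
    """
    if len(series) < 2:
        return 0.0
    mean = sum(series) / len(series)
    crossings = sum(
        1 for a, b in zip(series, series[1:]) if (a - mean) * (b - mean) < 0
    )
    return crossings / (len(series) - 1)
```

Under this reading, a series that varies smoothly, as in voice sections, yields a low ratio, while a strongly oscillating series, as in non-voice sections, yields a ratio close to 1.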

In addition, the present invention uses the signal-to-noise ratio (SNR) as a measure of the quality of the voice signal. A voice signal does not exist alone but is usually mixed with noise; the SNR is therefore used as a measure of the ratio of the voice signal to the noise, and may be defined as in Equation 4 below.

[Equation 4]

Here, En(t) denotes the average value of the initial 8 frames.

In addition, Equation 5 below may be used to obtain the signal-to-noise ratio maximum (SNR Maximum) value (1560).

[Equation 5]

Furthermore, a sigmoid function may be applied, as defined in Equation 6 below (1570).

[Equation 6]
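Equations 4 to 6 are not reproduced in this text. The Python sketch below shows one plausible reading, under stated assumptions: the noise energy is estimated as the average energy of the initial 8 frames, the SNR Maximum is the largest per-frame SNR in dB relative to that estimate, and the sigmoid is the standard logistic function. The `slope` and `center` parameters are hypothetical.

```python
import math

def snr_maximum_db(frame_energies, noise_frames=8):
    """Largest frame SNR in dB against an initial-frames noise estimate.

    Assumption: the first noise_frames frames contain only noise.
    """
    noise = sum(frame_energies[:noise_frames]) / noise_frames
    if noise <= 0:
        raise ValueError("noise estimate must be positive")
    peak = max(frame_energies)
    return 10.0 * math.log10(peak / noise)

def sigmoid(x, slope=1.0, center=0.0):
    """Standard logistic function; maps any moderate x into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-slope * (x - center)))
```

Passing the SNR Maximum through the sigmoid compresses it into a bounded score, which is convenient when it has to be combined with other decision features.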

Accordingly, the voice recognition processing method and system 10 according to an embodiment of the present invention can greatly improve the signal-to-noise ratio (SNR) compared with conventional methods, and can further improve the success rate of voice recognition by classifying the environmental noise into classes according to its degree and preprocessing it accordingly.

The above description merely illustrates the technical idea of the present invention, and those of ordinary skill in the art to which the present invention pertains will be able to make various modifications and variations without departing from its essential characteristics. Accordingly, the embodiments described herein are intended to explain, not to limit, the technical idea of the present invention, which is not limited to these embodiments. The scope of protection of the present invention should be construed according to the claims below, and all technical ideas within the scope equivalent thereto should be construed as falling within the scope of the present invention.

10: voice recognition processing system
100: voice recognition device
110: operation mode information receiving unit
120: adaptive noise reduction filter unit
130: voice recognition processing unit
200: work device
300: work device control server
310: work device control unit
320: communication unit
400: voice recognition server
410: voice database
420: noise database
500: user

Claims

1. A voice recognition processing method for a work device that generates noise of different characteristics according to its operation mode, the method comprising:
an operation mode information receiving step in which a voice recognition device receives information on the operation mode of the work device;
an adaptive noise reduction filtering step of adaptively performing filtering by changing a setting of adaptive noise reduction filtering applied to a user's voice data according to the operation mode of the work device; and
a voice recognition performing step of processing a voice recognition function on the filtered voice data of the user.

2. The method of claim 1,
wherein, in the operation mode information receiving step, information on the type and operation mode of a work device among a plurality of types of work devices is received, and
wherein, in the adaptive noise reduction filtering step, the filtering is performed with settings changed according to the type and operation mode of the work device.

3. The method of claim 1,
wherein, in the operation mode information receiving step, information on the location of the work device is received together with the operation mode of the work device, and
wherein, in the adaptive noise reduction filtering step, the filtering is performed with settings changed according to the operation mode and location of the work device.

4. The method of claim 1,
wherein, in the voice recognition performing step, the filtered voice data of the user is transmitted to a voice recognition server so that voice recognition is performed, and
wherein the voice recognition server performs secondary filtering using a second filter trained on noise data collected by the voice recognition device, and then performs voice recognition.

5. The method of claim 4,
wherein the noise data is noise data collected by the voice recognition device while the work device is actually operating.

6. The method of claim 4,
wherein the voice recognition server maps the information on the operation mode of the work device to the noise data, stores the mapping, and uses it for voice recognition of the voice data.

7. The method of claim 1, further comprising,
prior to voice recognition of the user's voice data,
an environmental noise preprocessing step of preprocessing environmental noise in the environment in which the work device operates, as distinct from the noise caused by the operation of the work device.

8. The method of claim 7,
wherein the environmental noise preprocessing step comprises:
calculating a signal-to-noise ratio (SNR) by dividing the voice data into voice sections and non-voice sections; and
performing preprocessing by classifying the voice data according to the signal-to-noise ratio (SNR) of the voice data.

9. A voice recognition processing system for a noise-generating work device, the system comprising:
a work device that generates noise of different characteristics according to its operation mode; and
a voice recognition device comprising:
an operation mode information receiving unit that receives information on the operation mode of the work device;
an adaptive noise reduction filter unit configured to adaptively perform filtering by changing a setting of adaptive noise reduction filtering applied to a user's voice data according to the operation mode of the work device; and
a voice recognition processing unit that processes a voice recognition function on the filtered voice data of the user.