KR20180131381A

KR20180131381A - Stand-alone Voice Recognition based Agent Module for Precise Motion Control of Robot and Autonomous Vehicles

Info

Publication number: KR20180131381A
Application number: KR1020180044631A
Authority: KR
Inventors: 이덕진
Original assignee: 군산대학교산학협력단
Priority date: 2017-05-31
Filing date: 2018-04-17
Publication date: 2018-12-10
Also published as: KR20190057242A

Abstract

An objective of the present invention is to provide a stand-alone embedded voice recognition technique capable of precisely recognizing and analyzing a robot control voice command even in a space without internet connection and a voice recognition-based agent module capable of precise motion control of an autonomous vehicle. According to embodiments of the present invention, the stand-alone voice recognition-based agent module for precise motion control of a robot and an autonomous vehicle comprises: a voice recognition engine to perform language processing to output an inputted voice in recognizable text and then convert the recognizable text into speed control text for motion control to assign the speed control text; a robot operating system to receive a speed control command using a programming language, to which the speed control text is assigned by the voice recognition engine, to transfer a speed value to drive a robot or an autonomous vehicle; and a driving unit to receive the speed value from the robot operating system to drive the robot or the autonomous vehicle.

Description

[0001] The present invention relates to a stand-alone voice recognition based agent module for precise motion control of a robot and an autonomous mobile body.

The present invention relates to a stand-alone voice recognition based agent module for precise motion control of robots and autonomous mobile bodies. More particularly, it relates to such a module that can precisely control the motion of a robot or autonomous mobile body by recognizing and interpreting voice commands with a speech recognition algorithm that includes an acoustic model, a language model, and a data dictionary.

In general, speech recognition refers to technology that receives speech through a microphone or another wired or wireless communication means and converts it into words or sentences. Because it adds convenience for people, speech recognition has recently drawn wide attention and use in robotics and in fields such as mobile phones, smart homes, automobiles, and infotainment.

Accordingly, speech recognition has been studied steadily, and active research is expected to continue.

Recently, open source modules for speech recognition have been released, and voice-recognition-based smart home and personal assistant applications using open modules such as Amazon Alexa and Google Home have been attracting attention.

However, such open source modules have the drawback that the service is provided only over an internet connection. There is therefore a need to develop a stand-alone embedded speech recognition module that can control a robot freely and precisely even in a space without an internet connection.

To solve the above problem, an object of the present invention is to provide a stand-alone embedded speech recognition technique capable of precisely recognizing and interpreting robot-control voice commands even in a space without an internet connection, and a voice recognition based agent module capable of precisely controlling the motion of an autonomous mobile body.

To solve the above problem, a stand-alone voice recognition based agent module for precise motion control of robots and autonomous mobile bodies according to an embodiment of the present invention may include: a speech recognition engine that performs language processing to output input speech as recognizable text and then converts and assigns that text as speed control text for motion control; a robot operating system that receives a speed control command through a programming language to which the speed control text is assigned by the speech recognition engine, and transmits a speed value so that the robot or autonomous mobile body is driven; and a driving unit that receives the speed value from the robot operating system and drives the robot or autonomous mobile body.

Here, the speech recognition engine may include PocketSphinx or another open speech recognition module.

In addition, the speech recognition engine may include a preprocessing unit that extracts the feature vectors needed for speech recognition, and a recognition unit that stores a speech recognition algorithm and performs language processing by analyzing the feature vectors extracted by the preprocessing unit through the speech recognition algorithm.

In addition, the speech recognition algorithm may include: an acoustic model that is made adaptive to speech input through a microphone; a language model that compares the extracted feature vectors with the adapted acoustic model and converts them into recognizable text form; and a data dictionary that determines, when the language model compares the extracted feature vectors with the acoustic model, whether they can be converted into recognizable text form.

In addition, the programming language may be Python or C/C++.

In addition, the speed control command may be transferred from the programming language to the robot operating system through a wired or wireless communication scheme.

In addition, the driving unit may include: a low-level processor that receives the speed value from the robot operating system and inputs the speed as a low-level signal; a PWM generator that pulse-modulates the speed signal input through the low-level processor; and a DC motor that drives the robot or autonomous mobile body according to the pulses modulated by the PWM generator.

The stand-alone voice recognition based agent module for precise motion control of robots and autonomous mobile bodies according to an embodiment of the present invention can control a robot freely and precisely even in a space without an internet connection.

In addition, because the acoustic model, one component of the present invention, uses the MLLR (Maximum Likelihood Linear Regression) and MAP (Maximum A Posteriori) speaker adaptation techniques, more accurate speech recognition can be performed.

In addition, by using an open speech recognition engine such as PocketSphinx, the module can be provided at a relatively low cost.

FIG. 1 is a block diagram showing the configuration of a stand-alone voice recognition based agent module for precise motion control of robots and autonomous mobile bodies according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of the speech recognition engine, which is one component of the present invention.
FIG. 3 is a block diagram showing the configuration of the recognition unit, which is one component of the speech recognition engine of FIG. 2.
FIG. 4 is a block diagram showing the speaker adaptation steps of the acoustic model, which is one component of the present invention.
FIG. 5 is an operational flowchart of the stand-alone voice recognition based agent module for precise motion control of robots and autonomous mobile bodies according to an embodiment of the present invention.
FIG. 6 is an operational flowchart of step (c) of FIG. 5.
FIG. 7 is an operational flowchart of step (f) of FIG. 5.

The following description of the present invention with reference to the drawings is not limited to a specific embodiment; various modifications may be made and various embodiments are possible. The description below should be understood to cover all modifications, equivalents, and substitutes falling within the spirit and technical scope of the present invention.

In the following description, terms such as first and second are used to describe various components. They carry no limiting meaning in themselves and are used only to distinguish one component from another.

The same reference numerals denote the same components throughout this specification.

Singular expressions used in the present invention include plural expressions unless the context clearly indicates otherwise. Terms such as "include", "comprise", or "have" used below should be interpreted as specifying the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to exclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying FIGS. 1 to 7.

FIG. 1 is a block diagram showing the configuration of a stand-alone voice recognition based agent module for precise motion control of robots and autonomous mobile bodies according to an embodiment of the present invention, FIG. 2 is a block diagram showing the configuration of the speech recognition engine, which is one component of the present invention, FIG. 3 is a block diagram showing the configuration of the recognition unit, which is one component of the speech recognition engine of FIG. 2, and FIG. 4 is a block diagram showing the speaker adaptation steps of the acoustic model, which is one component of the present invention.

First, referring to FIGS. 1 to 4, the stand-alone voice recognition based agent module for precise motion control of robots and autonomous mobile bodies according to an embodiment of the present invention is installed on a robot or autonomous mobile body and may include a speech recognition engine (10), a robot operating system (30), and a driving unit (40).

Specifically, the speech recognition engine (10) can recognize speech input from a microphone installed on the robot or autonomous mobile body or from a voice delivery device based on wired or wireless communication. The speech recognition engine (10) may be implemented with PocketSphinx.

PocketSphinx is an open speech recognition engine. It has the advantages of being inexpensive, being highly compatible with the Robot Operating System (ROS) used to control the robot or autonomous mobile body and with programming languages such as C/C++ or Python, and offering fast recognition. However, PocketSphinx is only a preferred example and not a limitation; another speech recognition engine (10) may be used.

The description below is based on a speech recognition engine (10) implemented with the preferred example, PocketSphinx.

The speech recognition engine (10) may perform language processing that outputs the speech input from the installed microphone or the wired or wireless voice delivery device as recognizable text, and may convert the language-processed text into speed control text for motion control and assign it.

To this end, the speech recognition engine (10) may include a preprocessing unit (12) and a recognition unit (14), as shown in FIG. 2.

The preprocessing unit (12) may extract the feature vectors needed for speech recognition. That is, when speech is input from the microphone and arrives at the speech recognition engine (10), the preprocessing unit (12) may extract from the speech feature vectors that represent its phonetic characteristics well. The preprocessing unit (12) may extract a feature vector every 1/100 of a second.

In addition, the preprocessing unit (12) may extract the feature vectors using the MFCC (Mel Frequency Cepstral Coefficients) algorithm. The MFCC algorithm divides the whole input sound into short-time segments and analyzes the spectrum of each segment to extract features. For example, the sound is divided into segments of 20 to 40 ms, and the spectrum, that is, the frequency content, of each segment is computed.
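The framing and per-segment spectrum steps described above can be sketched as follows. This is a minimal illustration, not the engine's actual front end: the function names, the 25 ms window (chosen from within the stated 20 to 40 ms range), the 10 ms step (matching the 1/100 second extraction interval), and the toy 1600 Hz sample rate are all assumptions. A full MFCC computation would go on to apply a mel filterbank, a logarithm, and a discrete cosine transform to each spectrum.

```python
import cmath
import math


def frame_signal(samples, sample_rate, win_ms=25.0, hop_ms=10.0):
    """Split samples into overlapping frames of win_ms, advancing by hop_ms."""
    win = int(sample_rate * win_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    return [samples[i:i + win] for i in range(0, len(samples) - win + 1, hop)]


def power_spectrum(frame):
    """Naive DFT power spectrum of one frame (bins 0 .. N/2)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


# Toy input: 0.1 s of a 200 Hz tone sampled at 1600 Hz (hypothetical values).
rate = 1600
signal = [math.sin(2 * math.pi * 200 * t / rate) for t in range(160)]

frames = frame_signal(signal, rate)
first = power_spectrum(frames[0])
# Index of the strongest frequency bin; each bin spans rate / frame length Hz.
peak_bin = max(range(len(first)), key=first.__getitem__)
peak_hz = peak_bin * rate / len(frames[0])
```

With these toy values each frame holds 40 samples, so one bin covers 40 Hz and the tone's energy falls in the bin for 200 Hz.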

Meanwhile, the preprocessing unit (12) may extract the feature vectors while performing noise processing to handle noise, including background noise and channel distortion arising from the microphone or the line. This is intended to raise the speech recognition success rate, and may be done by compensating after feature vector extraction or by introducing noise-robust feature vectors.

Having extracted the feature vectors in this way, the preprocessing unit (12) may transmit them to the recognition unit (14).

The recognition unit (14) may perform language processing by pattern analysis of the feature vectors extracted by the preprocessing unit (12). The recognition unit (14) may store a speech recognition algorithm (16) and perform the language processing through it. As shown in FIG. 3, the speech recognition algorithm (16) may include an acoustic model (16a), a language model (16b), and a data dictionary (16c).

More specifically, the recognition unit (14) receives the feature vectors of the speech extracted by the preprocessing unit (12) and obtains a recognition result by pattern comparison with the acoustic model (16a) stored in the database of the speech recognition engine (10). As shown in FIG. 4, the acoustic model (16a) is made adaptive to the user's voice delivered through the speaker in order to raise the recognition rate. To this end, the acoustic model (16a) may use the MLLR (Maximum Likelihood Linear Regression) and MAP (Maximum A Posteriori) adaptation techniques.

Here, the acoustic model (16a) may perform MAP (Maximum A Posteriori) adaptation after MLLR (Maximum Likelihood Linear Regression) adaptation. In this way the acoustic model (16a) provides samples as close as possible to the user's voice, so that the speech recognition engine (10) can recognize the speech accurately.
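The order of the two adaptation passes can be illustrated with a deliberately simplified one-dimensional sketch. It assumes scalar Gaussian means, an MLLR transform whose coefficients `a` and `b` are simply given (in practice they are estimated from adaptation data by maximum likelihood), and a MAP update in which every adaptation sample counts fully toward one Gaussian. The function names and the prior weight `tau` are hypothetical and are not taken from the patent or from PocketSphinx.

```python
def mllr_transform(means, a, b):
    """MLLR step: apply one shared affine transform mu' = a * mu + b to every mean."""
    return [a * mu + b for mu in means]


def map_update(prior_mean, samples, tau=10.0):
    """MAP step: pull one mean toward the speaker's samples.

    tau weights the prior mean; with few samples the result stays near the
    prior, and with many samples it approaches the sample average.
    """
    return (tau * prior_mean + sum(samples)) / (tau + len(samples))


# Speaker-independent means (toy values), adapted first by MLLR, then by MAP.
speaker_independent = [0.0, 2.0]
after_mllr = mllr_transform(speaker_independent, a=1.0, b=0.5)
adapted_first = map_update(after_mllr[0], [1.5] * 10)
```

Running MLLR first gives a coarse, global shift that helps even with little data; the MAP pass then refines individual means where enough speaker data is available.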

Meanwhile, the acoustic model (16a) may be built for Korean or English.

The language model (16b) may perform language processing on the speech recognized through the acoustic model (16a). To this end, the language model (16b) may include a word-level search and a sentence-level search.

The word-level search also covers phonemes: possible candidate words or candidate phonemes are extracted through word-level or phoneme-level pattern comparison with the acoustic model (16a) stored in the database. The candidate words or candidate phonemes obtained in this way are then passed to the sentence-level search.

Based on the information about the candidate words or candidate phonemes, the sentence-level search uses the data dictionary to judge whether each candidate fits the grammatical structure, the sentence context, the specific topic, and so on, and thereby determines the most suitable word or phoneme.

For example, suppose that in the sentence '우리는 바닷가에 간다' ("We go to the beach"), unclear pronunciation makes it hard to distinguish '는' from '능'. The word-level search then produces two candidates, '는' and '능'. The sentence-level search, analyzing the sentence structure with the data dictionary (16c), recognizes that '는' serves as a particle in the sentence whereas no particle '능' exists, and excludes '능' from the candidates.
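The candidate-pruning step in this example can be sketched as a simple dictionary lookup. The particle set and the function name below are hypothetical stand-ins for the data dictionary (16c); a real dictionary would also carry pronunciations and grammatical information.

```python
# Hypothetical mini dictionary: tokens that are valid Korean particles.
PARTICLES = {"는", "은", "이", "가", "에", "를", "을"}


def prune_particle_candidates(candidates, dictionary=PARTICLES):
    """Keep only the candidates that the dictionary lists as particles."""
    return [c for c in candidates if c in dictionary]


# The word-level search proposed both tokens; the sentence-level search
# keeps only the one that can act as a particle.
kept = prune_particle_candidates(["는", "능"])
```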

That is, the language model (16b) may perform language processing that constrains the vocabulary and grammatical structure in order to improve recognition performance. In this way, recognition by the speech recognition engine (10) runs faster and its results can be more accurate.

Here, the language model (16b) is based on statistical pattern recognition and may use the HMM (Hidden Markov Model) technique, which integrates the word-level search and the sentence-level search into a single process. In this method, statistical information about the patterns corresponding to each speech unit is stored in the form of a probabilistic model. When an unknown input pattern arrives, the probability that each model could produce that pattern is computed, and the speech unit that best fits the unknown pattern is selected.
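The scoring idea described above, one probabilistic model per speech unit with the best-scoring model selected, can be sketched with the forward algorithm for a discrete HMM. The two models, their probabilities, and the unit names "go" and "stop" are toy assumptions for illustration only; real recognizers use continuous feature vectors and far larger models.

```python
def forward_likelihood(obs, start, trans, emit):
    """Return P(obs | model) for a discrete HMM using the forward algorithm."""
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][symbol]
            for s in range(n_states)
        ]
    return sum(alpha)


# Two hypothetical left-to-right models over observation symbols {0, 1}.
MODELS = {
    "go": {
        "start": [1.0, 0.0],
        "trans": [[0.6, 0.4], [0.0, 1.0]],
        "emit": [[0.8, 0.2], [0.1, 0.9]],
    },
    "stop": {
        "start": [1.0, 0.0],
        "trans": [[0.6, 0.4], [0.0, 1.0]],
        "emit": [[0.2, 0.8], [0.9, 0.1]],
    },
}


def best_unit(obs):
    """Pick the speech unit whose model gives the observations the highest probability."""
    return max(MODELS, key=lambda name: forward_likelihood(obs, **MODELS[name]))


best = best_unit([0, 0, 1, 1])
```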

Meanwhile, during language processing, the language model (16b) may convert the processed language into text and output it so that the motion of the robot or autonomous mobile body can be controlled.

That is, the feature vectors of the speech extracted by the preprocessing unit (12) are compared with the acoustic model (16a) adapted to the user's voice. From the speech recognized in this way, candidate words or candidate phonemes are extracted through the language model (16b), and the most suitable words or phonemes are determined from those candidates on the basis of the data dictionary (16c), so that the speech is divided into accurate sentence units. The speech language-processed into sentence units may then be converted into text through the language model (16b).

As described above, the language-processed text may be converted by the speech recognition engine (10) into speed control text, which can adjust the motor speed for motion control of the robot or autonomous mobile body, and then assigned.
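One way to picture the conversion from recognized text to speed control text is a lookup table. The patent does not specify the command vocabulary or the format of the speed control text, so the phrases, speed values, and the `linear=... angular=...` format below are all assumptions made for this sketch.

```python
# Hypothetical phrase table: (linear speed in m/s, angular speed in rad/s).
SPEED_TABLE = {
    "go forward": (0.2, 0.0),
    "go back": (-0.2, 0.0),
    "turn left": (0.0, 0.5),
    "turn right": (0.0, -0.5),
    "stop": (0.0, 0.0),
}


def to_speed_control_text(recognized):
    """Map recognized text to a speed control string; unknown phrases mean stop."""
    key = " ".join(recognized.lower().split())  # normalize case and spacing
    linear, angular = SPEED_TABLE.get(key, SPEED_TABLE["stop"])
    return f"linear={linear:.2f} angular={angular:.2f}"


example = to_speed_control_text("Turn  Left")
```

Falling back to a stop command for unrecognized phrases is a design choice of this sketch, made so that a misrecognition never starts motion.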

The assigned speed control text may be converted into a robot control command through the programming language (20) so as to control motion, and delivered to the robot operating system (30).

Here, the programming language (20) may be C/C++ or Python. The programming language (20) may be connected to the robot operating system (30) by wire or wirelessly. That is, the programming language (20), having been assigned the speed control text by the speech recognition engine (10), may generate a robot control command through an algorithm and deliver it to the robot operating system (30) in a wired or wireless manner. The wireless connection may use Bluetooth, Wi-Fi, or a similar scheme.

The robot operating system (30) may receive the speed control command from the programming language (20). The robot operating system (30) may also control the driving unit (40), which drives the robot or autonomous mobile body, by transmitting a speed value to it. That is, the robot operating system (30) may adjust the speed value according to the speed control command from the programming language (20) and thereby control the driving unit (40) so that the motion of the robot or autonomous mobile body is controlled.
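A minimal sketch of handing a speed command to ROS from Python follows. It assumes a ROS 1 installation with `rospy` and `geometry_msgs`, a speed control string of the hypothetical form `linear=<m/s> angular=<rad/s>`, and the conventional `cmd_vel` topic; the patent names none of these, and the node name `voice_agent` is likewise invented here. The ROS imports sit inside the function so that the parsing helper can be used without ROS installed.

```python
def parse_speed_control_text(text):
    """Parse 'linear=<m/s> angular=<rad/s>' into a (linear, angular) float pair."""
    fields = dict(part.split("=") for part in text.split())
    return float(fields["linear"]), float(fields["angular"])


def publish_speed(text, topic="cmd_vel"):
    """Publish one velocity command to ROS (requires a running ROS 1 master)."""
    import rospy
    from geometry_msgs.msg import Twist

    linear, angular = parse_speed_control_text(text)
    rospy.init_node("voice_agent", anonymous=True)
    publisher = rospy.Publisher(topic, Twist, queue_size=10)
    rospy.sleep(0.5)  # give subscribers a moment to connect

    message = Twist()
    message.linear.x = linear    # forward speed
    message.angular.z = angular  # turning rate about the vertical axis
    publisher.publish(message)


speeds = parse_speed_control_text("linear=0.20 angular=-0.50")
```

On a robot, `publish_speed("linear=0.20 angular=0.00")` would be called once per recognized command; because ROS topics work across machines on the same network, the same call also covers the wireless case.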

The driving unit (40) may receive the speed value from the robot operating system (30) and drive the robot or autonomous mobile body. To this end, the driving unit (40) may include a low-level processor (42), a PWM generator (44), and a DC motor (46).

Specifically, the low-level processor (42) may program the speed value delivered through the robot operating system (30) into a low-level signal and input it.

The PWM generator (44) may modulate the pulses of the low-level signal programmed and input by the low-level processor (42). That is, the speed value can be controlled through pulse modulation of the low-level signal that represents it.
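In pulse width modulation, the speed is typically encoded as the duty cycle, the fraction of each period during which the pulse is high. The mapping below is a sketch under assumptions: the linear mapping, the `max_speed` limit, and the separate direction flag are illustrative choices, not details given in the patent.

```python
def speed_to_duty(speed, max_speed=1.0):
    """Map a signed speed to (duty cycle in percent, direction flag).

    The magnitude is clamped to max_speed so the duty cycle never exceeds 100.
    """
    ratio = min(abs(speed) / max_speed, 1.0)
    direction = 1 if speed >= 0 else -1
    return round(ratio * 100.0, 1), direction


quarter = speed_to_duty(0.25)
saturated = speed_to_duty(-2.0)
```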

The DC motor (46) is connected to the wheels or similar parts of the robot or autonomous mobile body and may drive it according to the pulses modulated by the PWM generator (44). Preferably, a DC motor (46) is provided for each wheel, and the motion of the robot or autonomous mobile body can be controlled by giving each DC motor (46) a different control speed.
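For a platform with one motor per wheel, giving the motors different speeds is what produces turning. The sketch below uses standard differential-drive kinematics and assumes a two-wheeled base with a hypothetical track width of 0.3 m; the patent does not state the number of wheels or the geometry.

```python
def wheel_speeds(linear, angular, track_width=0.3):
    """Return (left, right) wheel speeds in m/s for a two-wheel differential drive.

    linear is the forward speed in m/s; angular is the turn rate in rad/s,
    positive for a counterclockwise (left) turn.
    """
    half = track_width / 2.0
    return linear - angular * half, linear + angular * half


straight = wheel_speeds(0.2, 0.0)
spin = wheel_speeds(0.0, 1.0)
```

Equal wheel speeds drive the base straight, while equal and opposite speeds turn it on the spot.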

Hereinafter, a method of operating the stand-alone voice recognition based agent module for precise motion control of robots and autonomous mobile bodies according to an embodiment of the present invention will be described with reference to FIGS. 5 to 7.

FIG. 5 is an operational flowchart of the stand-alone voice recognition based agent module for precise motion control of robots and autonomous mobile bodies according to an embodiment of the present invention, FIG. 6 is an operational flowchart of step (c) of FIG. 5, and FIG. 7 is an operational flowchart of step (f) of FIG. 5.

Referring to FIGS. 5 to 7, the method of operating the stand-alone voice recognition based agent module for precise motion control of robots and autonomous mobile bodies according to an embodiment of the present invention may proceed in the order of steps (a) to (f) below.

(a) Inputting the user's voice through a microphone (S100)

- The user may input a voice command through a microphone installed on the robot or autonomous mobile body. The voice command may then be delivered to the speech recognition engine (10).

(b) Extracting feature vectors from the input speech (S200)

- When the voice command reaches the speech recognition engine (10), the preprocessing unit (12) of the speech recognition engine (10) may extract feature vectors from the input speech. The speech recognition engine (10) may be implemented with PocketSphinx and may extract a feature vector every 1/100 of a second using the MFCC (Mel Frequency Cepstral Coefficients) algorithm.

(c) Converting the feature vectors into recognizable text using the speech recognition algorithm (S300)

- This step may be performed in the recognition unit (14) of the speech recognition engine (10). The speech recognition algorithm (16) may be stored in the recognition unit (14). That is, the recognition unit (14) may receive the feature vectors extracted by the preprocessing unit (12) and convert them into recognizable text using the stored speech recognition algorithm (16).

Specifically, the speech recognition algorithm (16) may include the acoustic model (16a), the language model (16b), and the data dictionary (16c), and may be carried out in the following three stages.

Stage 1: The acoustic model (16a) adapts to the speaker's voice (S310)

- The acoustic model (16a) is compared with the feature vectors extracted by the preprocessing unit (12) to produce a speech recognition result. Because phonetic characteristics differ from speaker to speaker, the acoustic model (16a) may be built so that it can adapt to the speaker. The acoustic model (16a) may use the MLLR (Maximum Likelihood Linear Regression) and MAP (Maximum A Posteriori) adaptation techniques, performing MLLR adaptation first and MAP adaptation after it.

This further raises the recognition rate.

Stage 2: Recognizing the speech by comparing the speaker-adapted acoustic model with the feature vectors (S320)

- Once the initial acoustic model (16a) has adapted to the speaker in stage 1, the feature vectors may be compared with the adapted acoustic model (16a). Through speaker adaptation, the acoustic model (16a) can perform more accurate recognition.

Stage 3: The language model (16b) extracts candidate phonemes or candidate words from the recognized speech, determines the correct speech using the data dictionary (16c), and converts it into recognizable text form for motion control of the robot or autonomous mobile body (S330)

- According to the voice recognized in Step 2, the language model 16b can extract candidate phonemes or candidate words through word-level and sentence-level searches using the HMM (Hidden Markov Model) technique. The language model 16b can then compare the extracted candidates against the preset data dictionary 16c to determine the most suitable word or phoneme.

The sentence determined through the above process can be converted into a recognizable text form for precise motion control of the robot and the autonomous mobile body.
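The dictionary check in Step 3 can be pictured with a minimal sketch. It assumes the search yields (word, score) candidates and that the data dictionary is a plain set of allowed command words. The function name, the scores, and the vocabulary are hypothetical.

```python
def pick_command_word(candidates, dictionary):
    """Return the highest-scoring candidate that appears in the dictionary, or None."""
    known = [(score, word) for word, score in candidates if word in dictionary]
    if not known:
        return None  # nothing recognizable: the utterance is rejected
    return max(known)[1]


# Hypothetical data dictionary of motion-control words.
COMMAND_WORDS = {"forward", "back", "left", "right", "stop"}
best = pick_command_word([("forwart", 0.6), ("forward", 0.5), ("four", 0.2)], COMMAND_WORDS)
```

Here the top-scoring candidate is not a dictionary word, so the next candidate that is in the dictionary is chosen instead.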

(d) Converting the recognizable text into speed control text (S400)

- The recognizable text derived through the speech recognition algorithm 16 can be converted into text capable of controlling speed and passed to the programming language 20. This step can be performed in the speech recognition engine 10.
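As an illustration of step (d), the conversion can be sketched as a lookup from recognized phrases to speed control text. The specification does not define the phrase set or the text format, so the phrases, the `linear=... angular=...` format, and the numeric values below are all assumptions.

```python
# Hypothetical phrase table; values are linear speed in m/s and angular speed in rad/s.
SPEED_TEXT = {
    "go forward": "linear=0.2 angular=0.0",
    "go back": "linear=-0.2 angular=0.0",
    "turn left": "linear=0.0 angular=0.5",
    "turn right": "linear=0.0 angular=-0.5",
    "stop": "linear=0.0 angular=0.0",
}


def to_speed_control_text(recognized):
    """Normalize case and whitespace, then look the phrase up; None if unknown."""
    key = " ".join(recognized.lower().split())
    return SPEED_TEXT.get(key)
```

Returning `None` for an unknown phrase leaves it to the caller to decide whether to ignore the utterance or stop the vehicle.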

(e) Generating a speed control command from the speed control text through the programming language (S500)

- The speed control text passed to the programming language 20 can be turned into a speed control command by a coded algorithm. The generated speed control command can then be transmitted, by a wired method or by a wireless method such as Wi-Fi or Bluetooth, to the robot operating system that controls the robot or the autonomous mobile body.

Meanwhile, the programming language 20 may be C/C++ or Python.
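Step (e) can be sketched in Python as parsing the speed control text into a command object and serializing it for transmission. The `key=value` text format, the `SpeedCommand` type, the speed limits, and the JSON encoding are assumptions made for illustration. The specification does not state the message format used toward the robot operating system.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class SpeedCommand:
    linear: float   # forward speed, assumed m/s
    angular: float  # turn rate, assumed rad/s


def parse_speed_text(text, max_linear=0.5, max_angular=1.0):
    """Parse 'linear=0.2 angular=0.0' style text and clamp it to assumed limits."""
    fields = dict(item.split("=", 1) for item in text.split())
    linear = float(fields.get("linear", 0.0))
    angular = float(fields.get("angular", 0.0))

    def clamp(value, limit):
        return max(-limit, min(limit, value))

    return SpeedCommand(clamp(linear, max_linear), clamp(angular, max_angular))


def encode_command(cmd):
    """Serialize the command to bytes that could be sent over a socket or serial link."""
    return json.dumps(asdict(cmd), sort_keys=True).encode("utf-8")
```

Clamping at this stage keeps a misrecognized or malformed value from reaching the driving unit at full scale.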

(f) The robot operating system 30 controlling, according to the speed control command, the driving unit 40 that actuates the motion of the robot or the autonomous mobile body (S600)

- On receiving the speed control command, the robot operating system 30 can control the driving unit 40, which actuates the motion of the robot or the autonomous mobile body, by sending it a speed command (speed value) so that it produces the commanded speed.

Here, the driving unit 40 includes a low-level processor 42, a PWM generator 44, and a DC motor 46, and can carry out the following three steps.

Step 1: the low-level processor 42 receives the speed value (speed command) from the robot operating system 30 and inputs the speed as a low-level signal (S610)

- The low-level processor 42 can receive the speed value from the robot operating system 30. Having received the speed value, the low-level processor 42 can input the speed as a low-level signal through programming.

Step 2: the PWM generator pulse-modulates the low-level speed signal input through the low-level processor 42 (S620)

- The PWM generator 44 can take over the low-level speed signal programmed through the low-level processor 42 and pulse-modulate it.

Step 3: the DC motor drives the robot or the autonomous mobile body into motion according to the pulses modulated by the PWM generator 44 (S630)

- The pulses modulated by the PWM generator 44 are transmitted to the DC motors 46 provided at each wheel of the robot or the autonomous mobile body, and the speed of each DC motor 46 is controlled according to the pulses, whereby the motion of the robot and the autonomous mobile body can be controlled.
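The three steps of the driving unit can be sketched as a mapping from a speed value to a per-wheel PWM duty cycle. The sketch assumes a two-wheel differential-drive layout, a wheel separation `track_width`, and a linear relation between duty cycle and wheel speed up to `max_speed`. None of these are specified in the source, and all names are hypothetical.

```python
def wheel_speeds(linear, angular, track_width):
    """Split a body velocity into (left, right) wheel speeds for a differential drive."""
    half = angular * track_width / 2.0
    return linear - half, linear + half


def to_duty(speed, max_speed):
    """Return (duty cycle in percent, direction), saturating at 100 percent."""
    ratio = min(abs(speed) / max_speed, 1.0)
    direction = 1 if speed >= 0 else -1
    return round(ratio * 100.0, 1), direction


def pwm_commands(linear, angular, track_width=0.3, max_speed=1.0):
    """Compute the PWM setting for each wheel motor from one speed value."""
    return [to_duty(v, max_speed) for v in wheel_speeds(linear, angular, track_width)]
```

For example, `pwm_commands(0.5, 0.0)` gives both wheels the same duty cycle and direction, while a non-zero `angular` value gives the outer wheel a larger duty cycle than the inner one.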

Accordingly, whereas conventional open speech recognition modules provided service only through an Internet connection, the stand-alone voice recognition based agent module for precise motion control of a robot and an autonomous mobile body according to an embodiment of the present invention, and its method of operation, allow the robot to be controlled freely even in a space without an Internet connection.

Although embodiments of the present invention have been described above with reference to the accompanying drawings, those of ordinary skill in the art to which the present invention pertains will understand that the invention may be embodied in other specific forms without changing its technical idea or essential features. The embodiments described above are therefore to be considered in all respects as illustrative and not restrictive.

10: speech recognition engine
12: preprocessing unit
14: recognition unit
16: speech recognition algorithm
16a: acoustic model
16b: language model
16c: data dictionary
20: programming language
30: robot operating system
40: driving unit
42: low-level processor
44: PWM generator
46: DC motor

Claims

A stand-alone voice recognition based agent module for precise motion control of a robot and an autonomous mobile body, comprising:
a speech recognition engine that performs language processing to output an input voice as recognizable text, and then converts the recognizable text into, and assigns, speed control text for motion control;
a robot operating system that receives a speed control command through a programming language to which the speed control text has been assigned by the speech recognition engine, and transmits a speed value so that the robot or the autonomous mobile body is driven; and
a driving unit that receives the speed value from the robot operating system and drives the robot or the autonomous mobile body.

The agent module according to claim 1,
wherein the speech recognition engine includes PocketSphinx or another open speech recognition module.

The agent module according to claim 1,
wherein the speech recognition engine comprises:
a preprocessing unit that extracts a feature vector required for speech recognition; and
a recognition unit that stores a speech recognition algorithm, analyzes the feature vector extracted by the preprocessing unit through the speech recognition algorithm, and performs language processing.

The agent module according to claim 3,
wherein the speech recognition algorithm comprises:
an acoustic model that adapts to the voice input through a microphone;
a language model that compares the extracted feature vector with the adapted acoustic model and converts the result into a recognizable text form; and
a data dictionary used, when the extracted feature vector is compared with the acoustic model, to determine whether the language model can convert the result into a recognizable text form.

The agent module according to claim 1,
wherein the programming language is Python or C/C++.

The agent module according to claim 1,
wherein the speed control command is transmitted from the programming language to the robot operating system by a wired or wireless communication method.

The agent module according to claim 1,
wherein the driving unit comprises:
a low-level processor that receives the speed value from the robot operating system and inputs the speed as a low-level signal;
a PWM generator that pulse-modulates the speed signal input through the low-level processor; and
a DC motor that drives the robot or the autonomous mobile body according to the pulses modulated by the PWM generator.