KR102181583B1

KR102181583B1 - System for voice recognition of interactive robot and the method therof

Info

Publication number: KR102181583B1
Application number: KR1020180171954A
Authority: KR
Inventors: 이성종
Original assignee: 수상에스티(주)
Priority date: 2018-12-28
Filing date: 2018-12-28
Publication date: 2020-11-20
Also published as: KR20200081925A

Abstract

The present invention discloses a voice-recognition interactive robot, an interactive-robot voice recognition system, and a method thereof. A voice recognition system for an interactive robot according to an embodiment of the present invention may include: a voice data receiving unit that receives voice data transmitted from an external terminal; a text conversion unit that converts the voice data into text; a keyword extraction unit that extracts keywords from the converted text; a response text generation unit that extracts, from pre-stored metadata, a response text corresponding to the extracted keywords; a voice conversion unit that converts the response text into voice data; and a transmission unit that transmits the converted voice data to the external terminal.
An interactive robot according to an embodiment of the present invention includes: a voice recognition button unit that receives, through a button press, a command to start voice input; a voice input unit that receives the voice uttered by a user; a voice transmission unit that transmits the recording of the input voice to an external system as PCM data; and a voice output unit that receives and outputs response data from the external system.

Description

Voice-recognition interactive robot, interactive-robot voice recognition system, and method thereof {SYSTEM FOR VOICE RECOGNITION OF INTERACTIVE ROBOT AND THE METHOD THEROF}

The present invention relates to a voice-recognition interactive robot, a voice recognition system for an interactive robot, and a method thereof, and more particularly to a system and method that recognize a user's voice through an interactive robot and generate a corresponding event.

Robots, dolls, and similar toys that can interact with users play an important educational role: as infants and children play with them, they develop and master physical movement skills, and they develop intelligence by exercising imagination and creativity. Consequently, the development of interactive robot and doll technology is attracting considerable attention.

However, because existing robots or dolls output only a limited set of sounds or have no motion, they struggle to keep drawing new interest from the user over time.

Accordingly, research is needed on a robot and a voice recognition system that not only recognize and respond to the user's voice, but also identify the user's intent from the voice input and express a corresponding response.

Prior art literature: Korean Patent Registration No. 10-1791942

The present invention provides a voice-recognition interactive robot, an interactive-robot voice recognition system, and a method thereof in which the robot receives the user's voice and transmits it to a server, and the server analyzes the voice and outputs a corresponding response voice. This minimizes the processing power required of the robot and reduces cost.

The present invention provides a voice-recognition interactive robot, an interactive-robot voice recognition system, and a method thereof in which the voice input to the robot is transmitted to a server for processing, with the data split into multiple packets by adjusting the MTU (Maximum Transmission Unit) size, so that high-speed voice recognition is possible even on relatively low-spec hardware.

The present invention provides a voice-recognition interactive robot, an interactive-robot voice recognition system, and a method thereof in which the robot connects to a server over wireless communication, and the server analyzes speech in a manner specific to each user and outputs a corresponding voice, enabling more accurate voice recognition that matches individual characteristics such as each user's language habits.

The present invention provides a voice-recognition interactive robot, an interactive-robot voice recognition system, and a method thereof that convert the input voice into text, extract keywords from the text, and extract the synonyms and category attribute of each extracted keyword, so that a response text corresponding to those synonyms and category attributes can be generated more effectively.

A voice recognition system for an interactive robot according to an embodiment of the present invention may include: a voice data receiving unit that receives voice data transmitted from an external terminal; a text conversion unit that converts the voice data into text; a keyword extraction unit that extracts keywords from the converted text; a response text generation unit that extracts, from pre-stored metadata, a response text corresponding to the extracted keywords; a voice conversion unit that converts the response text into voice data; and a transmission unit that transmits the converted voice data to the external terminal.

According to an aspect of the present invention, the system may further include a user management unit that receives a primary key identifying the user of the external terminal and reads the setting values corresponding to that primary key.

According to an aspect of the present invention, the keyword extraction unit may extract the nouns present in the converted text, generate a synonym set for each noun, and match the category of each extracted noun to a preset category, thereby assigning a synonym set and a category attribute to each extracted keyword.

According to an aspect of the present invention, the response text generation unit may extract, for each extracted keyword, an associated question list set based on that keyword's synonym set and category attribute, and generate the response text by extracting a question common to these question lists.

An interactive robot according to an embodiment of the present invention includes: a voice recognition button unit that receives, through a button press, a command to start voice input; a voice input unit that receives the voice uttered by a user; a voice transmission unit that transmits the recording of the input voice to an external system as PCM data; and a voice output unit that receives and outputs response data from the external system.

According to an aspect of the present invention, the output unit may check whether the delay value of each register of the voice codec is 0 and, when it is not 0, call a delay function that waits for the codec's setting operation, giving a wait time corresponding to that register's delay value.

A voice recognition method for an interactive robot according to an embodiment of the present invention includes: receiving voice data transmitted from an external terminal; converting the voice data into text; extracting keywords from the converted text; extracting, from pre-stored metadata, a response text corresponding to the extracted keywords; converting the response text into voice data; and transmitting the converted voice data to the external terminal.
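The six steps form a linear pipeline on the server. The following is a minimal Python sketch under stated assumptions: the `VoicePipeline` class and its stage names are hypothetical, and each stage is passed in as a plain function so that real STT and TTS services, or test stubs, can be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class VoicePipeline:
    """Hypothetical server-side pipeline; each stage is injected as a callable."""
    speech_to_text: Callable[[bytes], str]        # convert voice data to text
    extract_keywords: Callable[[str], List[str]]  # extract keywords from the text
    find_response: Callable[[List[str]], str]     # look up response text in metadata
    text_to_speech: Callable[[str], bytes]        # convert response text to voice data
    send: Callable[[bytes], None]                 # transmit voice data to the terminal

    def handle(self, voice_data: bytes) -> str:
        """Process one request received from the external terminal; return the response text."""
        text = self.speech_to_text(voice_data)
        keywords = self.extract_keywords(text)
        response = self.find_response(keywords)
        self.send(self.text_to_speech(response))
        return response
```

Keeping the stages injectable separates the control flow, which the method fixes, from the choice of speech services, which the description leaves open.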

According to an embodiment of the present invention, there are provided a voice-recognition interactive robot, an interactive-robot voice recognition system, and a method thereof in which the robot receives the user's voice and transmits it to a server, and the server analyzes the voice and outputs a corresponding response voice, thereby minimizing the processing power required of the robot and reducing cost.

According to an embodiment of the present invention, there are provided a voice-recognition interactive robot, an interactive-robot voice recognition system, and a method thereof in which the voice input to the robot is transmitted to a server for processing, with the data split into multiple packets by adjusting the MTU (Maximum Transmission Unit) size, so that high-speed voice recognition is possible even on relatively low-spec hardware.

According to an embodiment of the present invention, there are provided a voice-recognition interactive robot, an interactive-robot voice recognition system, and a method thereof in which the robot connects to a server over wireless communication, and the server analyzes speech in a manner specific to each user and outputs a corresponding voice, enabling more accurate voice recognition that matches individual characteristics such as each user's language habits.

According to an embodiment of the present invention, there are provided a voice-recognition interactive robot, an interactive-robot voice recognition system, and a method thereof that convert the input voice into text, extract keywords from the text, and extract the synonyms and category attribute of each extracted keyword, so that a response text corresponding to those synonyms and category attributes can be generated more effectively.

FIG. 1 is a diagram showing the overall configuration of the system and robot, in which voice is received through the voice-recognition interactive robot according to an embodiment of the present invention and passed to the interactive-robot voice recognition system to generate an event in response to the recognized speech.
FIG. 2 is a block diagram showing the detailed configuration of the interactive-robot voice recognition system according to an embodiment of the present invention.
FIG. 3 is a block diagram showing the detailed configuration of the voice-recognition interactive robot according to an embodiment of the present invention.
FIG. 4 is a flowchart showing the interactive-robot voice recognition method according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. However, the present invention is not restricted or limited by these embodiments. Like reference numerals in the drawings denote like members.

Conventional technology that recognizes a user's voice through a robot, doll, or similar device and outputs a response message could not accurately grasp the intent of the user's question from the voice, and as a result its response messages were also limited to simple ones.

The present invention was devised to solve these problems of the prior art; its configuration is described in detail below.

FIG. 1 is a diagram showing the overall configuration of the system and robot, in which voice is received through the voice-recognition interactive robot according to an embodiment of the present invention and passed to the interactive-robot voice recognition system to generate an event in response to the recognized speech.

Referring to FIG. 1, the connected smart device 300 is first used to set up the procedure by which the voice-recognition interactive robot 200 communicates with the agent server. When the user then speaks greetings, questions, emotional expressions, and the like into the voice-recognition interactive robot 200, the input voice data may be transmitted to the robot voice recognition system 100.

The robot voice recognition system 100 then converts the voice data into text, extracts keywords, and generates a response text corresponding to the extracted keywords. When the system transmits this response to the voice-recognition interactive robot 200, the robot outputs it through a speaker or similar device and thereby interacts with the user.

In this arrangement, the user both inputs speech and hears the response voice through the voice-recognition interactive robot 200.

The detailed procedures and components for inputting voice and generating the corresponding response text are described in more detail below.

FIG. 2 is a block diagram showing the detailed configuration of the interactive-robot voice recognition system according to an embodiment of the present invention.

Referring to FIG. 2, the interactive-robot voice recognition system 100 includes a voice data receiving unit 110, a text conversion unit 120, a keyword extraction unit 130, a response text generation unit 140, a voice conversion unit 150, and a transmission unit 160.

The voice data receiving unit 110 may receive voice data transmitted from an external terminal. That is, when the user's voice is captured through the interactive robot 200 or the smart device 300 connected to it, the voice data is sent to and received by the voice data receiving unit 110.

The received voice data may take various forms, including PCM data.

For example, the interactive robot 200 may use an ARTIK053 board. When the ARTIK053 captures the user's voice, the voice data is transmitted to the voice data receiving unit 110, and if the data reaches or exceeds the configured MTU value, it is split into multiple packets for transmission. In this way, high-speed voice recognition can be supported even on an interactive robot 200 built on relatively low-spec hardware such as the ARTIK053.

In addition, the SDK required to use the API service cannot be installed on low-end hardware such as the ARTIK053. Therefore, when the voice data arrives split into small packets as described above, the part of the existing STT API that handled microphone input can be adapted to take the voice data received over the network instead. This makes it possible to provide a streaming service that quickly converts speech to text even with low-end hardware, as in an embodiment of the present invention.
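One way to read this adaptation: a streaming STT client normally iterates over audio chunks pulled from a microphone, so the server can hand it an iterator over chunks read from the robot's socket instead. The sketch below assumes a plain TCP socket and a 590-byte read size (matching the MTU value given later in the description); the function name is hypothetical. With the Python `google-cloud-speech` client, for example, each yielded chunk would typically be wrapped in a `StreamingRecognizeRequest(audio_content=chunk)`, but that step is omitted here to keep the sketch dependency-free.

```python
import socket
from typing import Iterator


def pcm_chunks_from_socket(conn: socket.socket, bufsize: int = 590) -> Iterator[bytes]:
    """Yield PCM chunks as they arrive from the robot until it closes the connection.

    A streaming STT client normally pulls audio from a microphone; this generator
    plays the same role but reads from the network instead.
    """
    while True:
        chunk = conn.recv(bufsize)
        if not chunk:  # b"" means the peer has closed its side of the socket
            return
        yield chunk
```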

After the voice data is received, the text conversion unit 120 may convert it into text. Because language habits and the like differ from user to user, the interactive-robot voice recognition system 100 may further include a user management unit that, to adapt to each user's characteristics, receives a primary key identifying the user of the external terminal and reads the setting values corresponding to that primary key.

That is, by using different setting values for voice recognition and text conversion for each user and performing recognition and conversion optimized for that individual, user-customized voice recognition can be achieved.
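A minimal sketch of the user management unit, assuming the setting values are a small per-user record looked up by primary key. The `UserSettings` fields (`language_code`, `voice_id`) and the `UserManager` class are hypothetical examples; the description does not say which settings are stored.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class UserSettings:
    # Hypothetical per-user settings; the description only says "setting values".
    language_code: str = "ko-KR"
    voice_id: str = "default"


DEFAULT_SETTINGS = UserSettings()


class UserManager:
    """Maps a user's primary key to that user's recognition and TTS settings."""

    def __init__(self) -> None:
        self._settings: Dict[str, UserSettings] = {}

    def register(self, primary_key: str, settings: UserSettings) -> None:
        self._settings[primary_key] = settings

    def read_settings(self, primary_key: str) -> UserSettings:
        # Unknown keys fall back to defaults instead of raising an error.
        return self._settings.get(primary_key, DEFAULT_SETTINGS)
```

Falling back to default settings for an unknown key is a design assumption; a stricter system could reject the session instead.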

For the STT (Speech-to-Text) process that converts speech into text, a cloud-based API or similar service may be used; such a service can recognize more than 120 languages and dialects and uses machine learning to process either real-time streaming audio or pre-recorded audio.

Once the voice data has been converted into text, the keyword extraction unit 130 may extract the key keywords from the converted text.

To do this, it extracts the nouns present in the converted text, generates a synonym set for each noun, and matches the category of each extracted noun to a preset category, thereby assigning a synonym set and a category attribute to each extracted keyword.

For example, if the user says "내일 소풍 갈거야" ("I'm going on a picnic tomorrow"), the nouns in the sentence, '내일' (tomorrow) and '소풍' (picnic), are extracted. For '내일', a synonym set such as 'tomorrow', '다음날' (the next day), and '이튿날' (the following day) is extracted, and because '내일' denotes time, it can be assigned the category attribute 'time word'.

Likewise, for '소풍', a synonym set such as 'picnic', '나들이' (outing), and '야유회' (outdoor party) is extracted, and because '소풍' denotes an outdoor activity, it can be assigned the category attribute 'outdoor activity word'.
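The worked example can be expressed as a short Python sketch. It is deliberately simplified: the noun list, synonym sets, and category table are hypothetical dictionaries that contain only the example words, and nouns are found by whitespace tokenization. Real Korean input would need a morphological analyzer, since particles usually attach directly to nouns (e.g. 소풍을).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical dictionaries built only from the worked example above.
NOUNS: Set[str] = {"내일", "소풍"}
SYNONYMS: Dict[str, Set[str]] = {
    "내일": {"tomorrow", "다음날", "이튿날"},
    "소풍": {"picnic", "나들이", "야유회"},
}
CATEGORIES: Dict[str, str] = {"내일": "time word", "소풍": "outdoor activity word"}


@dataclass
class Keyword:
    noun: str
    synonyms: Set[str] = field(default_factory=set)
    category: str = "uncategorized"


def extract_keywords(text: str) -> List[Keyword]:
    """Tag each known noun in `text` with its synonym set and category attribute."""
    keywords = []
    for token in text.split():
        if token in NOUNS:  # naive noun detection by dictionary lookup
            keywords.append(Keyword(token,
                                    SYNONYMS.get(token, set()),
                                    CATEGORIES.get(token, "uncategorized")))
    return keywords
```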

The synonym sets and category attributes of the keywords thus allow the response text generation unit 140, described below, to identify the user's intent more accurately and derive a corresponding response text.

Accordingly, the response text generation unit 140 may extract, from pre-stored metadata, a response text corresponding to the extracted keywords.

To do this, the response text generation unit 140 extracts, for each extracted keyword, an associated question list set based on that keyword's synonym set and category attribute, and generates the response text by extracting a question common to these question lists.

For example, if the question list set associated with the extracted keyword '내일' (tomorrow) contains 5 questions and the set associated with '소풍' (picnic) contains 7 questions, the single question whose content overlaps most between the two sets may be extracted and chosen as the text with which to respond to the user.
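A simplified sketch of the selection step, assuming each question list is a list of question strings and that "overlap" is approximated by exact matches: the question that appears in the most keyword lists wins, with ties broken by first appearance. The function name and sample questions are hypothetical; the description's notion of maximally overlapping content may involve fuzzier matching.

```python
from collections import Counter
from typing import List, Optional


def pick_common_question(question_lists: List[List[str]]) -> Optional[str]:
    """Return the question shared by the most keyword question lists.

    Ties go to the question encountered first.
    """
    counts: Counter = Counter()
    for questions in question_lists:
        # dict.fromkeys removes duplicates while keeping order, so each
        # question is counted at most once per list.
        counts.update(list(dict.fromkeys(questions)))
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

With a 5-question list for '내일' and a 7-question list for '소풍', a question present in both lists scores 2 and beats every question that appears in only one.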

The voice conversion unit 150 may then convert the selected response text into voice data.

The TTS (Text-to-Speech) process that converts text to speech may use deep learning to synthesize speech that sounds like a real human voice, with selectable languages and voices. Speech can be generated to resemble the user's language habits according to that user's setting values, and the pronunciation of particular words can be adjusted according to a user-specified vocabulary or stored terms (company names, acronyms, foreign words, neologisms, etc.).

The transmission unit 160 may transmit the converted voice data to the external terminal. The voice data may be sent in various formats, including MP3.

As described above, recognizing speech and generating the corresponding response text with the interactive-robot voice recognition system enables voice recognition and response generation customized to each user, and lets the system grasp the user's intent more accurately and respond accordingly.

The following describes in more detail how the voice-recognition interactive robot receives voice input, transmits it to the interactive-robot voice recognition system, and receives and outputs the response text (as converted voice data) from that system.

FIG. 3 is a block diagram showing the detailed configuration of the voice-recognition interactive robot according to an embodiment of the present invention. The voice-recognition interactive robot 200 may, for example, incorporate an ARTIK053 board.

Referring to FIG. 3, the voice-recognition interactive robot 200 may include a voice recognition button unit 210, a voice input unit 220, a voice transmission unit 230, and a voice output unit 240.

The voice recognition button unit 210 may receive, through a button press, a command to start voice input. Conventional devices such as smart speakers start voice input by detecting sound, whereas in an embodiment of the present invention voice input starts only when the user operates the button. Because users' conversations are not monitored until the user deliberately starts voice input, the system can ensure that conversations are not recorded against the user's intent or leaked to third parties.

The button may be located in the robot's hand, so that pressing it feels like holding the robot's hand and helps the user feel a closer bond with the robot.

The voice input unit 220 receives the voice uttered by the user: once voice input has been started with the voice recognition button, it captures the user's voice through a microphone or similar input.
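The button-gated capture can be sketched as a small state machine. This assumes, hypothetically, that pressing the button starts a recording session and releasing it ends the session; the description only states that input starts on a button press. Frames arriving outside a session are discarded, which reflects the privacy property described above. The class and method names are illustrative.

```python
from typing import List


class PushToTalkRecorder:
    """Keep microphone frames only while a recording session is active."""

    def __init__(self) -> None:
        self._active = False
        self._frames: List[bytes] = []

    def button_down(self) -> None:
        # Start a new session, discarding anything left from a previous one.
        self._active = True
        self._frames.clear()

    def on_mic_frame(self, frame: bytes) -> None:
        # Frames that arrive outside a session are dropped, never stored.
        if self._active:
            self._frames.append(frame)

    def button_up(self) -> bytes:
        """End the session and return the recorded PCM bytes."""
        self._active = False
        recording = b"".join(self._frames)
        self._frames.clear()
        return recording
```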

The voice transmission unit 230 may transmit the recording of the input voice to an external system as PCM data. Transmitting the data as PCM allows it to be sent more effectively and without loss.

For voice transmission, the MTU size, i.e. the maximum datagram size that a network interface can send without segmentation (the largest packet that can be sent at once), may be set to 590 bytes. Data that reaches or exceeds the MTU value is split into multiple packets for transmission, which makes data transfer more effective.
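Application-level splitting into MTU-sized pieces can be sketched as follows. This is a simplification under stated assumptions: it treats 590 bytes as the payload size per packet, whereas at the IP layer the MTU also covers protocol headers, and a TCP socket performs its own segmentation. The function name is hypothetical.

```python
from typing import List

MTU_SIZE = 590  # bytes, the value given in the description


def split_into_packets(pcm: bytes, mtu: int = MTU_SIZE) -> List[bytes]:
    """Split a PCM recording into consecutive payloads of at most `mtu` bytes."""
    if mtu <= 0:
        raise ValueError("mtu must be positive")
    return [pcm[i:i + mtu] for i in range(0, len(pcm), mtu)]
```

For scale, assuming 16 kHz, 16-bit mono PCM (the description does not state the recording format), one second of audio is 16,000 × 2 = 32,000 bytes, which splits into 55 packets: 54 full 590-byte packets (31,860 bytes) and a final 140-byte packet.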

When the voice output unit 240 receives response data in the form of voice data from the external system, it may output that data through an output device such as a speaker so that the user can hear it.

The voice output unit 240 may use the following method to minimize the delay incurred when configuring the codec used for audio output.

When the codec registers are configured before the codec is used, the delay function is called for each register to wait for the codec's setting operation, producing a wait time of script[i].delay. In practice, however, script[i].delay is often 0. Therefore, to avoid the overhead of calling the delay function unnecessarily, the method checks whether each register's script[i].delay value is 0 and calls the delay function, which waits for the voice codec's setting operation, only when the value is non-zero, giving a wait time that corresponds to that register's delay value. Here, the members of script[i] include the register address, and script[i].delay is the delay value of each register.
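The register-setup optimization is sketched below in Python for clarity, although on the robot it would live in the board's firmware. `CodecScriptEntry`, `apply_codec_script`, the `value` field, and the millisecond unit for `delay` are assumptions; the description only names the register address and `script[i].delay`. The delay function is injected so the logic can be checked without actually sleeping.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class CodecScriptEntry:
    addr: int   # register address
    value: int  # value written to the register (hypothetical field)
    delay: int  # wait after this register, in milliseconds (unit assumed)


def apply_codec_script(
    script: Iterable[CodecScriptEntry],
    write_register: Callable[[int, int], None],
    delay_ms: Callable[[int], None] = lambda ms: time.sleep(ms / 1000),
) -> int:
    """Write every register, calling the delay function only for non-zero delays.

    Returns how many times the delay function was actually called.
    """
    delay_calls = 0
    for entry in script:
        write_register(entry.addr, entry.value)
        if entry.delay != 0:  # skip the call overhead when no wait is needed
            delay_ms(entry.delay)
            delay_calls += 1
    return delay_calls
```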

As described above, the interactive robot according to an embodiment of the present invention provides a device that minimizes the processing power required of the voice-recognition interactive robot and reduces cost.

FIG. 4 is a flowchart showing the interactive-robot voice recognition method according to an embodiment of the present invention.

The following example assumes that the voice-recognition interactive robot 200 incorporates an ARTIK053 board and communicates with the voice recognition system 100 of the interactive robot through socket communication.

First, in step 410, voice data transmitted from an external terminal may be received.

That is, when the client (the ARTIK053) connects to the server of the interactive robot's voice recognition system 100 through socket communication, User_info_check() is executed to identify the user, and each user is distinguished by a primary key with a unique value.

The client information is then passed in a call to action_thread(). Using the corresponding client socket, action_thread() brings the user's voice data (PCM data) into the server through the google_cloud_streaming() operation.
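The claims state that the robot divides its PCM recording into units of a predetermined MTU before sending it. The sketch below shows that split on the robot side and the reassembly on the server side, connected by a local socket pair. Several details are assumptions: half-closing the socket to mark the end of an utterance, the 1000-byte MTU in the demo, and all function names. They are not taken from the patent.

```python
import socket
import threading
from typing import List

def split_into_mtu_chunks(pcm: bytes, mtu: int) -> List[bytes]:
    """Split recorded PCM data into pieces of at most `mtu` bytes."""
    if mtu <= 0:
        raise ValueError("mtu must be positive")
    return [pcm[i:i + mtu] for i in range(0, len(pcm), mtu)]

def send_pcm(sock: socket.socket, pcm: bytes, mtu: int) -> int:
    """Robot side (hypothetical): send each chunk, then half-close the socket
    so the server knows the utterance is complete. Returns the chunk count."""
    chunks = split_into_mtu_chunks(pcm, mtu)
    for chunk in chunks:
        sock.sendall(chunk)
    sock.shutdown(socket.SHUT_WR)
    return len(chunks)

def receive_pcm(sock: socket.socket, bufsize: int = 4096) -> bytes:
    """Server side (hypothetical): read until the peer half-closes."""
    parts = []
    while True:
        data = sock.recv(bufsize)
        if not data:  # empty read: the robot has finished sending
            break
        parts.append(data)
    return b"".join(parts)

if __name__ == "__main__":
    robot, server = socket.socketpair()
    pcm = bytes(range(256)) * 10  # 2560 bytes of dummy PCM
    received = {}
    worker = threading.Thread(target=lambda: received.update(pcm=receive_pcm(server)))
    worker.start()
    count = send_pcm(robot, pcm, mtu=1000)
    worker.join()
    print(count, received["pcm"] == pcm)  # 3 True
    robot.close()
    server.close()
```

A stream socket does not preserve the sender's chunk boundaries. For that reason, the server simply concatenates whatever it reads rather than relying on each recv() returning one MTU-sized chunk.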

Next, in step 420, the voice data may be converted into text using the Google Cloud streaming Speech-to-Text API.

In step 430, keywords may be extracted from the converted text. In step 440, a response text corresponding to the extracted keywords may be retrieved from pre-stored metadata.
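The claims describe steps 430 and 440 in more detail. Each extracted noun receives a similar-word set and a preset category. Each keyword then yields a set of candidate questions, and the response comes from a question common to all of those sets. The sketch below models this as a set intersection. The metadata tables, their contents, and the lowest-ID tie-break are hypothetical. Noun extraction itself (for Korean, typically a morphological analyzer) is assumed to have already happened.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical metadata; the patent does not publish its schema.
SIMILAR_WORDS: Dict[str, Set[str]] = {
    "weather": {"weather", "forecast"},
    "tomorrow": {"tomorrow", "next day"},
}
CATEGORY: Dict[str, str] = {"weather": "info", "tomorrow": "time"}
# (word, category) -> IDs of stored questions containing that word.
QUESTION_INDEX: Dict[Tuple[str, str], Set[int]] = {
    ("weather", "info"): {1, 2},
    ("forecast", "info"): {3},
    ("tomorrow", "time"): {2, 4},
    ("next day", "time"): {3},
}
ANSWERS: Dict[int, str] = {
    1: "It is sunny today.",
    2: "Tomorrow will be rainy.",
    3: "The forecast for the next day is cloudy.",
    4: "Tomorrow is Saturday.",
}

def question_set(noun: str) -> Set[int]:
    """Question IDs reachable from a noun through its similar words and category."""
    category = CATEGORY[noun]
    found: Set[int] = set()
    for word in SIMILAR_WORDS.get(noun, {noun}):
        found |= QUESTION_INDEX.get((word, category), set())
    return found

def generate_response(nouns: List[str]) -> Optional[str]:
    """Intersect the question sets of all categorized nouns; answer a common question."""
    sets = [question_set(n) for n in nouns if n in CATEGORY]  # nouns with a preset category
    if not sets:
        return None
    common = set.intersection(*sets)
    if not common:
        return None
    return ANSWERS[min(common)]  # tie-break by lowest ID (assumption)

if __name__ == "__main__":
    print(generate_response(["weather", "tomorrow"]))  # Tomorrow will be rainy.
```

Here "weather" reaches questions {1, 2, 3} and "tomorrow" reaches {2, 3, 4}. The common questions are therefore {2, 3}, and the tie-break selects question 2.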

In step 450, the response text may be converted into voice data. In step 460, the converted voice data may be transmitted to the external terminal.
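Steps 420 to 460 form a fixed server-side pipeline for each utterance received in step 410. The sketch below wires the stages together as injected callables, so the flow can be exercised without the cloud services. The class and field names are hypothetical. In the described system, speech_to_text and text_to_speech would wrap the Google Cloud and AWS Polly services, which are not called here.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class VoicePipeline:
    """Hypothetical wiring of steps 420-460; each stage is supplied by the caller."""
    speech_to_text: Callable[[bytes], str]        # step 420
    extract_keywords: Callable[[str], List[str]]  # step 430
    find_response: Callable[[List[str]], str]     # step 440
    text_to_speech: Callable[[str], bytes]        # step 450
    send: Callable[[bytes], None]                 # step 460

    def handle(self, pcm: bytes) -> str:
        """Run one utterance (received in step 410) through every stage."""
        text = self.speech_to_text(pcm)
        keywords = self.extract_keywords(text)
        response = self.find_response(keywords)
        self.send(self.text_to_speech(response))
        return response

if __name__ == "__main__":
    outbox = []
    pipeline = VoicePipeline(
        speech_to_text=lambda pcm: pcm.decode(),  # stand-in: "PCM" is UTF-8 text here
        extract_keywords=lambda text: text.split(),
        find_response=lambda kws: "echo " + " ".join(kws),
        text_to_speech=lambda text: text.encode(),
        send=outbox.append,
    )
    print(pipeline.handle(b"hello robot"))  # echo hello robot
```

Keeping the stages injectable lets the keyword and response logic be tested offline and lets each cloud service be swapped without touching the flow.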

For this purpose, the AWS Polly Text-to-Speech API synthesizes the response text into a 1-channel (mono), 22,050 Hz MP3 file. The FFmpeg module then converts this MP3 file to 2-channel stereo at 44,000 Hz, and the result is delivered to the ARTIK053 board.
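The FFmpeg conversion step can be sketched by building the command line in Python. The options -i, -ac, -ar, -f, and -acodec are standard FFmpeg options. The output format (raw signed 16-bit little-endian PCM) is an assumption, because the text gives only the channel count and rate. The 44,000 Hz rate is taken verbatim from the description; if the board actually runs at the common 44,100 Hz, only the `rate` argument changes. The Polly synthesis step is not shown.

```python
import shutil
import subprocess
from typing import List

def build_ffmpeg_args(src_mp3: str, dst_pcm: str,
                      channels: int = 2, rate: int = 44000) -> List[str]:
    """FFmpeg arguments that upmix and resample the mono 22,050 Hz TTS MP3.

    The output is assumed to be raw signed 16-bit little-endian PCM.
    """
    return [
        "ffmpeg", "-y",        # overwrite the output file if it exists
        "-i", src_mp3,         # mono 22,050 Hz MP3 from the TTS step
        "-ac", str(channels),  # 2 channels (stereo)
        "-ar", str(rate),      # target sample rate
        "-f", "s16le", "-acodec", "pcm_s16le",
        dst_pcm,
    ]

def pcm_bytes_per_second(channels: int = 2, rate: int = 44000, sample_bytes: int = 2) -> int:
    """Data rate the robot must receive and play back."""
    return channels * rate * sample_bytes

def convert(src_mp3: str, dst_pcm: str) -> None:
    """Run the conversion; requires an ffmpeg binary on PATH."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(build_ffmpeg_args(src_mp3, dst_pcm), check=True)

if __name__ == "__main__":
    print(" ".join(build_ffmpeg_args("reply.mp3", "reply.pcm")))
    print(pcm_bytes_per_second())  # 176000
```

At 2 channels x 44,000 samples/s x 2 bytes, the board receives 176,000 bytes per second of audio. This figure is useful when sizing the MTU-based transfer and playback buffers.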

As described above, according to an embodiment of the present invention, the voice recognition sympathetic robot receives the user's voice and sends it to the server. The server analyzes the voice and outputs a corresponding response voice. The invention thus provides a voice recognition sympathetic robot, a sympathetic robot voice recognition system, and a method thereof that reduce cost and minimize the processing capability required of the robot.

In addition, according to an embodiment of the present invention, using low-specification hardware reduces power consumption and weight, which makes the robot easy to carry. It also markedly lowers the initial cost and lets the user receive a high-speed voice recognition service even while on the move.

In addition, the sympathetic robot voice recognition method according to an embodiment of the present invention may be recorded on a computer-readable medium containing program instructions for performing operations implemented by various computers. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or they may be known to and usable by those skilled in computer software. Examples of computer-readable recording media include:

- magnetic media such as hard disks, floppy disks, and magnetic tape;
- optical media such as CD-ROMs and DVDs;
- magneto-optical media such as floptical disks; and
- hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.

Examples of program instructions include machine code such as that produced by a compiler, as well as high-level language code that a computer can execute using an interpreter or the like.

As described above, an embodiment of the present invention has been described with reference to limited embodiments and drawings, but the invention is not limited to those embodiments. Those of ordinary skill in the art to which the present invention pertains can make various modifications and variations based on this description. Accordingly, an embodiment of the present invention should be determined only by the claims set forth below, and all equivalent modifications thereof fall within the scope of the inventive concept.

100: sympathetic robot voice recognition system
110: voice data receiving unit
120: text conversion unit
130: keyword extraction unit
140: response text generation unit
150: voice conversion unit
160: transmission unit
200: sympathetic robot
210: voice recognition button unit
220: voice input unit
230: voice transmission unit
240: voice output unit

Claims

A voice recognition system of a sympathetic robot, comprising:
a voice data receiving unit for receiving voice data transmitted from an external terminal;
a text conversion unit for converting the voice data into text;
a keyword extraction unit for extracting keywords from the converted text;
a response text generation unit for extracting, from pre-stored metadata, a response text corresponding to the extracted keywords;
a voice conversion unit for converting the response text into voice data;
a transmission unit for transmitting the converted voice data to the external terminal; and
a user management unit for receiving a primary key identifying a user of the external terminal and reading a setting value corresponding to that primary key,
wherein the keyword extraction unit:
extracts a plurality of nouns present in the converted text;
generates a set of similar words for each noun and matches the category of each extracted noun against preset categories; and
assigns a similar-word set and a category attribute to each extracted keyword;
wherein the response text generation unit:
extracts, for each extracted keyword, the question list set corresponding to that keyword's similar-word set and category attribute, and generates the response text by extracting a question common to the question lists;
wherein the external terminal comprises:
a voice recognition button unit for receiving, through button manipulation, a command to start voice input;
a voice input unit for receiving a voice uttered by a user;
a voice transmission unit for transmitting the recorded data of the input voice to an external system in the form of PCM data, dividing the data into units of a predetermined MTU for transmission; and
a voice output unit for receiving and outputting response data from the external system; and
wherein the voice output unit checks whether the delay value of each register of the voice codec included in the response data is 0 and, only when the value is not 0, calls a delay function to wait for the setting operation of the voice codec, thereby providing a waiting time corresponding to the delay value of each register.
