
KR20200140595A - System and Method for Supporting Intelligent Voice Service for Lightweight IoT devices

Info

Publication number: KR20200140595A
Application number: KR1020190067426A
Authority: KR
Inventors: 김종덕; 김동현; 허준환; 윤동글; 이성종; 이창홍
Original assignee: 부산대학교 산학협력단
Priority date: 2019-06-07
Filing date: 2019-06-07
Publication date: 2020-12-16
Also published as: KR102252526B1

Abstract

The present invention relates to a device and method for supporting an intelligent voice service for lightweight IoT devices, which make it possible to build a voice-based conversational system for lightweight IoT devices that do not support one. The device comprises: a low-performance terminal that receives and stores raw voice data (PCM data), periodically streams the PCM data, and outputs a final response; an agent terminal that receives the PCM data from the low-performance terminal in real time, transmits the received PCM data to an artificial intelligence voice processing platform, and transmits a voice file received from the artificial intelligence voice processing platform to the low-performance terminal; and the artificial intelligence voice processing platform, which processes the PCM data received from the agent terminal in real time to convert it into text, interprets the text once speech recognition is complete, and transmits a response message to the agent terminal.

Description

System and Method for Supporting Intelligent Voice Service for Lightweight IoT devices

The present invention relates to intelligent voice service support and, more specifically, to an intelligent voice service support device and method for lightweight IoT devices that make it possible to build a voice-based conversational system for lightweight IoT devices that do not support one.

In general, electronic devices provide a voice input function based on speech recognition technology for user convenience.

An electronic device applies natural language processing to the user's utterance. Through this processing it identifies the user's intent and provides a result that matches that intent.

Electronic devices also provide artificial intelligence-based voice services. The user enters a command by speaking, and the electronic device can act as the user's assistant by executing the command that corresponds to the utterance.

In doing so, the electronic device must perform an operation that matches the user's intent.

Speech recognition technology for supporting such artificial intelligence-based voice services is drawing attention as the industry expected to develop fastest among the fields related to the Fourth Industrial Revolution.

Speech recognition technology converts an acoustic signal obtained through a sound sensor such as a microphone into words or sentences. Data entry by speech recognition is faster than entry through physical input devices such as touch screens or keyboards, and it allows operation and information input by voice even in situations where such physical devices cannot be used.

Because of these characteristics, it is becoming the most common and intuitive means of interaction between people and things.

A dialogue system is one field of artificial intelligence (AI) technology; it identifies the user's question from text and provides a response.

Recently, dialogue systems have been used in combination with speech recognition technology, and they are superior in flexibility, clarity, and expressiveness to the physical input devices mainly in use today.

Because a voice-based conversational system uses natural language familiar to humans on top of speech recognition technology, users can easily access various services through an AI dialogue system without special knowledge or training.

Recently, several large companies have released, or announced plans to release, AI speakers that apply a voice-based conversational system, and the market is gradually expanding with the advance of AI and the spread of IoT.

Voice-based conversational systems are expected to be used in various fields such as voice assistants, autonomous vehicles, real-time voice search, and speech interpretation. Just as the advent of touch technology changed the paradigm of mobile phones, the development of voice-based conversational systems using speech recognition technology is expected to strongly influence the development of various products and services that incorporate AI technology.

Voice-based conversational systems currently on the market mainly run the Android operating system, which makes direct use of voice processing and AI services convenient, and they rely on high-performance terminals to support it.

FIG. 1 is a block diagram showing the voice processing service of a conventional high-performance terminal.

In addition, many extra components, including high-performance devices dedicated to audio processing, are often included to improve sound quality as a speaker and to raise the speech recognition rate, and various functions are provided on that basis.

However, such systems are expensive, require a continuous power supply, and are large and heavy, so they are not suitable for direct use in application services with diverse purposes.

In particular, there are applications in which portability is the top priority, such as stuffed-toy robots, and special applications, such as teaching aids for young children, in which speaker sound quality and recognition accuracy are not critical but a system combining AI and speech recognition is still needed.

Applying a conventional voice-based conversational system to such applications as it is would be difficult in terms of portability and cost.

Accordingly, there is a need for a new technology that makes it possible to build a voice-based conversational system for lightweight IoT devices that do not support one.

Republic of Korea Patent Publication No. 10-2009-0090275
Republic of Korea Patent Publication No. 10-2017-0043055
Republic of Korea Patent Publication No. 10-2018-0121210

The present invention is intended to solve the problems of conventional voice-based conversational systems, and its object is to provide an intelligent voice service support device and method for lightweight IoT devices that make it possible to build a voice-based conversational system for lightweight IoT devices that do not support one.

Another object of the present invention is to provide an intelligent voice service support device and method for lightweight IoT devices in which an agent terminal is placed between a low-performance terminal and an artificial intelligence voice processing platform, voice data is transmitted in real time, and voice processing and conversational services are handled through the agent, so that voice processing services are supported efficiently.

Another object of the present invention is to provide an intelligent voice service support device and method for lightweight IoT devices that apply a continuous real-time transmission technique from the low-performance terminal through the agent terminal to the speech recognition service to minimize latency, and likewise apply a real-time voice data transmission technique from the agent to the terminal, so that voice processing services are supported efficiently.

Another object of the present invention is to provide an intelligent voice service support device and method for lightweight IoT devices in which an agent terminal supporting streaming speech recognition is placed between the low-performance terminal and the artificial intelligence voice processing platform and the agent terminal supports multiple interfaces, enabling efficient voice processing services and broadening applicability.

The objects of the present invention are not limited to those mentioned above, and other objects not mentioned will be clearly understood by those skilled in the art from the following description.

To achieve the above objects, an intelligent voice service support device for lightweight IoT devices according to the present invention comprises: a low-performance terminal that receives and stores raw voice data, that is, PCM data, streams it periodically, and outputs a final response; an agent terminal that receives the PCM data from the low-performance terminal in real time, transmits it to an artificial intelligence voice processing platform, and transmits a voice file received from the artificial intelligence voice processing platform to the low-performance terminal; and the artificial intelligence voice processing platform, which processes the PCM data received from the agent terminal in real time to convert it into text, interprets the text once speech recognition is complete, and transmits a response message to the agent terminal.

Here, the artificial intelligence voice processing platform includes: a speech recognition platform that processes the PCM data in real time, converts it into text, transmits the text to the agent terminal, determines when the speech has ended, and signals this through a return value; a question-answering platform that interprets the received text and transmits an appropriate response message to the agent terminal; and a speech synthesis platform that converts the response text into a voice file and transmits it to the agent terminal.

The low-performance terminal includes: an H/W codec that receives the raw voice data, that is, PCM data; a PCM data recognition unit that accepts the input as sound when a level at or above a specific decibel value is detected; a PCM data buffer that stores the data recognized by the PCM data recognition unit; a PCM data transmission unit that periodically reads a fixed amount of data from the PCM data buffer and streams it through an RF interface; a response voice receiving unit that receives the response voice from the agent terminal through the RF interface; and a voice response output unit that outputs the response voice received by the response voice receiving unit as the final response through the H/W codec.

The agent terminal includes: a speech recognition processing unit that receives the PCM data from the low-performance terminal in real time through the RF interface, transmits it to the artificial intelligence voice processing platform to request speech recognition, and combines the text converted by the platform until the entire utterance has been recognized; a question-answering processing unit that, once the speech recognition processing unit has finished speech recognition, transmits the text to the artificial intelligence voice processing platform to request a question-answering response and receives that response; and a speech synthesis processing unit that, once question answering is complete, receives the response text from the question-answering processing unit, requests speech synthesis from the artificial intelligence voice processing platform, receives the converted voice file from the platform, and transmits it to the low-performance terminal through the RF interface.

The speech recognition processing unit includes: a PCM data receiving unit that receives the PCM data from the low-performance terminal in real time through the RF interface; a speech recognition request unit that transmits the PCM data received by the PCM data receiving unit to the artificial intelligence voice processing platform to request speech recognition; and a speech recognition receiving and combining unit that combines the text converted by the platform until the entire utterance has been recognized.

The question-answering processing unit includes: a question-answering request unit that receives the text from the speech recognition receiving and combining unit of the speech recognition processing unit and transmits it to the artificial intelligence voice processing platform to request a question-answering response; and a question-answering receiving unit that receives the response message the platform transmits after interpreting the text.

The speech synthesis processing unit includes: a speech synthesis request unit that, once question answering is complete, receives the response text from the question-answering processing unit and requests speech synthesis from the artificial intelligence voice processing platform; a speech synthesis receiving unit that receives the converted voice file from the platform; and a speech synthesis response unit that transmits the voice file received by the speech synthesis receiving unit to the low-performance terminal through the RF interface.

To achieve another object, an intelligent voice service support method for lightweight IoT devices according to the present invention comprises: a speech recognition processing step of receiving PCM data from a low-performance terminal in real time, transmitting it to an artificial intelligence voice processing platform to request speech recognition, and combining the text converted by the platform until the entire utterance has been recognized; a question-answering processing step of, once speech recognition is complete, transmitting the text to the artificial intelligence voice processing platform to request a question-answering response and receiving that response; and a speech synthesis processing step of, once question answering is complete, transmitting the response text to the artificial intelligence voice processing platform to request speech synthesis, receiving the converted voice file from the platform, and transmitting it to the low-performance terminal through an RF interface.

Here, the speech recognition processing step includes: a PCM data receiving step of receiving the PCM data from the low-performance terminal in real time through the RF interface; a speech recognition request step of transmitting the PCM data received in the PCM data receiving step to the artificial intelligence voice processing platform to request speech recognition; and a speech recognition receiving and combining step of combining the text converted by the platform until the entire utterance has been recognized.

The question-answering processing step includes: a question-answering request step of transmitting the text combined in the speech recognition processing step to the artificial intelligence voice processing platform to request a question-answering response; and a question-answering receiving step of receiving the response message the platform transmits after interpreting the text.

The speech synthesis processing step includes: a speech synthesis request step of, once question answering is complete, receiving the response text from the question-answering processing unit and requesting speech synthesis from the artificial intelligence voice processing platform; a speech synthesis receiving step of receiving the converted voice file from the platform; and a speech synthesis response step of transmitting the voice file received in the speech synthesis receiving step to the low-performance terminal through the RF interface.

The intelligent voice service support device and method for lightweight IoT devices according to the present invention, as described above, have the following effects.

First, they make it possible to build a voice-based conversational system for lightweight IoT devices that do not support one.

Second, an agent terminal is placed between the low-performance terminal and the artificial intelligence voice processing platform, voice data is transmitted in real time, and voice processing and conversational services are handled through the agent, so that voice processing services are supported efficiently.

Third, a continuous real-time transmission technique is applied from the low-performance terminal through the agent terminal to the speech recognition service to minimize latency, and a real-time voice data transmission technique is likewise applied from the agent to the terminal, so that voice processing services are supported efficiently.

Fourth, an agent terminal supporting streaming speech recognition is placed between the low-performance terminal and the artificial intelligence voice processing platform, and the agent terminal supports multiple interfaces, which enables efficient voice processing services and makes the approach more applicable to lightweight IoT devices that do not support a voice-based conversational system.

FIG. 1 is a block diagram showing the voice processing service of a conventional high-performance terminal.
FIG. 2 is a block diagram of an intelligent voice service support device for lightweight IoT devices according to the present invention.
FIG. 3 is a detailed block diagram of an intelligent voice service support device for lightweight IoT devices according to the present invention.
FIG. 4 is a detailed block diagram of a low-performance terminal according to the present invention.
FIG. 5 is a detailed block diagram of an agent terminal according to the present invention.
FIG. 6 is a flowchart showing an intelligent voice service support method for lightweight IoT devices according to the present invention.
FIG. 7 is an operation flowchart showing file-based speech recognition.
FIG. 8 is an overall operation flowchart of intelligent voice service support for lightweight IoT devices according to the present invention.
FIG. 9 is an operation flowchart showing streaming speech recognition according to the present invention.
FIGS. 10A and 10B are graphs of the reference processing time of the intelligent voice service support device for lightweight IoT devices according to the present invention and of the processing time per recording time for each transmission method.

Hereinafter, preferred embodiments of the intelligent voice service support device and method for lightweight IoT devices according to the present invention are described in detail.

The features and advantages of the intelligent voice service support device and method for lightweight IoT devices according to the present invention will become apparent from the detailed description of each embodiment below.

FIG. 2 is a block diagram of an intelligent voice service support device for lightweight IoT devices according to the present invention, and FIG. 3 is a detailed block diagram of the same device.

The intelligent voice service support device and method for lightweight IoT devices according to the present invention make it possible to build a voice-based conversational system for lightweight IoT devices that do not support one.

To this end, the present invention may include a configuration in which an agent terminal is placed between a low-performance terminal and an artificial intelligence voice processing platform, voice data is transmitted in real time, and voice processing and conversational services are handled through the agent.

In particular, the present invention may include a configuration that applies a continuous real-time transmission technique from the low-performance terminal through the agent terminal to the speech recognition service to minimize latency, and likewise applies a real-time voice data transmission technique from the agent to the terminal to support voice processing services.

In the intelligent voice service support device for lightweight IoT devices according to the present invention, the low-performance terminal (100) has an RF interface, the agent terminal (200) has multiple RF interfaces, and the artificial intelligence voice processing platform (300) includes a speech recognition platform (300a), a question-answering platform (300b), and a speech synthesis platform (300c).

Specifically, as shown in FIGS. 2 and 3, the intelligent voice service support device for lightweight IoT devices according to the present invention includes: a low-performance terminal (100) that receives and stores raw voice data, that is, PCM data, periodically streams it through the RF interface, and outputs a final response; an agent terminal (200) that receives the PCM data from the low-performance terminal (100) in real time and transmits it to the artificial intelligence voice processing platform (300), combines the text converted by the artificial intelligence voice processing platform (300) until the entire utterance has been recognized, transmits the text to the artificial intelligence voice processing platform (300) once speech recognition is complete, and transmits the voice file received from the artificial intelligence voice processing platform (300) to the low-performance terminal (100); and the artificial intelligence voice processing platform (300), which processes the PCM data received from the agent terminal (200) in real time to convert it into text, interprets the text once speech recognition is complete, and transmits a response message to the agent terminal (200).

Here, the artificial intelligence voice processing platform (300) includes: a speech recognition platform (300a) that processes the PCM data in real time, converts it into text, transmits the text to the agent terminal (200), determines when the speech has ended, and signals this through a return value; a question-answering platform (300b) that interprets the received text and transmits an appropriate response message to the agent terminal (200); and a speech synthesis platform (300c) that converts the response text into a voice file and transmits it to the agent terminal (200).

The detailed configuration of the low-performance terminal according to the present invention is as follows.

FIG. 4 is a detailed block diagram of a low-performance terminal according to the present invention.

The low-performance terminal (100) includes: an H/W codec (41) that receives raw voice data, that is, PCM data; a PCM data recognition unit (42) that accepts the input as sound when a level at or above a specific decibel value is detected; a PCM data buffer (43) that stores the data recognized by the PCM data recognition unit (42); a PCM data transmission unit (44) that periodically reads a fixed amount of data from the PCM data buffer (43) and streams it through the RF interface (45); a response voice receiving unit (46) that receives the response voice from the agent terminal through the RF interface (45); and a voice response output unit (47) that outputs the response voice received by the response voice receiving unit (46) as the final response through the H/W codec (41).
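The capture path of the low-performance terminal (level-based recognition, buffering, and periodic fixed-size streaming) can be sketched roughly as follows. This is only an illustration under stated assumptions: the patent does not give a threshold value, a chunk size, or a sample format, so `THRESHOLD_DBFS`, `CHUNK_BYTES`, 16-bit little-endian mono samples, and all function and class names here are hypothetical. The `send` callback stands in for the RF interface.

```python
import math
import struct

THRESHOLD_DBFS = -40.0  # assumed detection threshold; not specified in the source
CHUNK_BYTES = 320       # assumed fixed chunk size read on each period


def level_dbfs(frame: bytes) -> float:
    """RMS level of a 16-bit little-endian PCM frame, in dB relative to full scale."""
    n = len(frame) // 2
    if n == 0:
        return float("-inf")
    samples = struct.unpack(f"<{n}h", frame[: n * 2])
    rms = math.sqrt(sum(s * s for s in samples) / n)
    return 20 * math.log10(rms / 32768) if rms > 0 else float("-inf")


class PcmBuffer:
    """Byte FIFO standing in for the PCM data buffer (43)."""

    def __init__(self) -> None:
        self._data = bytearray()

    def write(self, frame: bytes) -> None:
        self._data.extend(frame)

    def read_chunk(self, size: int):
        """Return exactly `size` bytes, or None if fewer are buffered."""
        if len(self._data) < size:
            return None
        chunk = bytes(self._data[:size])
        del self._data[:size]
        return chunk


def recognize_and_buffer(frame: bytes, buf: PcmBuffer) -> bool:
    """Role of the PCM data recognition unit (42): keep frames at or above the threshold."""
    if level_dbfs(frame) >= THRESHOLD_DBFS:
        buf.write(frame)
        return True
    return False


def send_next_chunk(buf: PcmBuffer, send) -> bool:
    """Role of the PCM data transmission unit (44): one periodic tick, one fixed-size chunk."""
    chunk = buf.read_chunk(CHUNK_BYTES)
    if chunk is None:
        return False
    send(chunk)
    return True
```

Calling `send_next_chunk` from a timer approximates the periodic streaming described above; frames below the threshold never reach the buffer, so silence is not transmitted in this sketch.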

Here, the H/W codec 41 is a device that provides encoding and decoding signal processing functions for digital audio. It processes audio data by converting analog to digital for recording and digital to analog for playback.

The channel, sampling, and quantization settings required for recording and playback are configured through communication with the connected terminal.
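The three settings named above (channels, sampling, quantization) determine the raw PCM data rate, which in turn fixes how many bytes a periodic read of a given duration yields. The sketch below shows that arithmetic; the `PcmFormat` name and the example values (mono, 16 kHz, 16-bit, 20 ms chunks) are assumptions for illustration and do not appear in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PcmFormat:
    """Codec settings agreed between the terminal and the codec (hypothetical type)."""
    channels: int
    sample_rate_hz: int
    bits_per_sample: int

    def bytes_per_second(self) -> int:
        # channels x samples per second x bytes per sample
        return self.channels * self.sample_rate_hz * self.bits_per_sample // 8

    def chunk_bytes(self, chunk_ms: int) -> int:
        # Size of one periodic read covering `chunk_ms` milliseconds of audio
        return self.bytes_per_second() * chunk_ms // 1000


# Example values only; the patent does not state the actual configuration.
fmt = PcmFormat(channels=1, sample_rate_hz=16000, bits_per_sample=16)
rate = fmt.bytes_per_second()
chunk = fmt.chunk_bytes(20)
```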

The detailed configuration of the agent terminal according to the present invention is as follows.

FIG. 5 is a detailed configuration diagram of the agent terminal according to the present invention.

The agent terminal 200 includes: a speech recognition processing unit 52 that receives PCM data in real time from the low-performance terminal 100 through the RF interface 51, transmits it to the artificial intelligence voice processing platform 300 to request speech recognition, and combines the text converted by the artificial intelligence voice processing platform 300 until all of the speech has been recognized; a query response processing unit 53 that, once the speech recognition processing unit 52 has finished speech recognition, transmits the text to the artificial intelligence voice processing platform 300 to request a query response and receives that response; and a speech synthesis processing unit 54 that, once the query response is complete, receives the response text from the query response processing unit 53, requests speech synthesis from the artificial intelligence voice processing platform 300, receives the converted voice file from the artificial intelligence voice processing platform 300, and transmits it to the low-performance terminal 100 through the RF interface 51.

Here, the speech recognition processing unit 52 includes: a PCM data receiving unit 52a that receives PCM data in real time from the low-performance terminal 100 through the RF interface 51; a speech recognition request unit 52b that transmits the PCM data received by the PCM data receiving unit 52a to the artificial intelligence voice processing platform 300 to request speech recognition; and a speech recognition reception and combination unit 52c that combines the text converted by the artificial intelligence voice processing platform 300 until all of the speech has been recognized.

The query response processing unit 53 includes: a query response request unit 53a that receives the text from the speech recognition reception and combination unit 52c of the speech recognition processing unit 52 and transmits it to the artificial intelligence voice processing platform 300 to request a query response; and a query response receiving unit 53b that receives the response message transmitted by the artificial intelligence voice processing platform 300 after it interprets the text.

The speech synthesis processing unit 54 includes: a speech synthesis request unit 54a that, once the query response is complete, receives the response text from the query response processing unit 53 and requests speech synthesis from the artificial intelligence voice processing platform 300; a speech synthesis receiving unit 54b that receives the converted voice file from the artificial intelligence voice processing platform 300; and a speech synthesis response unit 54c that transmits the voice file received by the speech synthesis receiving unit 54b to the low-performance terminal 100 through the RF interface 51.

The overall operation of the agent terminal 200 according to the present invention, configured as described above, is as follows.

First, PCM data is received in real time from the RF interface 51 and transmitted through the speech recognition request unit 52b to the speech recognition platform 300a of the artificial intelligence voice processing platform 300.

The speech recognition platform 300a of the artificial intelligence voice processing platform 300 processes the PCM data in real time, converts it into text, and transmits the text to the agent terminal 200.

Next, the agent terminal 200 passes the text to the speech recognition reception and combination unit 52c, which combines it until all of the speech has been recognized.

Here, the point at which the speech ends is determined by the speech recognition platform 300a of the artificial intelligence voice processing platform 300, which reports the end point through a return value.

Once speech recognition is complete, the speech recognition reception and combination unit 52c transmits the text to the query response request unit 53a, and the query response request unit 53a transmits the received text to the query response platform 300b of the artificial intelligence voice processing platform 300.

Next, the query response platform 300b interprets the text and transmits a response message to the query response receiving unit 53b.

Once the query response is complete, the query response receiving unit 53b transmits the response text to the speech synthesis request unit 54a.

The speech synthesis request unit 54a transmits the response text to the speech synthesis platform 300c of the artificial intelligence voice processing platform 300.

The speech synthesis platform 300c converts the response text into a voice file and transmits it to the speech synthesis receiving unit 54b of the agent terminal 200.

Finally, the agent terminal 200 passes the received final response to the speech synthesis response unit 54c, which transmits it to the low-performance terminal 100.
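The sequence above (recognize each chunk, combine text until the platform signals the end of speech, request a response, then request synthesis) can be sketched as a single function. The callable types and the stub platforms below are hypothetical stand-ins introduced for illustration; the source does not define concrete APIs for platforms 300a, 300b, or 300c.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical platform interfaces (assumptions, not from the source).
Recognizer = Callable[[bytes], Tuple[str, bool]]  # chunk -> (text, speech ended?)
Responder = Callable[[str], str]                  # query text -> response text
Synthesizer = Callable[[str], bytes]              # response text -> voice file bytes


def handle_utterance(chunks: Iterable[bytes], recognize: Recognizer,
                     respond: Responder, synthesize: Synthesizer) -> bytes:
    """Run one utterance through recognition, query response, and synthesis."""
    parts: List[str] = []
    for chunk in chunks:
        text, ended = recognize(chunk)   # role of request unit 52b
        if text:
            parts.append(text)           # role of combination unit 52c
        if ended:                        # end point reported via return value
            break
    query = " ".join(parts)
    reply = respond(query)               # role of units 53a and 53b
    return synthesize(reply)             # role of units 54a and 54b


# Stub platforms so the sketch runs without any network service.
def stub_recognize(chunk: bytes) -> Tuple[str, bool]:
    word = chunk.decode("ascii")
    return word.rstrip("."), word.endswith(".")


def stub_respond(query: str) -> str:
    return "ok: " + query


def stub_synthesize(text: str) -> bytes:
    return text.encode("utf-8")
```

Passing the platforms in as callables keeps the agent logic independent of any particular speech service, which matches the separation between the agent terminal and the platform in the description.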

The method of supporting an intelligent voice service for a lightweight IoT device according to the present invention is described in detail below.

FIG. 6 is a flowchart showing the method of supporting an intelligent voice service for a lightweight IoT device according to the present invention.

First, the low-performance terminal 100 receives unprocessed voice data, that is, PCM data, through the H/W codec and stores it. (S601)

Next, it periodically reads data of a fixed size and transmits it by streaming through the RF interface 45. (S602)

The agent terminal 200 receives the PCM data in real time from the RF interface 45 and transmits it to the speech recognition platform 300a. (S603)

Next, the speech recognition platform 300a processes the PCM data in real time, converts it into text, and transmits the text to the agent terminal 200. (S604)

The agent terminal 200 combines the text until all of the speech has been recognized and, once speech recognition is complete, transmits the text to the query response platform 300b. (S605)

Next, the query response platform 300b interprets the text and transmits an appropriate response message to the agent terminal 200. (S606)

Once the query response is complete, the response text is transmitted to the speech synthesis platform 300c (S607); the speech synthesis platform 300c converts the response text into a voice file and transmits it to the agent terminal 200, and the agent terminal 200 transmits the received final response to the low-performance terminal 100. (S608)

Finally, the low-performance terminal 100 receives the response voice from the RF interface and outputs the final response through the H/W codec. (S609)

FIG. 7 is an operation flowchart showing file-based speech recognition, and FIG. 8 is an overall operation flowchart of intelligent voice service support for a lightweight IoT device according to the present invention.

FIG. 9 is an operation flowchart showing streaming speech recognition according to the present invention.

The agent terminal 200 receives voice data from the low-performance terminal 100 and transmits it to the artificial intelligence voice processing platform 300 in real time to convert it into text.

Next, it obtains a response from the conversational service based on the converted text, and generates voice data from the conversational system's text response through the speech synthesis service.

Finally, it converts the voice data into a format that the Artik device can output and transmits it in real time.

In this way, the present invention minimizes delay by applying a continuous real-time transmission technique from the low-performance terminal through the agent terminal to the speech recognition service, and it also applies real-time transmission of voice data from the agent terminal back to the low-performance terminal, so that the voice processing service is supported efficiently.

In particular, an agent terminal that supports streaming speech recognition is placed between the low-performance terminal and the artificial intelligence voice processing platform, and the agent terminal supports multiple interfaces to enable an efficient voice processing service.

FIGS. 10A and 10B are graphs showing, respectively, the reference processing time of the intelligent voice service support device for a lightweight IoT device according to the present invention and the processing time by recording duration for each transmission method.

FIG. 10A shows the processing time and delay measured at each stage of the intelligent voice service support device for a lightweight IoT device according to the present invention.

The measurement covers the time taken by each processing segment on the low-performance terminal and the time taken by each service on the agent terminal.

The results cover the measurable range, including the processing time of the agent terminal.

On the low-performance terminal, an average of 2.005 seconds elapsed between the end of recording and the start of playback. The waiting time from the end of recording until the mp3 data was received was 1.091 seconds, a figure that includes both the agent terminal's processing time and the transmission and reception time. Excluding the agent terminal's processing time, transmitting the PCM data from the low-performance terminal and receiving the converted mp3 data from the agent terminal took a total of 121 ms.
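The remaining intervals follow from these three figures by subtraction. The sketch below works in integer milliseconds and assumes that the three reported values come from the same measurement and that the intervals nest as described; the function name and dictionary keys are illustrative, and the source does not state what happens on the terminal between mp3 receipt and playback start.

```python
TOTAL_MS = 2005         # end of recording -> start of playback (reported)
WAIT_FOR_MP3_MS = 1091  # end of recording -> mp3 data received (reported)
TRANSFER_MS = 121       # PCM send + mp3 receive, excluding agent processing (reported)


def latency_breakdown() -> dict:
    """Derive the intervals implied by the three reported figures."""
    return {
        # Waiting time minus transfer time leaves the agent terminal's share.
        "agent_processing_ms": WAIT_FOR_MP3_MS - TRANSFER_MS,
        # Time on the low-performance terminal after the mp3 data arrives.
        "after_mp3_receipt_ms": TOTAL_MS - WAIT_FOR_MP3_MS,
        "transfer_ms": TRANSFER_MS,
    }


breakdown = latency_breakdown()
```

Under these assumptions the agent terminal accounts for 970 ms of the 1.091-second wait, and 914 ms pass on the low-performance terminal between receiving the mp3 data and starting playback.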

To reduce this additional delay, the present invention applies a real-time transmission technique using TCP socket communication between the low-performance terminal and the agent terminal, and also applies real-time transmission when the agent terminal uses the speech recognition service, minimizing processing time so that the system is practical as a voice-based conversational system.
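A minimal sketch of chunked real-time transmission over a TCP socket is shown below, using only the Python standard library on the loopback interface. The 640-byte chunk size, the function names, and the use of a thread as a stand-in for the agent terminal are assumptions for illustration; the source states only that TCP socket communication is used.

```python
import socket
import threading

CHUNK_BYTES = 640  # hypothetical fixed chunk size


def receive_stream(server: socket.socket, sink: bytearray) -> None:
    """Agent side: accept one connection and collect chunks as they arrive."""
    conn, _ = server.accept()
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:          # empty read: the sender closed the stream
                break
            sink.extend(data)     # a real agent would forward each chunk onward


def stream_pcm(pcm: bytes, address) -> None:
    """Terminal side: send the PCM data in fixed-size chunks."""
    with socket.create_connection(address) as sock:
        for offset in range(0, len(pcm), CHUNK_BYTES):
            sock.sendall(pcm[offset:offset + CHUNK_BYTES])


def demo(pcm: bytes) -> bytes:
    """Stream `pcm` over loopback TCP and return what the receiver collected."""
    received = bytearray()
    with socket.create_server(("127.0.0.1", 0)) as server:  # port 0: any free port
        address = server.getsockname()
        worker = threading.Thread(target=receive_stream, args=(server, received))
        worker.start()
        stream_pcm(pcm, address)
        worker.join()
    return bytes(received)
```

Because the receiver handles each chunk as it arrives instead of waiting for a complete file, recognition can start while the user is still speaking, which is the source of the delay reduction described above.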

FIG. 10B is a graph of processing time by recording duration for each transmission method.

The intelligent voice service support device and method for a lightweight IoT device according to the present invention, described above, support artificial intelligence voice processing for low-performance terminals that lack audio processing hardware and high-level processing capability for voice processing. An agent terminal that supports streaming speech recognition is placed between the low-performance terminal and the artificial intelligence voice processing platform to provide intelligent voice service support for the lightweight IoT device.

As described above, it will be understood that the present invention may be implemented in modified forms without departing from its essential characteristics.

The disclosed embodiments should therefore be considered illustrative rather than limiting. The scope of the present invention is defined by the claims rather than by the foregoing description, and all differences within the equivalent scope should be construed as being included in the present invention.

100. Low-performance terminal
200. Agent terminal
300. Artificial intelligence voice processing platform

Claims

An intelligent voice service support device for a lightweight IoT device, comprising:
a low-performance terminal that receives and stores unprocessed voice data, that is, PCM data, periodically transmits it by streaming, and outputs a final response;
an agent terminal that receives the PCM data from the low-performance terminal in real time, transmits it to an artificial intelligence voice processing platform, and transmits a voice file received from the artificial intelligence voice processing platform to the low-performance terminal; and
the artificial intelligence voice processing platform, which processes the PCM data received from the agent terminal in real time to convert it into text, interprets the recognized text, and transmits a response message to the agent terminal.

The device of claim 1, wherein the artificial intelligence voice processing platform comprises:
a speech recognition platform that processes the PCM data in real time, converts it into text, transmits the text to the agent terminal, determines when the speech has ended, and reports the end point through a return value;
a query response platform that interprets the received text and transmits an appropriate response message to the agent terminal; and
a speech synthesis platform that converts the response text into a voice file and transmits it to the agent terminal.

The device of claim 1, wherein the low-performance terminal comprises:
an H/W codec that receives unprocessed voice data, that is, PCM data;
a PCM data recognition unit that accepts the input as sound when a level at or above a specific decibel value is detected;
a PCM data buffer that stores the data recognized by the PCM data recognition unit;
a PCM data transmission unit that periodically reads data of a fixed size from the PCM data buffer and transmits it by streaming through an RF interface;
a response voice receiving unit that receives a response voice from the agent terminal through the RF interface; and
a voice response output unit that outputs the response voice received by the response voice receiving unit as a final response through the H/W codec.

The device of claim 1, wherein the agent terminal comprises:
a speech recognition processing unit that receives PCM data in real time from the low-performance terminal through an RF interface, transmits it to the artificial intelligence voice processing platform to request speech recognition, and combines the text converted by the artificial intelligence voice processing platform until all of the speech has been recognized;
a query response processing unit that, once the speech recognition processing unit has finished speech recognition, transmits the text to the artificial intelligence voice processing platform to request a query response and receives the query response; and
a speech synthesis processing unit that, once the query response is complete, receives the response text from the query response processing unit, requests speech synthesis from the artificial intelligence voice processing platform, receives the converted voice file from the artificial intelligence voice processing platform, and transmits it to the low-performance terminal through the RF interface.

The device of claim 4, wherein the speech recognition processing unit comprises:
a PCM data receiving unit that receives PCM data in real time from the low-performance terminal through the RF interface;
a speech recognition request unit that transmits the PCM data received by the PCM data receiving unit to the artificial intelligence voice processing platform to request speech recognition; and
a speech recognition reception and combination unit that combines the text converted by the artificial intelligence voice processing platform until all of the speech has been recognized.

The device of claim 4, wherein the query response processing unit comprises:
a query response request unit that receives the text from the speech recognition reception and combination unit of the speech recognition processing unit and transmits the received text to the artificial intelligence voice processing platform to request a query response; and
a query response receiving unit that receives the response message transmitted by the artificial intelligence voice processing platform after it interprets the text.

The device of claim 4, wherein the speech synthesis processing unit comprises:
a speech synthesis request unit that, once the query response is complete, receives the response text from the query response processing unit and requests speech synthesis from the artificial intelligence voice processing platform;
a speech synthesis receiving unit that receives the converted voice file from the artificial intelligence voice processing platform; and
a speech synthesis response unit that transmits the voice file received by the speech synthesis receiving unit to the low-performance terminal through the RF interface.

An intelligent voice service support method for a lightweight IoT device, comprising:
a speech recognition processing step of receiving PCM data in real time from a low-performance terminal through an RF interface, transmitting it to an artificial intelligence voice processing platform to request speech recognition, and combining the text converted by the artificial intelligence voice processing platform until all of the speech has been recognized;
a query response processing step of, once speech recognition is finished in the speech recognition processing step, transmitting the text to the artificial intelligence voice processing platform to request a query response and receiving the query response; and
a speech synthesis processing step of, once the query response is complete, transmitting the response text to the artificial intelligence voice processing platform to request speech synthesis, receiving the converted voice file from the artificial intelligence voice processing platform, and transmitting it to the low-performance terminal through the RF interface.

The method of claim 8, wherein the speech recognition processing step comprises:
a PCM data receiving step of receiving PCM data in real time from the low-performance terminal through the RF interface;
a speech recognition request step of transmitting the PCM data received in the PCM data receiving step to the artificial intelligence voice processing platform to request speech recognition; and
a speech recognition reception and combination step of combining the text converted by the artificial intelligence voice processing platform until all of the speech has been recognized.

The method of claim 8, wherein the query response processing step comprises:
a query response request step of transmitting the text combined in the speech recognition processing step to the artificial intelligence voice processing platform to request a query response; and
a query response receiving step of receiving the response message transmitted by the artificial intelligence voice processing platform after it interprets the text.

The method of claim 8, wherein the speech synthesis processing step comprises:
a speech synthesis request step of, once the query response is complete, receiving the response text from the query response processing unit and requesting speech synthesis from the artificial intelligence voice processing platform;
a speech synthesis reception step of receiving the converted voice file from the artificial intelligence voice processing platform; and
a speech synthesis response step of transmitting the voice file received in the speech synthesis reception step to the low-performance terminal through the RF interface.