KR102291117B1

KR102291117B1 - AI speaker device for external connection and method for echo cancellation and synchronization

Info

Publication number: KR102291117B1
Application number: KR1020180170283A
Authority: KR
Inventors: 안성민; 박동길; 윤기민
Original assignee: 주식회사 오투오
Priority date: 2018-12-27
Filing date: 2018-12-27
Publication date: 2021-08-20
Also published as: KR20200080635A

Abstract

The present invention relates to an artificial intelligence (AI) speaker device for external connection and to a connection system between the AI speaker device and an external device that can interwork with it through a communication interface. The system includes a buffer memory that receives the playback audio of multimedia content provided by the external device as an echo reference signal and temporarily stores it, and a clock controller that controls the operating clock of the buffer memory in inverse proportion to the availability rate of the buffer memory. The clock controller increases the clock when an overrun of the buffer memory occurs and decreases the clock when an underrun occurs. As a result, when an external device without a built-in AI function is interworked with the AI speaker, the buffer overruns and buffer underruns caused by the clock difference between the external device and the AI speaker can be suppressed.

Description

AI SPEAKER DEVICE FOR EXTERNAL CONNECTION AND METHOD FOR ECHO CANCELLATION AND SYNCHRONIZATION

The present invention relates to an AI speaker device for external connection and a connection system between it and an external device. More specifically, it relates to an AI speaker device that, unlike a set-top box with built-in artificial intelligence, can interwork with an existing set-top box that has no artificial intelligence (AI) function, using that set-top box as it is without replacement, and to a connection system between the AI speaker device and an external device (hereinafter, an AI speaker system, as it is now commonly called) that can connect the AI speaker device to the set-top box through an interface such as a universal serial bus (USB).

Speakers, once used for listening to music or the radio, have met artificial intelligence (AI) technology, which realizes human learning, reasoning, perception, and natural language understanding as computer programs, and are evolving beyond audio devices that simply deliver sound into smart tools that think and manage. Using AI technologies such as 'voice recognition', 'natural language processing', and 'recommendation', they are transforming from tools that simply deliver sound into AI speakers that think and manage.

Because such AI speakers communicate with users by voice using AI algorithms, they make it easy to build a smart environment, for example by playing music or controlling network-connected devices hands-free through voice recognition. Various attempts have been made to apply them, and one of these is to combine the AI speaker function with a multimedia content service. A representative example is integrating the AI speaker function into a set-top box (STB), which is distributed to homes to receive terrestrial, cable, and satellite broadcasts so that viewers can enjoy multimedia content. For the purposes of the present invention, 'AI speaker system' refers to a system in which the AI speaker function is integrated into a set-top box, unless the context indicates otherwise.

Integrating the AI speaker function into a set-top box makes it possible not only to control set-top box functions (e.g., changing channels, adjusting the volume) by voice, but also to ask questions about multimedia content by voice and to ask the set-top box for content recommendations suited to a particular situation. A set-top box device with the AI speaker function outputs the result of the function or the answer as speaker sound in response to the user's spoken control command, query, or request.

FIGS. 4 and 5 conceptually illustrate two examples of implementing an AI speaker system in a set-top box in the prior art.

First, FIG. 4 shows a general way of implementing an AI speaker system in a set-top box. The set-top box 10 receives multimedia content from an external content server 30, plays it, and displays the resulting playback screen on the display device 20. The set-top box 10 provides an AI function. To use it, the user gives a function control command or a query by voice, and the set-top box 10 receives and processes the user's voice through the microphone module 11. In general, the AI function is installed in the content server 30 or in a separate server device (not shown).

Next, FIG. 5 shows an implementation in which the set-top box 10 receives the user's voice input through the remote control 40 and processes it. The microphone module 41 is installed in the remote control 40 to exclude ambient noise and improve recognition of the user's voice input. The microphone module 41 first captures the voice input, and the remote control 40 then transmits the voice data wirelessly to the set-top box 10.

As described, a conventional AI speaker system must be manufactured with the AI speaker function built integrally into the set-top box device. It is implemented as one unit because the internal functions of the set-top box and the AI speaker function are organically interconnected. Integration was also unavoidable in order to process the user's voice input properly, because the playback sound of the multimedia content that the television speaker outputs under the control of the set-top box not only shares the same frequency band as the user's voice input but also has very similar acoustic characteristics. The AI function was implemented inside the set-top box so that the content playback sound and the user's voice input could be distinguished and processed separately.

However, this prior-art approach has the drawback that the AI speaker service can be used only by purchasing a high-performance set-top box or subscribing to an expensive broadcasting service. Having to replace already-installed set-top boxes imposes a heavy cost on both users and operators and is in practice the biggest obstacle to spreading AI speaker services.

Therefore, there is a need for a technology that solves the above problems of the prior art by helping users of existing set-top boxes without an AI function to use the AI speaker service.

In particular, the present invention discloses a technique for solving the following problem. When an existing set-top box without an AI function is interworked with an AI speaker, the clocks of the set-top box and the AI speaker differ. While data transferred between them is stored in a buffer and then processed, one side runs too fast or too slow, so the buffer cannot keep up and processing stops; that is, a buffer overrun or a buffer underrun occurs. As a result, the echo reference signal cannot be delivered to the echo cancellation unit at a constant rate.
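To make the clock-mismatch problem concrete, the following is a minimal sketch of the arithmetic, not a formula from the patent. It assumes both devices nominally run at the same sample rate and that each clock deviates by a small offset expressed in parts per million (ppm); the function name and all parameters are hypothetical.

```python
def seconds_until_buffer_fault(nominal_hz, producer_ppm, consumer_ppm,
                               capacity_frames, start_fill_frames):
    """Estimate when a FIFO overruns or underruns under constant clock drift.

    producer_ppm: clock error of the side writing into the buffer (assumed).
    consumer_ppm: clock error of the side reading from the buffer (assumed).
    Returns a (fault_kind, seconds) tuple.
    """
    producer_hz = nominal_hz * (1 + producer_ppm / 1e6)
    consumer_hz = nominal_hz * (1 + consumer_ppm / 1e6)
    net_fill_rate = producer_hz - consumer_hz  # frames gained per second

    if net_fill_rate == 0:
        return ("none", float("inf"))
    if net_fill_rate > 0:
        # Writer is faster: the free space is consumed until the buffer is full.
        return ("overrun", (capacity_frames - start_fill_frames) / net_fill_rate)
    # Reader is faster: the stored frames are drained until the buffer is empty.
    return ("underrun", start_fill_frames / -net_fill_rate)
```

Under these assumptions, a 100 ppm mismatch at 48 kHz changes the fill level by only about 4.8 frames per second, so a half-full 4800-frame buffer would take roughly 500 seconds to fault. The drift is slow, but without correction the fault is eventually unavoidable.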

Korean Registered Patent Publication No. 10-1914583 (registered October 29, 2018)
Korean Laid-open Patent Publication No. 2018-0116100 (published November 16, 2018)

Accordingly, the present invention has been made to solve the above problems of the prior art. An object of the present invention is to provide an AI speaker device that, unlike a set-top box with built-in artificial intelligence, can interwork with an existing set-top box that has no artificial intelligence (AI) function, using that set-top box as it is without replacement, and to provide a connection system between the AI speaker device and an external device that can connect the AI speaker device to the set-top box through an interface such as a universal serial bus (USB).

The technical problems to be solved by the present invention are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood from the following description by those of ordinary skill in the art to which the present invention pertains.

In one aspect according to a preferred embodiment of the present invention for achieving the above object, a connection system between an AI speaker device for external connection and an external device that can interwork with the AI speaker device through a communication interface includes a buffer memory that receives the playback audio of multimedia content provided by the external device as an echo reference signal and temporarily stores it, and a clock controller that controls the operating clock of the buffer memory in inverse proportion to the availability rate of the buffer memory. The clock controller increases the clock when an overrun of the buffer memory occurs and decreases the clock when an underrun occurs, thereby preventing loss of the playback audio.

More preferably, the clock controller identifies the usage of the buffer memory, increases the operating clock of the buffer memory when the usage exceeds a preset first threshold, and decreases the operating clock of the buffer memory when the usage falls below a preset second threshold.
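The two-threshold rule above can be sketched as follows. This is an illustrative model only: the patent does not give threshold values, a step size, or a nominal rate, so `high_ratio`, `low_ratio`, `nominal_hz`, and `step_hz` are assumptions, as is the class name.

```python
class ThresholdClockController:
    """Raise the buffer's operating clock when it is nearly full, lower it when nearly empty."""

    def __init__(self, capacity_frames, high_ratio=0.75, low_ratio=0.25,
                 nominal_hz=48000.0, step_hz=5.0):
        self.capacity_frames = capacity_frames
        self.high_ratio = high_ratio   # first threshold (assumed value)
        self.low_ratio = low_ratio     # second threshold (assumed value)
        self.step_hz = step_hz         # adjustment per update (assumed value)
        self.clock_hz = nominal_hz

    def update(self, used_frames):
        """Adjust and return the operating clock for the current buffer usage."""
        usage = used_frames / self.capacity_frames
        if usage > self.high_ratio:
            # Little free space left: drain faster to head off an overrun.
            self.clock_hz += self.step_hz
        elif usage < self.low_ratio:
            # Buffer nearly empty: drain slower to head off an underrun.
            self.clock_hz -= self.step_hz
        return self.clock_hz
```

For example, with a 1000-frame buffer, `update(900)` raises the clock by one step, `update(500)` leaves it unchanged, and `update(100)` lowers it by one step. Keeping a dead band between the two thresholds avoids adjusting the clock on every small fluctuation.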

In another aspect, the AI speaker device for external connection, to which the above connection system between the AI speaker device and an external device can be applied, includes: a microphone audio input unit that collects and inputs ambient audio signals around the AI speaker; a digital external connection unit for connecting externally to the external device through a digital interface; an external device interworking unit for operating in conjunction with the external device through the digital interface; a playback audio buffer unit that includes the buffer memory and clock controller of any one of claims 1 to 3; a control unit that reads the playback audio of the multimedia content originating from the external device out of the playback audio buffer unit, sends it to an audio codec unit through an audio playback channel, and copies the same playback audio and sends it to an echo cancellation unit through an echo cancellation channel; the audio codec unit, which decodes the playback audio of the multimedia content and sends it to a speaker audio output unit; the echo cancellation unit, which refers to the echo reference signal to remove the playback audio echo component of the multimedia content from the ambient audio signal collected by the microphone audio input unit; a user voice processing unit that preprocesses the user's voice using the ambient audio signal from which the playback audio echo component has been removed and transfers it to the external device through the digital external connection unit; and the speaker audio output unit, which outputs the AI response data obtained through the external device in the audio band.
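The text does not name the algorithm used by the echo cancellation unit. As one commonly used possibility, the sketch below uses a normalized least-mean-squares (NLMS) adaptive filter, which is a substitute chosen here for illustration and not the patent's stated method. It shows how a copy of the playback audio (the echo reference) can be used to subtract the playback echo from the microphone signal. The helper `simulate_echo` and all parameter values are hypothetical.

```python
import random


def simulate_echo(n, gain=0.6, delay=2, seed=0):
    """Return (mic, ref): random playback audio and its delayed, attenuated echo."""
    rng = random.Random(seed)
    ref = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    mic = [gain * ref[i - delay] if i >= delay else 0.0 for i in range(n)]
    return mic, ref


def nlms_echo_cancel(mic, ref, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of `ref` from `mic`, sample by sample."""
    weights = [0.0] * taps   # current estimate of the echo path
    history = [0.0] * taps   # most recent reference samples, newest first
    residual = []
    for mic_sample, ref_sample in zip(mic, ref):
        history = [ref_sample] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = mic_sample - echo_estimate  # what remains after echo removal
        power = sum(h * h for h in history) + eps
        # NLMS update: step along the reference, normalized by its power.
        weights = [w + mu * error * h / power for w, h in zip(weights, history)]
        residual.append(error)
    return residual
```

In this toy setting the microphone hears only the echo, so the residual should decay toward zero once the filter has adapted. With a user speaking as well, the residual would ideally carry the speech with the playback echo suppressed. The sketch also shows why the reference must arrive at a steady rate: the filter pairs each microphone sample with a reference sample, so dropped or stalled reference frames would misalign the two streams.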

In yet another aspect, the external device, which can be connected to the AI speaker device for external connection through the connection system between the AI speaker device and the external device, sets the AI speaker device as a USB audio device when it connects externally to the AI speaker device through a digital interface; streams the playback audio of multimedia content to the AI speaker device as an echo reference signal in accordance with USB audio; receives the preprocessing result of the user's voice from the AI speaker device through the digital interface and forwards it to an external AI server to obtain an AI processing result; plays multimedia content according to the AI processing result; and transfers the AI processing result to the AI speaker device through the digital interface.

Preferably, the external device includes: an external server interworking unit that interworks with the external AI server over the Internet, forwards the preprocessing result of the user's voice to the external AI server when it is received from the AI speaker device, receives the AI processing result, and interworks with an external content server over the Internet to receive multimedia content corresponding to the AI processing result; a digital external connection unit that connects externally to the AI speaker device through the digital interface and, on recognizing the externally connected AI speaker device, sets it as a USB audio device; an AI speaker interworking unit for operating in conjunction with the AI speaker device through the digital interface; a playback audio providing unit that streams the playback audio of multimedia content to the AI speaker device as an echo reference signal in accordance with USB audio; and a content playback processing unit for audio/video playback of the multimedia content provided by the content server.

More preferably, the external device may be a set-top box.

As described above, the AI speaker device for external connection and the connection system between it and an external device according to the present invention provide the following effects.

According to the present invention, an AI speaker system can be built cheaply and easily for an ordinary, previously released set-top box without a built-in artificial intelligence (AI) function, simply by performing a software upgrade over the network and externally connecting the AI speaker according to the present invention via USB.

In addition, according to the present invention, a high-performance AI speaker system in which the multimedia content function and the AI speaker function interwork can be built by externally adding the AI speaker according to the present invention to a set-top box that is already installed in ordinary homes and has no built-in AI function.

In particular, according to the present invention, when a set-top box without a built-in AI function is interworked with an AI speaker, the buffer overruns and buffer underruns that occur while data exchanged between the set-top box and the AI speaker is stored in a buffer and processed, and that are caused by the difference between their clocks, can be suppressed.

FIG. 1 is an overall configuration diagram showing a network environment to which the AI speaker device for external connection of the present invention and the connection system between it and an external device are applied.
FIG. 2 is a functional block diagram showing a schematic configuration in which an AI speaker device, in which the connection system between the AI speaker device and an external device of the present invention can be installed, is connected to a set-top box.
FIG. 3 is a functional block diagram showing a schematic configuration for processing voice recognition between the AI speaker device and the external device in the connection system according to a preferred embodiment of the present invention.
FIG. 4 is a diagram showing an example of an AI speaker system in the prior art.
FIG. 5 is a diagram showing another example of an AI speaker system in the prior art.

The advantages and features of the present invention, and the methods of achieving them, will become apparent from the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. The embodiments are provided only to make the disclosure of the present invention complete and to fully convey the scope of the invention to those of ordinary skill in the art to which the present invention pertains, and the present invention is defined only by the scope of the claims. Like reference numerals refer to like elements throughout the specification.

The functional blocks shown in the drawings and described below are merely examples of possible implementations. Other implementations may use other functional blocks without departing from the spirit and scope of the detailed description. In addition, although one or more functional blocks of the present invention are shown as individual blocks, one or more of them may be a combination of various hardware and software components that perform the same function.

In addition, the statement that something includes certain components is an open-ended expression that merely indicates that those components are present, and should not be understood as excluding additional components.

In addition, when a component is said to be connected or coupled to another component, it may be directly connected or coupled to that other component, but it should be understood that other components may exist in between.

FIG. 1 is an overall configuration diagram showing a network environment to which the AI speaker device for external connection of the present invention and the connection system between it and an external device are applied.

The AI speaker device for external connection and the connection system between it and an external device (hereinafter, the AI speaker system) presented below are proposed, as a preferred embodiment, for the case shown in FIG. 1. In this case, an existing set-top box 100 without an artificial intelligence (AI) function, which is connected to a telephone line (or dedicated line) at home to receive terrestrial, cable, and satellite broadcasts for TV viewing, is connected externally to an AI speaker 200 through an interface such as a universal serial bus (USB), and a network is supported in which the set-top box 100 and the AI speaker 200 can interwork.

For convenience of explanation, the specific embodiments describe the external device to which the AI speaker 200 is connected as the set-top box 100. In this specification, however, the set-top box 100 means a device that receives multimedia content through various transmission media and displays it. The transmission media include terrestrial broadcasting, satellite, cable, the Internet, and the like. Furthermore, the term is not limited to products commonly called set-top boxes and may broadly include any audio, video, or display device that can use this technology, such as TVs, monitors, and speakers. Also for convenience of explanation, the set-top box 100 and the AI speaker 200 are assumed to run in a Linux OS environment, the Unix-based open operating system for personal computers of which Linus Torvalds released version 0.02 in 1991, but the invention is not limited to this.

Specifically, the set-top box 100 receives multimedia content from the external content server 30, plays it, and displays the resulting playback screen on the display device 20.

The AI speaker 200 preprocesses the user's voice and transfers the result to the set-top box 100, thereby helping the AI function to be carried out in conjunction with the external AI server 50. By operating in conjunction with the set-top box 100, the AI speaker 200 links the multimedia content function with the AI processing function.

For example, the set-top box 100 performs various function controls in response to the result of the AI speaker 200 recognizing the user's voice. As another example, if the user receives a recommendation for a particular movie through voice interaction with the AI speaker 200 and then asks the AI speaker 200 to play that content immediately, the set-top box 100 responds by receiving the movie content from the content server 30 and playing it. To interact with the user by voice, the AI speaker 200 has a microphone module (not shown) and a speaker module.

Here, the AI server 50 may be an AI server device provided by Google (Google Assistant), Amazon (Amazon Alexa), or the like. Although the AI speaker 200 may be implemented with a complete AI function of its own, it is more preferable to configure it to interwork with the external AI server 50 over the Internet.

In FIG. 1, a conventionally distributed digital set-top box is sufficient as the set-top box 100 in terms of hardware, so the present invention may also be implemented by performing a firmware upgrade on such a set-top box.

FIG. 2 is a functional block diagram showing a schematic configuration in which an AI speaker device, in which the connection system between the AI speaker device and an external device of the present invention can be installed, is connected to a set-top box.

Referring to FIG. 2, the set-top box 100 of the present invention includes an external server interworking unit 110, a digital external connection unit 120, an AI speaker interworking unit 130, a playback audio providing unit 140, and a content playback processing unit 150. The AI speaker 200 of the present invention includes a microphone audio input unit 210, a digital external connection unit 220, a set-top box interworking unit 230, a playback audio buffer unit 240, a control unit 250, an audio playback channel 260, an audio codec unit 265, an echo cancellation channel 270, an echo cancellation unit 275, a user voice processing channel 280, and a speaker audio output unit 290.

First, each element of the set-top box 100 in the present invention is described.

In the present invention, when the set-top box 100 connects externally to the AI speaker device 200 through a digital interface, it sets the AI speaker device 200 as a USB audio device; streams the playback audio of multimedia content to the AI speaker device 200 as an echo reference signal in accordance with USB audio; receives the preprocessing result of the user's voice from the AI speaker device 200 through the digital interface and forwards it to the external AI server 50 to obtain an AI processing result; plays multimedia content according to the AI processing result; and transfers the AI processing result to the AI speaker device 200 through the digital interface.

To this end, the external server interworking unit 110 interworks with the external artificial intelligence server 50 over the Internet: when it receives the preprocessing result of the user's voice from the AI speaker device 200, it forwards that result to the artificial intelligence server 50 and receives the artificial intelligence processing result. It also interworks with the external content server 30 over the Internet to obtain the multimedia content corresponding to the artificial intelligence processing result.

The digital external connection unit 120 connects externally to the AI speaker device 200 through the digital interface and, upon recognizing the connected AI speaker device 200, configures it as a USB audio device.

The AI speaker interworking unit 130 operates in conjunction with the AI speaker device 200 through the digital interface.

The playback audio providing unit 140 streams the playback audio of the multimedia content to the AI speaker device 200 over USB audio as an echo reference signal. The USB communication between the set-top box 100 and the AI speaker 200 preferably uses the USB Audio Device Class of the USB standard so that the AI speaker acts as an audio device.

Here, the function of sending and receiving audio data can be implemented using the aPlay and aRecord functions of the ALSA (Advanced Linux Sound Architecture) library. The playback audio providing unit 140 recognizes the AI speaker device 200 as a USB audio device.

The content playback processing unit 150 performs audio/video playback processing on the multimedia content provided by the content server 30.

Next, each element of the AI speaker 200, which is externally connected to the set-top box 100 through the digital interface and interworks with it to provide the AI speaker system, is described.

First, the microphone audio input unit 210 collects and inputs the ambient audio signals around the AI speaker 200.

The digital external connection unit 220 provides a path for external connection to the set-top box 100 through the digital interface.

The set-top box interworking unit 230 controls the AI speaker 200 so that it operates in conjunction with the set-top box 100 through the digital interface.

The playback audio buffer unit 240 receives the playback audio of the multimedia content provided by the set-top box 100 as an echo reference signal and stores it temporarily. As shown in FIG. 2, the playback audio buffer unit 240 includes a buffer memory 241 and a clock controller 242, which are described later with reference to FIG. 3.

The control unit 250 reads, from the playback audio buffer unit 240, the playback audio of the multimedia content arriving through the digital external connection unit 220, sends it to the audio codec unit 265 through the audio playback channel 260, and sends a copy of the same playback audio to the echo cancellation channel 270.

The audio playback channel 260 carries the playback audio of the multimedia content received from the control unit 250 to the audio codec unit 265, and the audio codec unit 265 decodes the received playback audio and sends it to the speaker audio output unit 290.

The echo cancellation channel 270 carries the playback audio of the multimedia content received from the control unit 250 to the echo cancellation unit 275. Using the received playback audio as an echo reference signal, the echo cancellation unit 275 removes the playback audio echo component originating from the set-top box 100 from the ambient audio signal collected by the microphone audio input unit 210, and sends the resulting echo-free audio signal to the user voice processing channel 280.
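
The description does not state which algorithm the echo cancellation unit 275 uses. As an illustration only, the sketch below assumes a normalized least-mean-squares (NLMS) adaptive filter, a common choice for reference-based echo cancellation. The function name `nlms_echo_cancel` and the parameters `taps`, `mu`, and `eps` are hypothetical and do not come from the patent.

```python
def nlms_echo_cancel(mic, ref, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the echo of `ref` from `mic`.

    mic: samples from the microphone (user voice plus echo of the playback).
    ref: echo reference samples (the playback audio), same length as mic.
    Returns the residual signal, i.e. mic with the estimated echo removed.
    """
    weights = [0.0] * taps   # adaptive FIR model of the speaker-to-mic path
    history = [0.0] * taps   # most recent reference samples, newest first
    residual = []
    for d, x in zip(mic, ref):
        history = [x] + history[:-1]
        # Echo estimate: reference history filtered by the current weights.
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate
        # Normalized step: scale the update by the reference signal power.
        power = sum(h * h for h in history) + eps
        weights = [w + mu * e * h / power for w, h in zip(weights, history)]
        residual.append(e)
    return residual
```

The filter can only model an echo path that fits within `taps` samples of the reference, which is one way to see why the reference must stay closely time-aligned with the microphone signal.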

In general, the surroundings of the set-top box 100 are very noisy because of content playback sound. By performing echo cancellation, the present invention allows the user's voice to be recognized accurately despite this noise, providing far-field voice recognition and function control. Implementing echo cancellation within the same product and the same circuit is generally not technically difficult, and it is accordingly commonly applied in the prior art such as that of FIGS. 4 and 5. However, implementing echo cancellation across separate circuits in separate products, namely the set-top box 100 and the AI speaker 200, is a new technique. Echo cancellation in the present invention is described later with reference to FIG. 3.

The user voice processing channel 280 preprocesses the user's voice using the ambient audio signal from which the playback audio echo component has been removed and transmits the result to the control unit 250, which forwards it through the digital external connection unit 220 to the set-top box 100 and on to the artificial intelligence server 50.

The speaker audio output unit 290 outputs the artificial intelligence response data obtained through the set-top box 100 through a built-in or external speaker. Even when the display device 20 is turned off and the speaker connected to it cannot be used, the artificial intelligence processing result can be provided through the speaker of the speaker audio output unit 290 itself.

FIG. 3 is a functional block diagram schematically showing how voice recognition is processed between the AI speaker device and the external device in the connection system according to a preferred embodiment of the present invention. FIG. 3 conceptually illustrates the buffer control operation that the AI speaker 200 performs during the echo cancellation processing used to improve its voice recognition performance.

Referring to FIG. 3, the AI speaker 200 collects ambient audio signals through the microphone audio input unit 210, and these contain a large playback audio component resulting from the set-top box 100 playing multimedia content. The echo cancellation unit 275 removes this playback audio echo component from the ambient sound signal collected by the microphone audio input unit 210.

To this end, the playback audio providing unit 140 of the set-top box 100 streams the playback audio of the multimedia content to the AI speaker device 200 over USB audio as an echo reference signal. The echo reference signal provided in this way (that is, the playback audio of the multimedia content on the set-top box 100) is temporarily stored in the buffer memory 241 of the playback audio buffer unit 240 of the AI speaker 200 and is passed to the echo cancellation unit 275 at an appropriate time for use in sound processing.

In particular, the playback audio output by the playback audio providing unit 140 of the set-top box 100 is temporarily stored in the buffer memory 241 and then passed, at an appropriate time, to the echo cancellation unit 275 of the AI speaker device 200 as an echo reference signal for sound processing. Because the clocks of the set-top box 100 and the AI speaker 200 differ, one side may run too fast or too slow for the buffer to cope, causing a buffer overrun or buffer underrun that halts processing. This can make it difficult to deliver the echo reference signal at a constant rate.
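
To give a sense of scale for this clock mismatch, the following sketch estimates how long a buffer lasts when the producer (set-top box) and consumer (AI speaker) run at slightly different sample rates. The function name and all numeric values used with it are illustrative assumptions, not figures from the patent.

```python
def seconds_until_buffer_fault(capacity, fill, producer_hz, consumer_hz):
    """Estimate the time until a buffer overruns or underruns.

    capacity: buffer size in samples; fill: samples currently stored.
    producer_hz / consumer_hz: write and read rates in samples per second.
    """
    drift = producer_hz - consumer_hz   # net samples gained per second
    if drift == 0:
        return float("inf")             # matched clocks never fault
    if drift > 0:
        return (capacity - fill) / drift   # producer faster: overrun
    return fill / -drift                   # consumer faster: underrun
```

For instance, under the assumed values of a 4800-sample buffer that starts half full and a producer running 0.1% faster than a 48 kHz consumer, the buffer gains 48 samples per second and overruns after 2400 / 48 = 50 seconds.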

Accordingly, the clock controller 242 checks the state of the playback audio buffer unit 240 and variably controls the operating clock of the buffer memory 241. In the present invention, it monitors the buffer memory 241 and controls the operating clock in inverse proportion to the availability (the share of free space) of the buffer memory 241. That is, the clock controller 242 determines the usage of the buffer memory 241, increases the operating clock of the buffer memory 241 when the usage exceeds a preset first threshold, and decreases the operating clock when the usage falls below a preset second threshold.

Conceptually, when the usage of the buffer memory 241 is above the threshold, its availability is low, so the operating clock is increased to deliver the echo reference signal data to the echo cancellation unit 275 faster. Conversely, when the usage is below the threshold, availability is high, so the operating clock is decreased to deliver the data more slowly. Preventing overrun or underrun of the buffer memory 241 in this way prevents loss of the playback audio data to be transmitted to the AI speaker device 200.

For example, let the total size of the playback audio buffer unit 240 be 100, let a usage of 80% or more (the preset first threshold) count as an overrun, and let a usage of 20% or less (the preset second threshold) count as an underrun. If the current operating clock of the buffer memory 241 is 100 kHz, the operating clock is adjusted by a predetermined first rate, for example +0.2% (to 100.2 kHz), when an overrun occurs, and by a predetermined second rate, for example -0.2% (to 99.8 kHz), when an underrun occurs. The transfer rate of the playback audio delivered to the echo cancellation unit 275 of the AI speaker device 200 is thereby made slightly faster or slightly slower, which prevents overrun or underrun of the playback audio buffer unit 240 and thus prevents loss of the playback audio data to be transmitted to the AI speaker device 200.
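
The threshold rule in this example can be sketched as follows, using the figures given above (buffer size 100, thresholds of 80% and 20%, a 100 kHz clock, and a 0.2% step). The class and method names are hypothetical, and returning to the nominal clock when usage lies between the two thresholds is an assumption; the description does not say what happens in that range.

```python
class BufferClockController:
    """Adjusts a buffer's read clock from its fill level (illustrative)."""

    def __init__(self, capacity=100, high=0.80, low=0.20,
                 step=0.002, nominal_hz=100_000.0):
        self.capacity = capacity      # total buffer size
        self.high = high              # first threshold: overrun at or above
        self.low = low                # second threshold: underrun at or below
        self.step = step              # relative clock adjustment (0.2%)
        self.nominal_hz = nominal_hz  # unadjusted operating clock
        self.clock_hz = nominal_hz

    def update(self, used):
        """Return the operating clock for the given amount of buffer in use."""
        usage = used / self.capacity
        if usage >= self.high:
            # Buffer nearly full: read faster to drain it.
            self.clock_hz = self.nominal_hz * (1 + self.step)
        elif usage <= self.low:
            # Buffer nearly empty: read slower to let it refill.
            self.clock_hz = self.nominal_hz * (1 - self.step)
        else:
            # Assumption: fall back to the nominal clock between thresholds.
            self.clock_hz = self.nominal_hz
        return self.clock_hz
```

With the default values, `update(85)` yields a clock of about 100.2 kHz and `update(10)` about 99.8 kHz, matching the figures in the example.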

The cancellation efficiency of the echo cancellation unit 275 drops sharply when the time offset between the ambient sound and the echo reference signal exceeds a certain threshold time (e.g., several hundred milliseconds). Because USB communication is a low-reliability medium, keeping this time offset constant requires a steady audio data transfer rate, and the buffer clock is therefore dynamically controlled to synchronize the devices.

To this end, the user voice processing channel 280 monitors whether the time offset between the ambient sound and the echo reference signal exceeds the threshold time (e.g., several hundred milliseconds), that is, whether the rate of increase or decrease of the clock in the clock controller 242 exceeds a certain threshold rate. If it does, the channel does not use the ambient audio signal received from the echo cancellation unit 275 and instead requests the control unit 250 to resend the playback audio of the multimedia content from the buffer memory 241.
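
A minimal sketch of this decision follows, assuming the clock deviation is measured relative to a nominal clock. The function name, the returned labels, and the 0.5% threshold used in the note below are hypothetical; the description gives no value for the threshold rate.

```python
def choose_voice_channel_action(clock_hz, nominal_hz, max_deviation_ratio):
    """Decide whether the echo-cancelled signal can be used.

    Returns "forward_voice" when the clock deviation is within the threshold
    rate, otherwise "request_retransmit" to ask for the playback audio again.
    """
    deviation = abs(clock_hz - nominal_hz) / nominal_hz
    if deviation > max_deviation_ratio:
        return "request_retransmit"
    return "forward_voice"
```

With an assumed threshold of 0.5%, a clock of 100.2 kHz against a nominal 100 kHz (a 0.2% deviation) leads to forwarding the voice signal, while 101 kHz (a 1% deviation) triggers a retransmission request.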

Although the present invention has been described in detail above with reference to specific embodiments, it is not necessarily limited to those embodiments and may be modified in various ways without departing from its technical spirit.

20: display device
30: content server
50: artificial intelligence server
100: set-top box
110: external server interworking unit
120: digital external connection unit
130: AI speaker interworking unit
140: playback audio providing unit
150: content playback processing unit
200: AI speaker
210: microphone audio input unit
220: digital external connection unit
230: set-top box interworking unit
240: playback audio buffer unit
241: buffer memory
242: clock controller
250: control unit
260: audio playback channel
265: audio codec unit
270: echo cancellation channel
275: echo cancellation unit
280: user voice processing channel
290: speaker audio output unit

Claims

A connection system between an AI speaker device for external connection and an external device capable of interworking with the AI speaker device through a USB interface,
the system comprising a digital external connection unit, a playback audio buffer unit, an audio codec unit, a speaker audio output unit, an echo cancellation unit, a microphone audio input unit, and a user voice processing channel, wherein:
the digital external connection unit receives audio data from the external device and transmits a voice signal received from the user voice processing channel to an external server;
the playback audio buffer unit includes a buffer memory and a clock controller;
the buffer memory receives the playback audio of multimedia content provided by the external device as an echo reference signal, stores it temporarily, and transmits the stored audio data to the echo cancellation unit and the audio codec unit;
the clock controller controls the operating clock of the buffer memory in inverse proportion to the availability of the buffer memory, and monitors the state of the buffer memory so as to increase the clock when an overrun of the buffer memory occurs and decrease the clock when an underrun occurs, thereby preventing loss of the playback audio;
the audio codec unit transmits the audio data received from the buffer memory to the speaker audio output unit;
the speaker audio output unit converts the audio data received from the audio codec unit into an audio signal and outputs it externally;
the echo cancellation unit, using the audio data received from the buffer memory as an echo reference signal, removes the echo component of the audio data received from the external device from the external audio signal received from the microphone audio input unit to obtain the user's voice signal, and transmits the user's voice signal from which the echo component has been removed to the user voice processing channel; and
the microphone audio input unit collects the user's voice signal and transmits it to the echo cancellation unit.

The connection system according to claim 1,
wherein the user voice processing channel requests the buffer memory to retransmit the playback audio when the rate of increase or decrease of the clock in the clock controller exceeds a certain threshold rate.

The connection system according to claim 1,
wherein the clock controller determines the usage of the buffer memory, increases the operating clock of the buffer memory when the usage exceeds a first threshold, and decreases the operating clock of the buffer memory when the usage falls below a second threshold.

The connection system according to claim 1,
wherein the user voice processing channel:
transmits the audio signal received from the echo cancellation unit to the digital external connection unit when the clock of the buffer memory monitored by the clock controller is within a threshold; and
requests the buffer memory to retransmit the stored audio data to the echo cancellation unit when the clock of the buffer memory monitored by the clock controller deviates from the threshold.

(Deleted)

The connection system according to claim 1,
wherein the external device is a set-top box.