KR102351819B1

KR102351819B1 - Method and apparatus for simultaneously supporting far-field voice recognition of multiple broadcasters or AI service companies with two microphones

Info

Publication number: KR102351819B1
Application number: KR1020210039335A
Authority: KR
Inventors: 원종철
Original assignee: 주식회사 이노피아테크
Priority date: 2021-03-26
Filing date: 2021-03-26
Publication date: 2022-01-17

Abstract

A sampling method and apparatus for PDM input are proposed for simultaneously supporting the far-field voice recognition of multiple broadcasters or AI service companies with two microphones. The sampling method according to one embodiment comprises a step of installing a software module so that, when a microphone input that has been echo-canceled in a DSP and a microphone input that has not been echo-canceled are delivered to a CPU at the same time, the CPU receives the microphone data at the correct sampling position. The CPU can therefore receive the microphone data at the correct sampling position.

Description

Sampling method and apparatus for PDM input when simultaneously supporting far-field voice recognition of multiple broadcasters or AI service providers with two microphones

The following embodiments relate to a sampling method and apparatus for PDM input when simultaneously supporting far-field voice recognition (Far Field Voice Recognition) of a plurality of broadcasters or AI service providers with two microphones.

Recent set-top boxes are fitted with microphone inputs so that a user can ask about the weather, the news, and so on by voice command, and the set-top box can process the command and report the result as speech (Text To Speech, TTS) through a built-in speaker. To process these voice commands, two microphones are normally installed at least a certain distance apart to improve reliability. A technique called Acoustic Echo Cancellation (AEC) is used to remove the built-in speaker output, or the TV output sound of the stream that the set-top box is playing, when that sound is picked up again by the microphones.

A broadcaster that wants to receive echo-canceled microphone input uses an external digital signal processor (DSP). The sound reference that is output to the speaker or TV is also fed to the DSP, and the DSP removes the sound reference component from the microphone input so that only the voice command picked up by the microphones is provided to the CPU.
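The patent does not say which echo cancellation algorithm the DSP runs. As a minimal sketch of the general idea only (removing a sound reference component from the microphone input), the following uses a normalized LMS adaptive filter, a common textbook choice that is assumed here for illustration. The function name, the tap count, the step size, and the simulated room response are all hypothetical.

```python
import math
import random


def nlms_echo_cancel(mic, reference, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the echo (filtered reference) from mic."""
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    cleaned = []
    for m, r in zip(mic, reference):
        history = [r] + history[:-1]
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        error = m - echo_estimate  # what remains after removing the estimated echo
        norm = sum(x * x for x in history) + eps
        weights = [w + mu * error * x / norm for w, x in zip(weights, history)]
        cleaned.append(error)
    return cleaned


def power(samples):
    """Mean squared value of a list of samples."""
    return sum(s * s for s in samples) / len(samples)


random.seed(0)
n = 4000
reference = [random.uniform(-1.0, 1.0) for _ in range(n)]
# Hypothetical room response: the speaker sound reaches the mic delayed and attenuated.
echo = [0.6 * reference[i - 2] if i >= 2 else 0.0 for i in range(n)]
voice = [0.1 * math.sin(2 * math.pi * i / 50) for i in range(n)]
mic = [e + v for e, v in zip(echo, voice)]

cleaned = nlms_echo_cancel(mic, reference)

# Compare how much non-voice energy remains in the last 1000 samples.
residual_before = power([m - v for m, v in zip(mic[-1000:], voice[-1000:])])
residual_after = power([c - v for c, v in zip(cleaned[-1000:], voice[-1000:])])
```

After the filter has adapted, `residual_after` should be much smaller than `residual_before`, meaning that mostly the voice command is left in the output.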

However, a broadcaster may also require simultaneous support for Google Voice Assistant (GVA), an Amazon AI service, or similar. These services do not use an external DSP. Instead, the raw microphone input and the sound reference are sent to the provider's own servers, which process them and return the result.

In this case the CPU must be given both the DSP-processed microphone input and the unprocessed microphone input. When the DSP and the CPU both receive the same microphone input, the phase difference between the DSP's and the CPU's microphone PDM clocks can make it difficult for them to receive identical data.

Korean Patent Application Laid-Open No. 10-2010-0100522 describes a multi-channel IPTV set-top box and an IPTV broadcasting system for multi-channel service.

Korean Patent Application Laid-Open No. 10-2010-0100522

The embodiments describe a sampling method and apparatus for PDM input when simultaneously supporting far-field voice recognition of a plurality of broadcasters or AI service providers with two microphones. More specifically, they provide a technique for sampling the PDM input when the voice recognition of a broadcaster and that of another AI service provider are supported at the same time using only two microphones.

The embodiments aim to provide a sampling method and apparatus for PDM input, for simultaneously supporting far-field voice recognition of a plurality of broadcasters or AI service providers with two microphones, in which a software module is installed so that the CPU can receive microphone data at the correct sampling position.

The embodiments also aim to provide a sampling method and apparatus for PDM input, for simultaneously supporting far-field voice recognition of a plurality of broadcasters or AI service providers with two microphones, in which the DSP and the CPU identify the talker's left and right positions identically even when the DSP PDM clock and the CPU PDM clock are opposite in phase.

A sampling method for PDM input when simultaneously supporting far-field voice recognition of a plurality of broadcasters or AI service providers with two microphones, according to one embodiment, may include a step of installing a software module so that, when a microphone input echo-canceled in a DSP and a microphone input that has not been echo-canceled are delivered to a CPU at the same time, the CPU receives the microphone data at the correct sampling position.

Using only two microphones, the microphone input echo-canceled in the DSP, as the broadcaster requires, and the non-echo-canceled microphone input for another AI service provider can be delivered to the CPU at the same time.

The step of having the CPU receive microphone data at the correct sampling position may include: asking the talker to make a specific sound; using that sound to find, while varying the sampling position, the positions at which the CPU receives the same data as the DSP result; and identifying the range of sampling positions found and setting the position to the midpoint of that range, so that stable microphone input data is obtained.
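The search just described (sweep the sampling position, note where the CPU data equals the DSP result, then take the middle of that range) can be sketched as follows. This is an illustration under stated assumptions, not the patent's software module: `capture_at` stands in for whatever hypothetical driver call captures CPU-side PDM bits at a given position, the matching range is assumed not to wrap around the end of the clock period, and the 43 positions and the bit pattern are made up for the demo.

```python
from typing import Callable, Optional, Sequence


def find_sampling_position(
    capture_at: Callable[[int], Sequence[int]],
    dsp_bits: Sequence[int],
    num_positions: int,
) -> Optional[int]:
    """Return the middle of the longest run of positions whose capture equals dsp_bits."""
    matches = [list(capture_at(p)) == list(dsp_bits) for p in range(num_positions)]
    best_start, best_len = 0, 0
    start = None
    for p, ok in enumerate(matches + [False]):  # sentinel closes a trailing run
        if ok and start is None:
            start = p
        elif not ok and start is not None:
            if p - start > best_len:
                best_start, best_len = start, p - start
            start = None
    if best_len == 0:
        return None  # no position reproduced the DSP data
    return best_start + (best_len - 1) // 2


# Demo with a fake capture function: positions 12..20 reproduce the DSP data,
# every other position returns corrupted (here: inverted) bits.
dsp_bits = [1, 0, 1, 1, 0, 0, 1, 0]


def fake_capture(position: int) -> Sequence[int]:
    if 12 <= position <= 20:
        return dsp_bits
    return [1 - b for b in dsp_bits]


position = find_sampling_position(fake_capture, dsp_bits, num_positions=43)
print(position)  # 16, the midpoint of the matching range 12..20
```

Choosing the midpoint rather than the first matching position leaves equal margin on both sides of the range, which is what makes the captured data stable against small timing shifts.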

The step of having the CPU receive microphone data at the correct sampling position may include, when the talker's left or right position is to be determined: asking the talker to move to the left or right side of the set-top box and make a specific sound; and, if the intensity of the captured sound gives the same result on both the DSP side and the CPU side, using the input data as it is.

The step of having the CPU receive microphone data at the correct sampling position may further include, if the intensity of the captured sound gives different results on the DSP side and the CPU side, using the position determined by the DSP and reversing the direction on the CPU side.

In the step of having the CPU receive microphone data at the correct sampling position, when the talker's left and right positions are identified oppositely because of the phase difference between the DSP PDM clock and the CPU PDM clock, the phase of the CPU PDM clock may be corrected.
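The left/right check in the preceding paragraphs can be sketched as a comparison of channel energies. The text says the CPU side is reversed, or the CPU PDM clock phase corrected, when the two sides disagree; the sketch below assumes the simplest software form of that correction, swapping the CPU's channel labels. The function names and sample values are hypothetical.

```python
def louder_side(left, right):
    """Return 'left' or 'right' depending on which channel carries more energy."""
    energy_left = sum(s * s for s in left)
    energy_right = sum(s * s for s in right)
    return "left" if energy_left >= energy_right else "right"


def align_cpu_channels(dsp_left, dsp_right, cpu_left, cpu_right):
    """Return the CPU channels, swapped if the CPU disagrees with the DSP."""
    if louder_side(dsp_left, dsp_right) == louder_side(cpu_left, cpu_right):
        return cpu_left, cpu_right  # same result on both sides: use the data as is
    return cpu_right, cpu_left      # different result: trust the DSP, reverse the CPU


loud = [0.8, -0.7, 0.9, -0.8]
quiet = [0.1, -0.1, 0.1, -0.1]

# The talker stands on the left, so the DSP hears the left channel louder.
# The CPU (assumed here to sample on the opposite clock phase) sees them swapped.
cpu_left, cpu_right = align_cpu_channels(loud, quiet, quiet, loud)
print(louder_side(cpu_left, cpu_right))  # left
```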

A sampling apparatus for PDM input when simultaneously supporting far-field voice recognition of a plurality of broadcasters or AI service providers with two microphones, according to another embodiment, may include a software module that, when a microphone input echo-canceled in a DSP and a microphone input that has not been echo-canceled are delivered to a CPU at the same time, causes the CPU to receive the microphone data at the correct sampling position.

According to the embodiments, it is possible to provide a sampling method and apparatus for PDM input, for simultaneously supporting far-field voice recognition of a plurality of broadcasters or AI service providers with two microphones, in which a software module is installed so that the CPU can receive microphone data at the correct sampling position.

In addition, according to the embodiments, it is possible to provide a sampling method and apparatus for PDM input, for simultaneously supporting far-field voice recognition of a plurality of broadcasters or AI service providers with two microphones, in which the DSP and the CPU identify the talker's left and right positions identically even when the DSP PDM clock and the CPU PDM clock are opposite in phase.

FIG. 1 is a diagram for explaining a conventional method that uses four microphones to simultaneously support far-field voice recognition of a plurality of broadcasters or AI service providers.
FIG. 2 is a block diagram for explaining the clock input and data output scheme of a typical digital microphone.
FIG. 3 is a timing diagram for explaining the clock input and data output scheme of a typical digital microphone.
FIG. 4 is a diagram for explaining why the same voice input cannot be received when the clocks of a typical DSP and CPU do not match exactly.
FIG. 5 is a diagram for explaining a method of simultaneously supporting far-field voice recognition of a plurality of broadcasters or AI service providers with two microphones according to an embodiment.
FIG. 6 is a diagram illustrating an example in which the frequencies match but the phases differ, according to an embodiment.
FIG. 7 is a diagram for explaining how a CPU takes in PDM input according to an embodiment.
FIG. 8 is a diagram for explaining why the CPU can take in the microphone input correctly even when there is a phase difference, according to an embodiment.
FIG. 9 is a flowchart illustrating a method of simultaneously supporting far-field voice recognition of a plurality of broadcasters or AI service providers with two microphones according to an embodiment.
FIG. 10 is a diagram illustrating a case in which the CPU's PDM clock and the microphone clock (DSP PDM clock) are opposite in phase, according to an embodiment.
FIG. 11 is a flowchart illustrating a sampling method for PDM input when far-field voice recognition is supported simultaneously, according to an embodiment.

Hereinafter, embodiments are described with reference to the accompanying drawings. The described embodiments may, however, be modified into various other forms, and the scope of the present invention is not limited to the embodiments described below. The embodiments are provided to explain the present invention more completely to those of ordinary skill in the art. The shapes and sizes of elements in the drawings may be exaggerated for clarity.

FIG. 1 is a diagram for explaining a conventional method that uses four microphones to simultaneously support far-field voice recognition of a plurality of broadcasters or AI service providers.

Referring to FIG. 1, four microphones (31, 32) have conventionally been used to support both a broadcaster's far-field voice (Far Field Voice, FFV) service and another AI service provider's voice recognition.

Suppose a broadcaster wants to support its own far-field voice (FFV) service and another AI service provider's voice recognition at the same time. The broadcaster wants its FFV service to receive microphone input that has been echo-canceled by the DSP (20), while the other AI service provider expects raw microphone input that has not passed through the DSP (20). In that case four microphones (31, 32) are used: two microphones (32) serve as the microphone input for the DSP (20), and the other two microphones (31) serve as the microphone input for the AI service provider.

This is because each digital microphone (31, 32) used in the FFV service receives a single clock signal from either the DSP (20) or the central processing unit (CPU) (10) and outputs one stream of microphone data synchronized to that clock, and because the CPU (10) and the DSP (20) run from separate 24 MHz input clocks.

When the CPU (10) and the DSP (20) use separate 24 MHz clocks, neither clock is exactly 24 MHz. Each falls somewhere between, for example, 23.9xxx MHz and 24.0xxx MHz, so the two inevitably differ in frequency and in phase, and there is no guarantee that both receive identical microphone input.
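A short calculation shows why even a small frequency difference matters. The numbers are assumptions chosen for illustration and do not come from the patent: a 3 MHz microphone clock derived from each 24 MHz source, and a combined frequency offset of 100 ppm between the two sources.

```python
def seconds_until_one_bit_slip(pdm_hz: float, ppm_offset: float) -> float:
    """Time for two nominally equal clocks to drift apart by one full PDM period."""
    beat_hz = pdm_hz * ppm_offset * 1e-6  # frequency difference between the two clocks
    return 1.0 / beat_hz


# Hypothetical values: 3 MHz PDM clock, 100 ppm offset between the two oscillators.
slip = seconds_until_one_bit_slip(3_000_000, 100)
print(f"{slip * 1e3:.2f} ms")  # 3.33 ms
```

Under these assumptions the two clocks slide past each other by a whole PDM bit roughly every 3.33 ms, so a sampling position that is correct at one moment cannot stay correct unless both sides share one clock source.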

FIG. 2 is a block diagram for explaining the clock input and data output scheme of a typical digital microphone. FIG. 3 is a timing diagram for explaining the clock input and data output scheme of a typical digital microphone.

The CPU and the DSP run from different input clocks, and when they do, their clocks differ in frequency and in phase. The data output scheme of the digital microphone used in this situation can be represented by the block diagram and the timing diagram shown in FIGS. 2 and 3.

FIG. 4 is a diagram for explaining why the same voice input cannot be received when the clocks of a typical DSP and CPU do not match exactly.

FIG. 4 shows that the DSP and the CPU cannot receive the same microphone input when their clock frequencies do not match. If the CPU and the DSP differ in frequency and phase, the microphone data each one captures does not match, so the data cannot be shared.

When four microphones are used, as in the conventional method shown in FIG. 1 and elsewhere, the result is poor in terms of both the space and the cost needed to place them. In particular, when several microphones are used, they must be spaced far enough apart to secure voice input quality, which makes it hard to miniaturize the set-top box. Installing and connecting four microphones, and fitting structures to keep noise out of them, is also costly.

The embodiments below present a way to support the voice recognition of a broadcaster and of another AI service provider at the same time using only two microphones. They also present a way to sample the PDM input when far-field voice recognition of a plurality of broadcasters or AI service providers is supported simultaneously with two microphones.

FIG. 5 is a diagram for explaining a method of simultaneously supporting far-field voice recognition of a plurality of broadcasters or AI service providers with two microphones according to an embodiment.

FIG. 5 shows a method of realizing, with two microphones, both a broadcaster's far-field voice (FFV) service through a DSP and an AI service provider's voice recognition.

In this specification, the first service method is one in which the DSP performs echo cancellation and delivers the voice data to the CPU, and the second service method is one in which the voice data is delivered to the CPU as is, without passing through the DSP and without echo cancellation.

An apparatus that simultaneously supports far-field voice recognition of a plurality of broadcasters or AI service providers with two microphones, according to an embodiment, may consist of two microphones (530) for receiving voice commands, a DSP (520) that receives the input of the two microphones (530) and performs the echo cancellation function, and a CPU (510) that receives the microphone input that has not been echo-canceled.

More specifically, the apparatus according to an embodiment may include: a plurality of microphones (530); a DSP (520) that echo-cancels the voice data captured by the microphones (530) for the first service method; and a CPU (510) that receives, unchanged, the non-echo-canceled voice data captured by the microphones (530) for the second service method, and that also receives the microphone input echo-canceled by the DSP (520) for the first service method. The microphones (530), the DSP (520), and the CPU (510) may be built into a set-top box. In particular, using two microphones as the plurality of microphones (530) allows the set-top box to be made smaller. In other words, a set-top box that simultaneously supports two different voice recognition processing methods may be configured to include the plurality of microphones (530), the DSP (520), and the CPU (510).

As explained above, if the CPU (510) and the DSP (520) use separate 24 MHz clock inputs, the frequencies of the two clocks cannot be guaranteed to match exactly.

To solve this problem, as shown in FIG. 5, the CPU's PWM output function is used so that the DSP shares the 24 MHz clock output by the CPU, which removes the frequency mismatch between the two. In addition, the microphones output their data using the DSP's PDM clock, and the CPU reads the PDM input in the manner shown in FIG. 7. As a result, even though the CPU's PDM clock is not supplied to the microphones, the CPU can receive the same microphone data as the DSP, as shown in FIG. 8.

FIG. 6 is a diagram illustrating an example in which the frequencies match but the phases differ, according to an embodiment.

If the CPU and the DSP used separate 24 MHz clock inputs, the two clock frequencies could not be guaranteed to match exactly. Therefore, as shown in FIG. 6, the DSP is supplied with a 24 MHz clock generated by the CPU's PWM (Pulse Width Modulation) output function. This resolves the frequency mismatch.

FIG. 7 is a diagram for explaining how a CPU takes in PDM input according to an embodiment.

As shown in FIG. 7, even when the CPU and the DSP use clock inputs of the same frequency (24 MHz), each generates the FCLOCK (0 to 5 MHz) used by the microphones with its own phase-locked loop (Phase Locked Loop, PLL) module, and the two resulting clocks cannot be guaranteed to be in phase. In other words, two mutually in-phase clocks cannot be provided to the microphones.

The CPU can adjust the position at which it takes in the PDM data by using an internal clock of much higher frequency than the PDM clock (for example, 133 MHz).
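To give a feel for how fine that adjustment is, the sketch below counts how many internal-clock ticks fit into one PDM clock period and maps a phase offset to the nearest tick. The 133 MHz figure is the example from the text; the 3 MHz PDM clock is an assumed value inside the 0 to 5 MHz range mentioned for FCLOCK, and both function names are hypothetical.

```python
def sampling_positions(internal_hz: int, pdm_hz: int) -> int:
    """Number of whole internal-clock ticks available within one PDM clock period."""
    return internal_hz // pdm_hz


def position_for_phase(phase_deg: float, internal_hz: int, pdm_hz: int) -> int:
    """Nearest internal-clock tick for a given phase offset of the PDM clock."""
    ticks_per_period = internal_hz / pdm_hz
    index = round((phase_deg % 360) / 360 * ticks_per_period)
    return index % sampling_positions(internal_hz, pdm_hz)


positions = sampling_positions(133_000_000, 3_000_000)
mid_position = position_for_phase(180, 133_000_000, 3_000_000)
print(positions, mid_position)  # 44 22
```

With these numbers the CPU can choose among 44 sampling positions per PDM period, a step of about 7.5 ns, which is fine enough to track a phase offset between the CPU and the DSP PDM clocks.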

FIG. 8 is a diagram for explaining why the CPU can take in the microphone input correctly even when there is a phase difference, according to an embodiment.

FIG. 8 explains why, according to an embodiment, the CPU can take in the microphone input correctly even when there is a phase difference between the CPU's PDM clock and the microphone clock (DSP PDM clock).

In this embodiment, therefore, the PDM (Pulse Density Modulation) clock provided by the DSP is used as the FCLOCK supplied to the microphones, and the PDM clock provided by the CPU is not used. FIG. 6 shows that, in a system configured this way, the data that the microphones output under the DSP's PDM clock may not be in phase with the CPU's unused PDM clock.

To solve this problem, the CPU's PDM timing specification needs to be examined. As FIG. 7 shows, the CPU can select the input position (sampling position) using a more finely divided, higher-frequency internal clock. With this function the CPU can take in the microphone data at the same position as the DSP, as shown in FIG. 8.

FIG. 9 is a flowchart illustrating a method of simultaneously supporting far-field voice recognition of a plurality of broadcasters or AI service providers with two microphones according to an embodiment.

Referring to FIG. 9, a method of simultaneously supporting far-field voice recognition of a plurality of broadcasters or AI service providers with two microphones according to an embodiment may include: delivering the microphone input for the first service method to the DSP (S110); echo-canceling, in the DSP, the microphone input it received and delivering the result to the CPU (S120); and delivering the non-echo-canceled microphone input for the second service method to the CPU (S130).

Each step of the method according to an embodiment is described in more detail below.

The method can be explained using, as an example, the apparatus according to an embodiment described with reference to FIG. 5. As described above, that apparatus may include a plurality of microphones (530), a DSP (520), and a CPU (510).

In step S110, the microphone input for the first service method may be delivered to the DSP (520). Here, the plurality of microphones (530) may deliver the microphone input captured for the first service method to the DSP (520).

In step S120, the DSP (520) may echo-cancel the microphone input it received and then deliver the result to the CPU (510).

For example, the first service method may be that of a broadcaster. For the broadcaster's microphone input, the microphone input echo-canceled in the DSP (520), as the broadcaster requires, may be delivered to the CPU (510).

In step S130, the non-echo-canceled microphone input for the second service method may be delivered to the CPU (510). Here, the plurality of microphones (530) may deliver the non-echo-canceled microphone input captured for the second service method to the CPU (510).

For example, the second service method may be that of an AI service provider. For the AI service provider's microphone input, the microphone input that has not been echo-canceled may be delivered to the CPU (510).

Thus, according to the embodiments, using only two microphones, the microphone input echo-canceled in the DSP 520, as required by the broadcasting and telecommunications operator, and the non-echo-canceled microphone input for another AI service provider can be delivered to the CPU 510 simultaneously.

Here, a clock generated by the PWM of the CPU 510 may be supplied to the DSP 520 so that the CPU 510 and the DSP 520 use clocks of matching frequency. In addition, of the two different PDM clocks output by the CPU 510 and the DSP 520, the clock of the DSP 520 may be selected and provided as the input clock of the microphones.

The phase difference that arises from selecting the DSP 520 clock, out of the two different PDM clocks output by the CPU 510 and the DSP 520, as the microphone input clock can be overcome by changing the sampling position of the PDM data in the CPU 510.

According to the embodiments, a far-field voice (FFV) service can be provided using the microphone input echo-canceled by the DSP 520. When voice recognition for another AI service provider is to be supported at the same time, space and cost can be reduced significantly by using only two microphones instead of the four used conventionally.

When the CPU receives microphone data in this way, one of the sampling positions must be selected. For example, as shown in FIG. 8, one of sampling positions 0 to 9 may be selected. Assuming that the most stable position for receiving microphone data is the middle of the microphone data, in the case of FIG. 8 this is sampling position 3 or 4, because the microphone data is read correctly at eight of the ten sampling positions, namely 0 to 7. Sampling positions 0 to 9 are described only as an example; other ranges, such as 0 to 15, may also be used.

This section presents a way to determine that correct data is input at sampling positions 0 to 7. Even with this approach, however, the phase difference between the DSP PDM clock and the CPU PDM clock may cause the speaker's position to be identified in reverse. Some set-top boxes may need to distinguish which of the two microphones receives the stronger voice data.

FIGS. 2 and 3 show the clock input and data output of a digital microphone. When two microphones are installed on the left and right, the SEL input may be tied to VDD for the left microphone and to ground for the right microphone. Data input while the clock is high can then be identified as left-microphone data, and data input while the clock is low as right-microphone data.

FIG. 10 is a diagram illustrating a case in which the PDM clock of the CPU and the microphone clock (DSP PDM clock) have opposite phases, according to an embodiment.

As shown in FIG. 10, if the DSP PDM clock and the CPU PDM clock have opposite phases, the DSP and the CPU may perceive the speaker's direction oppositely. A way to make the CPU recognize the speaker's direction correctly in this case is also presented here.

When the DSP PDM clock and the PDM clock inside the CPU have opposite phases, data input at a point where the DSP clock is high is input at a point where the CPU clock is low, so the DSP and the CPU may judge the left and right positions differently.
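The channel convention and the effect of an inverted clock can be illustrated with a minimal Python sketch. The function name `demux_pdm` and the `(clock_high, bit)` sample representation are assumptions made for illustration; they do not appear in the source, and real PDM capture is performed in hardware.

```python
def demux_pdm(samples):
    """Split (clock_high, bit) pairs into left and right bit streams.

    Assumed convention from the text: the left microphone (SEL = VDD)
    drives the data line while the clock is high, and the right
    microphone (SEL = ground) drives it while the clock is low.
    """
    left, right = [], []
    for clock_high, bit in samples:
        (left if clock_high else right).append(bit)
    return left, right


# Hypothetical interleaved stream as seen with the DSP's clock.
stream = [(True, 1), (False, 0), (True, 1), (False, 1), (True, 0), (False, 0)]
dsp_left, dsp_right = demux_pdm(stream)

# A receiver whose clock has the opposite phase sees every level inverted,
# so it attributes each bit to the other microphone.
inverted = [(not clock_high, bit) for clock_high, bit in stream]
cpu_left, cpu_right = demux_pdm(inverted)

print(dsp_left, dsp_right)  # [1, 1, 0] [0, 1, 0]
print(cpu_left, cpu_right)  # [0, 1, 0] [1, 1, 0]
```

In this sketch the receiver with the inverted clock ends up with the two channels exchanged, which is the left/right disagreement described above.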

For example, a dedicated software module is needed to determine that correct data is read at positions 0 to 7. When this module runs, the set-top box prompts the speaker to make a specific sound (for example, "Ah~~"). The software module then varies the sampling position from 0 to 9 and finds the positions at which data matching the DSP's result is input. By identifying the range of sampling positions found in this way and setting the position to the middle value of that range, stable microphone input data can be obtained.
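The search described here can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: `read_cpu` is a hypothetical callback that returns the data the CPU captures at a given sampling position, "same data as the DSP" is modeled as exact equality, and the lower of the two middle positions is chosen when the range has an even length.

```python
def find_sampling_position(read_cpu, dsp_reference, positions=range(10)):
    """Return the middle of the longest consecutive run of sampling
    positions whose CPU capture equals the DSP reference, or None if
    no position matches."""
    matching = [p for p in positions if read_cpu(p) == dsp_reference]
    best, run = [], []
    for p in matching:
        # Extend the current run if p is adjacent to it, else start a new run.
        run = run + [p] if run and p == run[-1] + 1 else [p]
        if len(run) > len(best):
            best = run
    if not best:
        return None
    return best[(len(best) - 1) // 2]


# Hypothetical stand-in for the hardware: positions 0 to 7 read the data
# correctly, positions 8 and 9 return corrupted data.
reference = [1, 0, 1, 1, 0, 0, 1, 0]

def simulated_read(position):
    return reference if position <= 7 else [1 - bit for bit in reference]

print(find_sampling_position(simulated_read, reference))  # 3
```

With positions 0 to 7 matching, the sketch selects position 3, one of the two middle positions named in the FIG. 8 example. A practical implementation would likely need a tolerance-based comparison rather than exact equality.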

In addition, as described above, to determine the speaker's left or right position, the speaker is asked to move to the left or right of the set-top box and make a sound. If the strength of the acquired data shows the same result on both the DSP and the CPU, the input data can be used as is. If the results are opposite, the DSP is the one that has found the correct position, so from then on the CPU side uses the direction reversed.

FIG. 11 is a flowchart illustrating a sampling method for PDM input when far-field voice recognition is supported simultaneously, according to an embodiment.

Referring to FIG. 11, the sampling method for PDM input when simultaneously supporting far-field voice recognition for a plurality of broadcasting and telecommunications operators or AI service providers with two microphones according to an embodiment may include step S210 of installing a software module so that the CPU receives microphone data at the correct sampling position. The method may further include step S220 of simultaneously delivering to the CPU the microphone input echo-canceled in the DSP and the microphone input that has not been echo-canceled.

Here, using only two microphones, the microphone input echo-canceled in the DSP, as required by the broadcasting and telecommunications operator, and the non-echo-canceled microphone input for another AI service provider can be delivered to the CPU simultaneously.

The sampling method for PDM input when simultaneously supporting far-field voice recognition for a plurality of broadcasting and telecommunications operators or AI service providers with two microphones according to an embodiment is described in more detail below.

This sampling method may be described using, as an example, the sampling apparatus for PDM input when simultaneously supporting far-field voice recognition for a plurality of broadcasting and telecommunications operators or AI service providers with two microphones according to an embodiment.

The sampling apparatus for PDM input when simultaneously supporting, with two microphones, the different far-field voice recognition processing methods of a plurality of broadcasting and telecommunications operators or AI service providers according to an embodiment may include a software module that causes the CPU to receive microphone data at the correct sampling position when voice data echo-canceled in the DSP and non-echo-canceled voice data delivered directly from the microphones are delivered to the CPU simultaneously. Depending on the embodiment, the apparatus may further include a plurality of microphones, a DSP, and a CPU, as described with reference to FIG. 5. In other words, a set-top box device that simultaneously supports two different voice recognition processing methods may include a plurality of microphones 530, a DSP 520, and a CPU 510, together with a software module (not shown) that selects the sampling position of the PDM input. Like the microphones 530, the DSP 520, and the CPU 510, the software module may be implemented as a component included in the set-top box device.

In step S210, when the microphone input echo-canceled in the DSP and the non-echo-canceled microphone input are delivered to the CPU simultaneously, the software module may cause the CPU to receive the microphone data at the correct sampling position.

First, the software module may find the correct sampling position.

To this end, the software module may request the speaker to make a specific sound. Using the specific sound, the software module may vary the sampling position from 0 to 9 and find the positions at which data matching the DSP's result is input. Sampling positions 0 to 9 are only an example; the range is not limited to this and may be set in various ways. The software module may then identify the range of sampling positions found and set the position to the middle value of that range, so that stable microphone input data is obtained.

The software module may also determine the speaker's left or right position.

To this end, when the speaker's left or right position is to be determined, the software module may request the speaker to move to the left or right of the set-top box and make a specific sound. If the strength of the acquired sound shows the same result on both the DSP and the CPU, the software module may use the input data as is. If the strength of the acquired sound shows different results on the DSP and the CPU, the software module may use the position determined by the DSP and reverse the direction on the CPU side. That is, when the speaker's left and right positions are identified oppositely because of the phase difference between the DSP PDM clock and the CPU PDM clock, the software module may correct the phase of the CPU PDM clock.
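The left/right check can be sketched as follows. This is a minimal illustration in Python under stated assumptions: the per-channel sound levels are taken as already measured, and the names `louder_side`, `needs_swap`, and `corrected_cpu_channels` are hypothetical. The source describes the correction as a phase correction of the CPU PDM clock; the sketch models only its effect, which is to exchange the CPU's two channels.

```python
def louder_side(left_level, right_level):
    """Return which microphone channel carries the stronger signal."""
    return "left" if left_level >= right_level else "right"


def needs_swap(dsp_levels, cpu_levels):
    """True when the CPU and the DSP disagree about the louder side.

    Each argument is a (left_level, right_level) pair measured while the
    speaker stands to one side of the set-top box. The DSP result is
    treated as the correct one, as stated in the text.
    """
    return louder_side(*dsp_levels) != louder_side(*cpu_levels)


def corrected_cpu_channels(cpu_left, cpu_right, swap):
    """Exchange the CPU's channels when a swap is required."""
    return (cpu_right, cpu_left) if swap else (cpu_left, cpu_right)


# Hypothetical measurement: the speaker stands on the left.
dsp_levels = (0.8, 0.3)   # DSP hears the left microphone as louder
cpu_levels = (0.3, 0.8)   # CPU hears the right microphone as louder

swap = needs_swap(dsp_levels, cpu_levels)
print(swap)                                       # True
print(corrected_cpu_channels("L", "R", swap))     # ('R', 'L')
```

Because the decision is made once during calibration, the resulting `swap` flag could be stored and applied to all later CPU captures.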

In step S220, the microphone input echo-canceled in the DSP and the non-echo-canceled microphone input may be delivered to the CPU simultaneously. This was described in detail with reference to FIG. 9.

As described above, according to the embodiments, when only two microphones are used to deliver simultaneously to the CPU the microphone input echo-canceled in the DSP, as required by the broadcasting and telecommunications operator, and the non-echo-canceled microphone input for another AI service provider, a software module can be installed so that the CPU receives the microphone data at the correct sampling position. In addition, even if the DSP PDM clock and the CPU PDM clock have opposite phases, the DSP and the CPU can identify the speaker's left and right positions identically.

Therefore, the approach presented in this embodiment prevents the possibility that, because the DSP's PDM clock is used and the CPU's PDM clock is not, the microphone data on the CPU side differs from the actual data. It can also correct the case in which the speaker's left and right positions are identified oppositely because of the phase difference between the DSP PDM clock and the CPU PDM clock.

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the devices and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and one or more software applications that execute on the operating system. A processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, a single processing device is sometimes described, but a person of ordinary skill in the art will recognize that a processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, a processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, computer storage medium, or device in order to be interpreted by the processing device or to provide instructions or data to the processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to an embodiment may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

Although the embodiments have been described with reference to a limited set of embodiments and drawings, a person of ordinary skill in the art can make various modifications and variations from the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or components of the described system, structure, apparatus, or circuit are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims that follow.

Claims

delete

A set-top box device comprising:
a plurality of microphones;
a digital signal processor (DSP) that performs an echo cancellation (Echo Cancel) function;
a central processing unit (CPU) that, for voice data input through the microphones, simultaneously supports a first service method, which is a voice recognition processing method of a broadcasting and telecommunications operator, and a second service method, which is a voice recognition processing method of an artificial intelligence (AI) service provider other than the broadcasting and telecommunications operator; and
a software module that selects a sampling position of the voice data received by the CPU so that the CPU receives from the microphones the same data as the DSP,
wherein
voice data input through the microphones and echo-canceled by the DSP is delivered to the CPU for the first service method,
voice data input through the microphones and not echo-canceled is delivered to the CPU as is for the second service method,
a pulse width modulation (PWM) clock provided by the CPU is supplied to the DSP so that the DSP uses a clock of the same frequency as the CPU,
the microphones use a pulse density modulation (PDM) clock provided by the DSP rather than a PDM clock provided directly by the CPU, and
the software module
requests the speaker to make a specific sound,
using the specific sound, varies the sampling position of the voice data received by the CPU to find the positions at which data matching the result of the DSP is input, and
identifies the range of the sampling positions found and sets the position to the middle value of that range so as to obtain stable microphone input data.

A set-top box device comprising:
a plurality of microphones;
a digital signal processor (DSP) that performs an echo cancellation (Echo Cancel) function;
a central processing unit (CPU) that, for voice data input through the microphones, simultaneously supports a first service method, which is a voice recognition processing method of a broadcasting and telecommunications operator, and a second service method, which is a voice recognition processing method of an artificial intelligence (AI) service provider other than the broadcasting and telecommunications operator; and
a software module that selects a sampling position of the voice data received by the CPU so that the CPU receives from the microphones the same data as the DSP,
wherein
voice data input through the microphones and echo-canceled by the DSP is delivered to the CPU for the first service method,
voice data input through the microphones and not echo-canceled is delivered to the CPU as is for the second service method,
a pulse width modulation (PWM) clock provided by the CPU is supplied to the DSP so that the DSP uses a clock of the same frequency as the CPU,
the microphones use a pulse density modulation (PDM) clock provided by the DSP rather than a PDM clock provided directly by the CPU, and
the software module
requests the speaker to move to the left or right side of the set-top box device and make a specific sound, and
uses the input data as is when the strength of the acquired specific sound shows the same result in the DSP and the CPU.

delete

5. The set-top box device of claim 4,
wherein the software module
corrects the phase of the PDM clock of the CPU when the strength of the acquired specific sound shows different results in the DSP and the CPU.

delete