KR20220162247A

KR20220162247A - Voice processing device for processing voice of speaker according to authority level

Info

Publication number: KR20220162247A
Application number: KR1020210070489A
Authority: KR
Inventors: 김정민
Original assignee: 주식회사 아모센스
Priority date: 2021-06-01
Filing date: 2021-06-01
Publication date: 2022-12-08

Abstract

A voice processing device is disclosed. The voice processing device comprises: a microphone configured to generate a voice signal in response to voices uttered by a plurality of speakers; a voice processing circuit configured to generate a separated voice signal associated with each of the voices by performing sound source separation on the voice signal based on the sound source position of each voice; a positioning circuit configured to measure terminal positions of speaker terminals of the speakers; and a memory storing authority level information indicating an authority level for each of the speaker terminals, wherein the voice processing circuit determines a speaker terminal having a terminal position corresponding to the sound source position of the separated voice signal and, by referring to the authority level information, processes the separated voice signal according to the authority level corresponding to the determined speaker terminal. The voice processing device can generate, from the voices of speakers in a vehicle, a separated voice signal associated with each of the speakers.

Description

Voice processing device for processing a speaker's voice according to an authority level

Embodiments of the present invention relate to a voice processing device for processing a speaker's voice according to an authority level.

A microphone is a device that picks up voice and converts it into a voice signal, which is an electrical signal. When a microphone is placed in a space where a plurality of speakers are located, such as a conference room or a classroom, the microphone receives the voices of all of the speakers and generates voice signals associated with those voices. When a plurality of speakers speak at the same time, it is therefore necessary to separate out voice signals that each represent only the voice of an individual speaker.

With the recent development of voice recognition technology, various electronic devices support control by voice. In automobiles in particular, many functions can be controlled by the voice of the driver or a passenger, and the number of functions that support such voice control is steadily increasing.

When a plurality of speakers (a driver and passengers) are present in a confined space such as a vehicle, accurate voice control of the vehicle is possible only if the voices of these speakers are separated. In addition, for the stability of vehicle operation, the authority level for controlling the operation of the vehicle needs to be set differently for each speaker.

Korean Patent Publication No. 10-2017-0112713 (2017.10.12.)

An object of the present invention is to provide a voice processing device capable of generating, from the voices of speakers in a vehicle, a separated voice signal associated with the voice of each speaker.

Another object of the present invention is to provide a voice processing device capable of processing a separated voice signal according to the authority level corresponding to the speaker terminal carried by each speaker.

A voice processing device according to embodiments of the present invention includes: a microphone configured to generate a voice signal in response to voices uttered by a plurality of speakers; a voice processing circuit configured to generate a separated voice signal associated with each of the voices by performing sound source separation on the voice signal based on the sound source position of each voice; a positioning circuit configured to measure terminal positions of the speakers' speaker terminals; and a memory storing authority level information indicating an authority level for each of the speaker terminals. The voice processing circuit determines the speaker terminal whose terminal position corresponds to the sound source position of the separated voice signal and, referring to the authority level information, processes the separated voice signal according to the authority level corresponding to the determined speaker terminal.

The voice processing device according to embodiments of the present invention can generate, from a voice signal associated with the voices of speakers in a vehicle, a separated voice signal associated with the voice of each speaker.

The voice processing device according to embodiments of the present invention can identify the speaker corresponding to a separated voice signal by comparing the terminal positions of the speaker terminals carried by the speakers with the sound source position, and can process the separated voice signal according to the authority level corresponding to each speaker terminal.

Therefore, according to embodiments of the present invention, the voices of speakers can be distinguished more easily than with conventional techniques that distinguish voices by the physical characteristics of each voice (e.g., pitch, timbre), so the voice processing speed is improved. In addition, because voices are processed according to authority level, the stability of voice control is improved.

FIG. 1 shows a voice processing device according to embodiments of the present invention.
FIG. 2 shows a voice processing device according to embodiments of the present invention.
FIG. 3 shows a speaker terminal according to embodiments of the present invention.
FIGS. 4 to 6 are diagrams for explaining the operation of a voice processing device according to embodiments of the present invention.
FIG. 7 shows authority levels of speaker terminals according to embodiments of the present invention.
FIG. 8 is a flowchart illustrating a method of operating a voice processing device according to embodiments of the present invention.
FIG. 9 is a diagram for explaining the operation of a voice processing device according to embodiments of the present invention.

Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings.

FIG. 1 shows a voice processing device according to embodiments of the present invention. Referring to FIG. 1, the voice processing device 100 is disposed in the vehicle 200 and may process the voices of speakers SPK1 to SPK4 located in the vehicle 200.

The voice processing device 100 may exchange data with the vehicle 200 (or with a controller of the vehicle 200, e.g., an electronic control unit (ECU)). According to embodiments, the voice processing device 100 may transmit to the controller a command for controlling the controller of the vehicle 200.

Meanwhile, according to embodiments, the voice processing device 100 may be formed integrally with the controller of the vehicle 200, in which case the voice processing device 100 may control the operation of the vehicle 200. In this specification, however, the controller of the vehicle 200 and the voice processing device 100 are assumed to be separate.

The speakers SPK1 to SPK4 may be located in the respective seats of the vehicle 200. According to embodiments, the first speaker SPK1 may be located in the left seat of the front row, the second speaker SPK2 in the right seat of the front row, the third speaker SPK3 in the left seat of the back row, and the fourth speaker SPK4 in the right seat of the back row.

The voice processing device 100 according to embodiments of the present invention may receive the voices of the speakers SPK1 to SPK4 in the vehicle 200 and generate a separated voice signal associated with the voice of each speaker.

The voice processing device 100 may extract (or generate) a separated voice signal associated with the voice of each of the speakers SPK1 to SPK4 by performing sound source separation. According to embodiments, the voice processing device 100 includes a plurality of microphones, determines the sound source position of each voice using the time delay (or phase delay) between the voice signals generated by the respective microphones, and generates a separated voice signal corresponding only to the sound source at a specific position. For example, the voice processing device 100 may generate a separated voice signal associated with a voice uttered at a specific position (or from a specific direction). In this way, the voice processing device 100 may generate a separated voice signal associated with the voice of each of the speakers SPK1 to SPK4.
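
The time-delay step described above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the brute-force cross-correlation, the two-microphone far-field geometry, the function names, and the speed-of-sound constant are all assumptions made for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def estimate_delay_samples(sig_a, sig_b, max_lag):
    """Return the lag (in samples) by which sig_b trails sig_a.

    Brute-force cross-correlation over lags in [-max_lag, max_lag];
    a positive result means the sound reached microphone A first.
    """
    best_lag, best_score = 0, float("-inf")
    n = min(len(sig_a), len(sig_b))
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += sig_a[i] * sig_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def direction_of_arrival(lag_samples, sample_rate, mic_spacing):
    """Convert an inter-microphone lag into a far-field arrival angle (radians)."""
    delay = lag_samples / sample_rate
    # Clamp so that noise cannot push the argument outside asin's domain.
    ratio = max(-1.0, min(1.0, delay * SPEED_OF_SOUND / mic_spacing))
    return math.asin(ratio)
```

With an assumed 16 kHz sample rate and microphones 0.1 m apart, a lag of 3 samples maps to an arrival angle of roughly 0.70 rad off broadside. A real array would combine several microphone pairs to resolve a position rather than a single angle.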

For example, when the first speaker and the second speaker in the vehicle 200 speak together, the voice processing device 100 may generate a first separated voice signal associated with the voice of the first speaker and a second separated voice signal associated with the voice of the second speaker. Here, the first separated voice signal may be the voice signal that, among the voices of the speakers SPK1 to SPK4, is most highly correlated with the voice of the first speaker SPK1. In other words, among the voice components contained in the first separated voice signal, the voice component of the first speaker SPK1 may have the highest proportion.

The voice processing device 100 may process the separated voice signal. In this specification, processing of the separated voice signal by the voice processing device 100 may mean any of the following: an operation in which the voice processing device 100 transmits the separated voice signal to the vehicle 200 (or to a controller for controlling the vehicle 200); an operation of recognizing, from the separated voice signal, a command for controlling the vehicle 200, determining an operation command corresponding to the recognized command, and transmitting the determined operation command to the vehicle 200; or an operation in which the voice processing device 100 controls the vehicle 200 according to an operation command corresponding to the separated voice signal.

The voice processing device 100 according to embodiments of the present invention may determine the positions of the speaker terminals ST1 to ST4 carried by the speakers SPK1 to SPK4 and process the separated voice signal of each sound source position according to the authority level granted to the speaker terminals ST1 to ST4. That is, the voice processing device 100 may process the separated voice signal associated with the voice of each of the speakers SPK1 to SPK4 according to the authority level of the speaker terminal ST1 to ST4 located at the same (or an associated) position. For example, the voice processing device 100 may process the separated voice signal of a voice uttered at a first sound source position according to the authority level assigned to the speaker terminal located at that first sound source position.
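
One way to picture the position matching described above is a nearest-neighbour lookup. The sketch below is an illustration only: representing positions as 2-D coordinates in metres, the 0.5 m tolerance, and the name `match_terminal` are assumptions, and the patent does not state how "corresponding" positions are decided.

```python
import math


def match_terminal(source_pos, terminal_positions, max_distance=0.5):
    """Return the ID of the terminal closest to source_pos.

    terminal_positions maps terminal IDs to (x, y) coordinates in metres.
    Returns None when no terminal lies within max_distance of the source.
    """
    best_id, best_dist = None, max_distance
    for terminal_id, pos in terminal_positions.items():
        dist = math.dist(source_pos, pos)
        if dist <= best_dist:
            best_id, best_dist = terminal_id, dist
    return best_id
```

For instance, with terminals at (0.0, 0.0) and (1.0, 0.0), a voice localized at (0.9, 0.1) would be attributed to the second terminal, while a source far from every terminal would be left unattributed.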

Meanwhile, when the vehicle 200 is controlled by voice, authority levels need to be set for the voices of the speakers SPK1 to SPK4 for the operational stability of the vehicle 200. For example, the voice of the owner of the vehicle 200 may be assigned a high authority level, whereas the voices of children riding along may be assigned a low authority level.

In this case, it is necessary to determine which speaker each voice recognized by the voice processing device 100 belongs to. Distinguishing speakers from the characteristics of the voice itself, however, involves complicated processing, takes a long time, and has low accuracy.

In contrast, the voice processing device 100 according to embodiments of the present invention may use the positions of the speaker terminals ST1 to ST4 carried by the respective speakers SPK1 to SPK4 to identify the speaker terminal ST1 to ST4 corresponding to the sound source position at which each voice was uttered, and may process the voice according to the authority level corresponding to the identified speaker terminal.
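
A minimal sketch of authority-based processing follows. The numeric levels, the command names, and both tables are hypothetical placeholders chosen for illustration; they are not taken from the patent.

```python
# Hypothetical authority table: terminal ID -> level (higher means more privileges).
AUTHORITY_LEVELS = {"ST1": 3, "ST2": 2, "ST3": 1, "ST4": 1}

# Hypothetical minimum level required for each recognized command.
REQUIRED_LEVEL = {"start_engine": 3, "open_window": 2, "play_music": 1}


def is_command_allowed(terminal_id, command):
    """Decide whether a command spoken near terminal_id should be forwarded."""
    level = AUTHORITY_LEVELS.get(terminal_id, 0)  # unknown terminals get no privileges
    required = REQUIRED_LEVEL.get(command)
    if required is None:
        return False  # unknown commands are rejected
    return level >= required
```

Under these assumed tables, a command recognized in the separated voice signal attributed to ST1 could start the engine, whereas the same words attributed to ST3 would be ignored.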

Therefore, according to embodiments of the present invention, each voice of the speakers SPK1 to SPK4 can be identified easily, which improves the voice processing speed, and because voices are processed according to authority level, the stability (or security) of voice control is improved.

According to embodiments, the voice processing device 100 may determine the position of each of the speaker terminals ST1 to ST4 using signals transmitted from the respective speaker terminals ST1 to ST4.

The vehicle 200 may be defined as a means of transport or conveyance that travels on roads, sea routes, railways, or air routes, such as an automobile, a train, a motorcycle, a ship, or an aircraft. According to embodiments, the vehicle 200 may encompass an internal combustion engine vehicle having an engine as its power source, a hybrid vehicle having an engine and an electric motor as its power sources, an electric vehicle having an electric motor as its power source, and the like.

The vehicle 200 may receive a voice signal from the voice processing device 100 and perform a specific operation in response to the received voice signal. Also, according to embodiments, the vehicle 200 may perform a specific operation according to an operation command transmitted from the voice processing device 100.

FIG. 2 shows a voice processing device according to embodiments of the present invention. Referring to FIG. 2, the voice processing device 100 may include a microphone 110, a voice processing circuit 120, a memory 130, a communication circuit 140, and a positioning circuit 150. According to embodiments, the voice processing device 100 may optionally further include a speaker 160.

The microphone 110 may generate a voice signal in response to a generated voice. According to embodiments, the microphone 110 may detect the vibration of air caused by the voice and, according to the detection result, generate a voice signal, which is an electrical signal corresponding to the vibration.

According to embodiments, the voice processing device 100 may include a plurality of microphones 110 arranged in an array, and each of the microphones 110 may measure the pressure change of a medium (e.g., air) caused by voice, convert the measured pressure change into a voice signal, which is an electrical signal, and output the voice signal. Hereinafter, this specification assumes that there are a plurality of microphones 110.

The microphone 110 may generate a voice signal associated with the voices of the speakers SPK1 to SPK4 in response to those voices. The voice signal generated by the microphone 110 may correspond to the voice of at least one of the speakers SPK1 to SPK4. For example, when the speakers SPK1 to SPK4 speak at the same time, each of the voice signals generated by the respective microphones 110 may be a signal representing the voices of all of the speakers SPK1 to SPK4.

The voice processing circuit 120 may process the voice signals generated by the microphones 110. For example, the voice processing circuit 120 may convert an analog voice signal generated by the microphone 110 into a digital voice signal and process the converted digital voice signal. Because only the signal type (analog or digital) changes in this conversion, the description of the embodiments uses the term voice signal for both the analog and the digital form interchangeably.

The voice processing circuit 120 may include a processor having an arithmetic processing function. For example, the voice processing circuit 120 may include, but is not limited to, a central processing unit (CPU), a micro controller unit (MCU), a graphics processing unit (GPU), a digital signal processor (DSP), an analog-to-digital converter (ADC), or a digital-to-analog converter (DAC).

The voice processing circuit 120 may use the voice signal generated by the microphone 110 to extract (or generate) a separated voice signal associated with the voice of each of the speakers SPK1 to SPK4. According to embodiments, the voice processing circuit 120 may generate the separated voice signal associated with the voice of each speaker SPK1 to SPK4 according to the position of that speaker in the vehicle 200.

The voice processing circuit 120 may determine the sound source position of each of the voice signals (i.e., the positions of the speakers SPK1 to SPK4) using the time delay (or phase delay) between the voice signals. For example, the voice processing circuit 120 may generate sound source position information indicating the sound source position of each of the voice signals (i.e., the positions of the speakers SPK1 to SPK4).

Based on the determined sound source positions, the voice processing circuit 120 may generate, from the voice signal, a separated voice signal associated with the voice of each of the speakers SPK1 to SPK4. For example, the voice processing circuit 120 may generate a separated voice signal associated with a voice uttered at a specific position (or from a specific direction).

For example, when the first speaker SPK1 and the second speaker SPK2 speak at overlapping times, their voices overlap, so each of the voice signals generated by the microphones 110 corresponds to the overlapped voices of the first speaker SPK1 and the second speaker SPK2. In this case, the voice processing circuit 120 may use the voice signals to determine the sound source position of the voice of each of the first speaker SPK1 and the second speaker SPK2 and, based on the sound source positions, generate a first separated voice signal associated with the voice of the first speaker SPK1 and a second separated voice signal representing the voice of the second speaker SPK2.
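
The patent does not name a particular separation algorithm. As one common position-based technique, a delay-and-sum beamformer can be sketched as below; the choice of delay-and-sum, the integer-sample delays, and the function name are assumptions made for illustration.

```python
def delay_and_sum(channels, delays):
    """Align each channel by its integer sample delay and average them.

    channels: list of equal-length sample lists, one per microphone.
    delays:   per-microphone arrival delay (in samples) of the target source.
    Samples that would fall outside a channel are treated as zero.
    """
    n = len(channels[0])
    out = [0.0] * n
    for samples, delay in zip(channels, delays):
        for i in range(n):
            j = i + delay  # index at which this microphone heard instant i
            if 0 <= j < n:
                out[i] += samples[j]
    return [v / len(channels) for v in out]
```

Steering with the delays of the first speaker's position adds that speaker's contributions coherently while the other speaker's contributions are spread out and attenuated; repeating the call with the second speaker's delays yields the second separated signal.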

According to embodiments, the voice processing circuit 120 may store the separated voice signal and the sound source position information in association with each other. For example, the voice processing circuit 120 may store the first separated voice signal associated with the voice of the first speaker SPK1 in association with first sound source position information indicating the sound source position of the voice of the first speaker SPK1.

Meanwhile, the operations of the voice processing circuit 120 described in this specification may be implemented in the form of a program executable by a computing device. For example, the voice processing circuit 120 may execute an application stored in the memory 130 and perform the operations corresponding to the instructions that the running application issues.

The memory 130 may store data necessary for the operation of the voice processing device 100. According to embodiments, the memory 130 may store the separated voice signal and the sound source position information.

The communication circuit 140 may transmit data to the vehicle 200 or receive data from the vehicle 200.

According to embodiments, the communication circuit 140 may communicate data using a wireless communication method, but embodiments of the present invention are not limited thereto. For example, the communication circuit 140 may support communication methods such as WiFi, Bluetooth, Zigbee, NFC, Wibro, WCDMA, 3G, LTE, and 5G.

The communication circuit 140 may transmit the separated voice signal to the vehicle 200 under the control of the voice processing circuit 120. According to embodiments, the communication circuit 140 may transmit the sound source position information together with the separated voice signal.

The positioning circuit 150 may measure the positions of the speaker terminals ST1 to ST4 and generate terminal position information indicating those positions. According to embodiments, the positioning circuit 150 may measure the positions of the speaker terminals ST1 to ST4 using radio signals output from the speaker terminals ST1 to ST4.

For example, the positioning circuit 150 may measure the positions of the speaker terminals ST1 to ST4 using ultra-wide band (UWB), wireless local area network (WLAN), ZigBee, Bluetooth, or radio frequency identification (RFID), but embodiments of the present invention are not limited to any particular positioning method.

According to embodiments, the positioning circuit 150 may include an antenna 151 for transmitting and receiving radio signals.

The speaker 160 may output sound corresponding to a voice signal. According to embodiments, the speaker 160 may vibrate based on a (combined or separated) voice signal, and the voice may be reproduced by the vibration of the speaker 160.

FIG. 3 shows a speaker terminal according to embodiments of the present invention. The speaker terminal 300 shown in FIG. 3 represents each of the speaker terminals ST1 to ST4 shown in FIG. 1. Referring to FIG. 3, the speaker terminal 300 may include an input unit 310, a communication unit 320, a control unit 330, and a storage unit 340.

The input unit 310 may detect a user's input (e.g., a push, touch, or click) and generate a detection signal. For example, the input unit 310 may be a touch panel or a keyboard, but is not limited thereto.

The communication unit 320 may communicate with an external device (e.g., 100 or 200). According to embodiments, the communication unit 320 may receive data from the external device or transmit data to the external device.

The communication unit 320 may exchange radio signals with the voice processing device 100 in order to measure the position of the speaker terminal 300. According to embodiments, the communication unit 320 may receive a radio signal transmitted from the voice processing device 100 and transmit to the voice processing device 100 data on the variables that characterize the reception of the radio signal (reception time, reception angle, reception strength, etc.). Also, according to embodiments, the communication unit 320 may transmit a radio signal to the voice processing device 100 and transmit to the voice processing device 100 data on the variables that characterize the transmission of the radio signal (transmission time, transmission angle, transmission strength, etc.).

For example, the communication unit 320 may exchange radio signals with the voice processing device 100 in order to measure the position of the speaker terminal 300 using a time of flight (ToF), time difference of arrival (TDoA), angle of arrival (AoA), or received signal strength indicator (RSSI) method.
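
Two of the ranging methods named above reduce to short formulas, sketched here in Python. The two-way ranging form of ToF, the log-distance path-loss model for RSSI, and the default calibration values (RSSI at 1 m, path-loss exponent) are assumptions for illustration and would need calibration in a real cabin.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def tof_distance(round_trip_s, reply_delay_s=0.0):
    """Two-way ranging: half the round-trip time, minus the responder's
    known reply delay, multiplied by the propagation speed."""
    return SPEED_OF_LIGHT * (round_trip_s - reply_delay_s) / 2.0


def rssi_distance(rssi_dbm, rssi_at_1m_dbm=-50.0, path_loss_exponent=2.0):
    """Log-distance path-loss model: d = 10 ** ((P_1m - RSSI) / (10 * n))."""
    return 10 ** ((rssi_at_1m_dbm - rssi_dbm) / (10.0 * path_loss_exponent))
```

Distances from three or more anchors (or one distance combined with an arrival angle) can then be turned into a terminal position; that step is omitted here.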

According to embodiments, the communication unit 320 may include an antenna 321 for transmitting and receiving radio signals.

The control unit 330 may control the overall operation of the speaker terminal 300. According to embodiments, the control unit 330 may load a program (or application) stored in the storage unit 340 and carry out the operations of the loaded program.

According to embodiments, the control unit 330 may control the communication unit 320 so that position measurement is performed between the voice processing device 100 and the speaker terminal 300.

The control unit 330 may include a processor having an arithmetic processing function. For example, the control unit 330 may include, but is not limited to, a central processing unit (CPU), a micro controller unit (MCU), a graphics processing unit (GPU), or an application processor (AP).

The storage unit 340 may store data necessary for the operation of the speaker terminal 300. According to embodiments, the storage unit 340 may store the setting values and applications necessary for the operation of the speaker terminal 300.

FIGS. 4 to 6 are diagrams for explaining the operation of a voice processing device according to embodiments of the present invention. Referring to FIGS. 4 to 6, each of the speakers SPK1 to SPK4, located at the positions FL, FR, BL, and BR, may utter a voice.

The voice processing device 100 according to embodiments of the present invention may generate, from the voices of the speakers SPK1 to SPK4, a separated voice signal associated with the voice of each speaker, and may store each separated voice signal together with sound source position information indicating where that voice was uttered (that is, the position of the corresponding speaker SPK1 to SPK4).

According to embodiments, the voice processing device 100 may determine the sound source positions of the voices (that is, the positions of the speakers SPK1 to SPK4) by using the time delay (or phase delay) between voice signals. For example, the voice processing device 100 may determine the position of each sound source (that is, each of the speakers SPK1 to SPK4) relative to the voice processing device 100.
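One common way to turn an inter-microphone time delay into a direction is sketched below: the delay is estimated by cross-correlating two microphone signals, and a far-field model converts it into an arrival angle. This is a minimal illustration under stated assumptions (two microphones, a single dominant source, far-field propagation, a speed of sound of 343 m/s); the function names are hypothetical and the description does not specify the estimator actually used.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at room temperature

def estimate_delay_samples(ref: list[float], other: list[float], max_lag: int) -> int:
    """Return the lag (in samples) that maximizes the cross-correlation.
    A positive lag means `other` is a delayed copy of `ref`."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, x in enumerate(ref):
            j = i + lag
            if 0 <= j < len(other):
                score += x * other[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def arrival_angle_deg(delay_s: float, mic_spacing_m: float) -> float:
    """Far-field angle measured from broadside: sin(theta) = c * delay / spacing."""
    s = SPEED_OF_SOUND_M_S * delay_s / mic_spacing_m
    s = max(-1.0, min(1.0, s))  # clamp against rounding and measurement error
    return math.degrees(math.asin(s))
```

Dividing the lag returned by `estimate_delay_samples` by the sampling rate gives the `delay_s` argument of `arrival_angle_deg`.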

Based on the determined sound source positions, the voice processing device 100 may generate a separated voice signal associated with the voice of each of the speakers SPK1 to SPK4.

As shown in FIG. 4, the first speaker SPK1 utters the voice 'AAA'. In response, the voice processing device 100 may generate a separated voice signal associated with the voice 'AAA' of the first speaker SPK1. As described above, the voice processing device 100 may use the sound source positions of the received voices to generate, from among those voices, the separated voice signal associated with the voice 'AAA' uttered at the position of the first speaker SPK1.

According to embodiments, the voice processing device 100 may store in the memory 130 a first separated voice signal associated with the voice 'AAA' of the first speaker SPK1, together with first sound source position information indicating 'FL (front-row left)', the sound source position of the voice 'AAA' (that is, the position of the first speaker SPK1). For example, as shown in FIG. 4, the first separated voice signal and the first sound source position information may be stored matched to each other.

As shown in FIG. 5, the second speaker SPK2 utters the voice 'BBB'. When the voice 'BBB' is uttered, the voice processing device 100 may generate, based on the sound source positions of the received voices, a second separated voice signal associated with the voice 'BBB' of the second speaker SPK2.

According to embodiments, the voice processing device 100 may store in the memory 130 the second separated voice signal associated with the voice 'BBB' of the second speaker SPK2, together with second sound source position information indicating 'FR (front-row right)', the sound source position of the voice 'BBB' (that is, the position of the second speaker SPK2).

As shown in FIG. 6, the third speaker SPK3 utters the voice 'CCC' and the fourth speaker SPK4 utters the voice 'DDD'. Based on the sound source positions of the received voices, the voice processing device 100 may generate a third separated voice signal associated with the voice 'CCC' of the third speaker SPK3 and a fourth separated voice signal associated with the voice 'DDD' of the fourth speaker SPK4.

According to embodiments, the voice processing device 100 may store in the memory 130 the third separated voice signal associated with the voice 'CCC' of the third speaker SPK3, together with third sound source position information indicating 'BL (rear-row left)', the sound source position of the voice 'CCC' (that is, the position of the third speaker SPK3). It may likewise store the fourth separated voice signal associated with the voice 'DDD' of the fourth speaker SPK4, together with fourth sound source position information indicating 'BR (rear-row right)', the sound source position of the voice 'DDD' (that is, the position of the fourth speaker SPK4).

FIG. 7 shows authority levels of speaker terminals according to embodiments of the present invention. Referring to FIG. 7, the voice processing device 100 may store a terminal ID for identifying each of the speaker terminals ST1 to ST4 and authority level information indicating the authority level of each of the speaker terminals ST1 to ST4. According to embodiments, the voice processing device 100 may store the terminal ID and the authority level information matched to each other. For example, the voice processing device 100 may store the terminal ID and the authority level information in the memory 130.

The authority level of a speaker terminal ST1 to ST4 may be used to decide whether to process a separated voice signal uttered at the sound source position corresponding to the terminal position of that speaker terminal. That is, the voice processing device 100 may determine the speaker terminal corresponding to a separated voice signal and process the separated voice signal according to the authority level assigned to that speaker terminal.

In particular, when the vehicle 200 is controlled by voice, embodiments of the present invention allow only the voice of a speaker (or speaker terminal) having at least a certain authority level to be processed, which considerably improves the stability of vehicle control.

According to embodiments, the voice processing device 100 may process a separated voice signal when the authority level of the speaker terminal corresponding to that signal is equal to or higher than a reference level. For example, when the reference level is '2', the voice processing device 100 may not process the fourth separated voice signal, which corresponds to the fourth speaker terminal ST4 having an authority level below '2'. Information on a separated voice signal that was not processed may nevertheless be stored in the voice processing device 100.
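The reference-level check can be sketched as a lookup followed by a comparison. The levels below mirror the values quoted in the description (ST1 is '4', ST2 and ST3 are '2', ST4 is '1'); the dictionary layout, the function names, the `unprocessed` list, and the choice to reject terminals that have no entry are assumptions made for this sketch.

```python
# Authority level per terminal ID (values as quoted in the description;
# the dictionary layout itself is an assumption).
AUTHORITY_LEVELS = {"ST1": 4, "ST2": 2, "ST3": 2, "ST4": 1}

# Records of separated voice signals that were not processed.
unprocessed = []

def should_process(terminal_id, reference_level=2):
    """True when the terminal's authority level is at or above the reference level.
    Rejecting a terminal with no stored level is an assumption of this sketch."""
    level = AUTHORITY_LEVELS.get(terminal_id)
    return level is not None and level >= reference_level

def handle(terminal_id, signal_label, reference_level=2):
    """Return True if the signal is processed; otherwise record it and return False."""
    if should_process(terminal_id, reference_level):
        return True
    unprocessed.append((terminal_id, signal_label))
    return False
```

Keeping a record of rejected signals corresponds to the statement that information on unprocessed separated voice signals may be stored.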

Also according to embodiments, the higher the authority level of the speaker terminal corresponding to a separated voice signal, the higher the priority with which the voice processing device 100 may process that signal. For example, because the first speaker terminal ST1 has the highest authority level, '4', the voice processing device 100 may process the first separated voice signal corresponding to the first speaker terminal ST1 before all others.
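Priority handling of this kind can be illustrated by sorting pending signals by authority level in descending order. The function name and the `(terminal_id, payload)` tuple layout are hypothetical, and keeping arrival order among equal levels is a design choice of this sketch rather than something the description states.

```python
def order_by_priority(signals, levels):
    """Sort (terminal_id, payload) pairs so that higher authority levels come first.
    Python's sort is stable, also with reverse=True, so signals whose terminals
    share a level keep their original (arrival) order."""
    return sorted(signals, key=lambda s: levels[s[0]], reverse=True)

# Usage with the levels quoted in the description.
levels = {"ST1": 4, "ST2": 2, "ST3": 2, "ST4": 1}
pending = [("ST4", "d"), ("ST3", "c"), ("ST1", "a"), ("ST2", "b")]
ordered = order_by_priority(pending, levels)
```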

Although FIG. 7 shows four authority levels, there may be only two levels according to embodiments. That is, the authority levels may consist of a first level for which processing is permitted and a second level for which processing is not permitted.

FIG. 8 is a flowchart showing a method of operating a voice processing device according to embodiments of the present invention. Referring to FIG. 8, the voice processing device 100 may generate separated voice signals and sound source position information in response to the voices of the speakers SPK1 to SPK4 (S110). According to embodiments, the voice processing device 100 may generate a separated voice signal associated with the voice of each of the speakers SPK1 to SPK4, and sound source position information indicating the sound source position of each voice.

The voice processing device 100 may determine the positions of the speaker terminals ST1 to ST4 of the speakers SPK1 to SPK4 (S120). According to embodiments, the voice processing device 100 may determine the positions of the speaker terminals ST1 to ST4 by using radio signals transmitted from the speaker terminals ST1 to ST4.

The voice processing device 100 may determine the speaker terminal ST1 to ST4 corresponding to each separated voice signal (S130). According to embodiments, the voice processing device 100 may determine the speaker terminal ST1 to ST4 whose position corresponds to the sound source position of the separated voice signal.

According to embodiments, the voice processing device 100 may use the zones FL, FR, BL, and BR inside the vehicle 200 as a reference and match a separated voice signal with the speaker terminal that corresponds to the same zone. For example, the voice processing device 100 may match the first separated voice signal with the first speaker terminal ST1, both of which correspond to the front-row left zone 'FL' of the vehicle 200.
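Zone-based matching can be sketched by quantizing a measured terminal position to one of the four zones and comparing it with the zone of the sound source. The coordinate convention (origin at the cabin centre, +x toward the front, +y toward the left), the function names, and the example coordinates are all assumptions made for illustration; the description only states that signals and terminals in the same zone are matched.

```python
def zone_of(x_m: float, y_m: float) -> str:
    """Map a position relative to the cabin centre to a zone label.
    Assumed axes: +x points to the front of the vehicle, +y to its left side."""
    row = "F" if x_m >= 0 else "B"
    side = "L" if y_m >= 0 else "R"
    return row + side

def match_terminal(source_zone: str, terminal_positions: dict):
    """Return the ID of the first terminal located in `source_zone`, or None."""
    for terminal_id, (x_m, y_m) in terminal_positions.items():
        if zone_of(x_m, y_m) == source_zone:
            return terminal_id
    return None

# Hypothetical measured terminal positions in metres.
positions = {"ST1": (0.8, 0.4), "ST2": (0.8, -0.4),
             "ST3": (-0.7, 0.4), "ST4": (-0.7, -0.4)}
```

With these positions, a separated voice signal located in 'FL' is matched with ST1, in line with the example in the text.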

The voice processing device 100 may process each separated voice signal according to the authority level assigned to the corresponding speaker terminal (S140). According to embodiments, the voice processing device 100 may read the authority level information from the memory 130 and process each separated voice signal according to the authority level of the speaker terminal corresponding to (or matched with) that signal.

For example, because the first separated voice signal, which corresponds to the voice of the first speaker SPK1, was uttered at 'FL (front-row left)', it may be processed according to the authority level of the first speaker terminal ST1, which corresponds to 'FL (front-row left)'.

FIG. 9 is a diagram for explaining the operation of a voice processing device according to embodiments of the present invention. Referring to FIG. 9, the first speaker SPK1 utters the voice 'Open the door' at the sound source position 'FL (front-row left)', the third speaker SPK3 utters the voice 'Play music' at the sound source position 'BL (rear-row left)', and the fourth speaker SPK4 utters the voice 'Turn off the engine' at the sound source position 'BR (rear-row right)'.

According to the authority level information stored in the voice processing device 100, the authority level of the first speaker terminal ST1 is '4', that of the second speaker terminal ST2 is '2', that of the third speaker terminal ST3 is '2', and that of the fourth speaker terminal ST4 is '1'. In this case, the voice processing device 100 may process only the separated voice signals corresponding to speaker terminals whose authority level is equal to or higher than the reference level (e.g., '2').

In response to the voices of the speakers ('Open the door', 'Play music', and 'Turn off the engine'), the voice processing device 100 may generate a separated voice signal corresponding to each voice. The voice processing device 100 may also generate sound source position information indicating the sound source position ('FL', 'BL', and 'BR') of each of these voices.

When the voices of the speakers are input, the voice processing device 100 may determine the terminal position of each of the speaker terminals ST1 to ST4. According to embodiments, the voice processing device 100 may determine the terminal position of each of the speaker terminals ST1 to ST4 by exchanging radio signals with each of them. The voice processing device 100 may store terminal position information indicating the terminal positions of the speaker terminals ST1 to ST4. The terminal position information may be stored matched with the terminal IDs of the speaker terminals ST1 to ST4.

The voice processing device 100 may process the separated voice signal associated with the voice of each of the speakers SPK1 to SPK4 according to the authority level assigned to the speaker terminal ST1 to ST4 corresponding to that signal. According to embodiments, the voice processing device 100 may process only the separated voice signals corresponding to speaker terminals ST1 to ST4 that have been assigned an authority level equal to or higher than the reference level, although embodiments of the present invention are not limited thereto.

As shown in FIG. 9, the voice processing device 100 may decide whether to process the first separated voice signal, which is associated with the voice 'Open the door' of the first speaker SPK1, according to the authority level '4' of the first speaker terminal ST1 corresponding to that signal. According to embodiments, the voice processing device 100 may identify the first speaker terminal ST1 as the terminal whose position corresponds to the position 'FL' of the first separated voice signal, read the authority level of the first speaker terminal ST1, and process the first separated voice signal according to the authority level that was read. For example, because the authority level '4' is not lower than the reference level '2', the voice processing device 100 may process the first separated voice signal, and the vehicle 200 may accordingly perform the operation corresponding to the voice 'Open the door' (e.g., opening a door).

Also as shown in FIG. 9, the voice processing device 100 may decide whether to process the fourth separated voice signal, which is associated with the voice 'Turn off the engine' of the fourth speaker SPK4, according to the authority level '1' of the fourth speaker terminal ST4 corresponding to that signal. According to embodiments, the voice processing device 100 may identify the fourth speaker terminal ST4 as the terminal whose position corresponds to the position 'BR' of the fourth separated voice signal, read the authority level of the fourth speaker terminal ST4, and handle the fourth separated voice signal according to the authority level that was read. For example, because the authority level '1' is lower than the reference level '2', the voice processing device 100 may not process the fourth separated voice signal. In this case, the vehicle 200 may not perform the operation corresponding to 'Turn off the engine' even though the fourth speaker SPK4 uttered that voice.
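The FIG. 9 scenario can be traced end to end with a short sketch that covers steps S130 and S140: each utterance is matched to the terminal in its zone, and the command is executed only if that terminal's authority level reaches the reference level. The authority levels and the reference level are taken from the description; the terminal-to-zone table follows the text where it is stated (ST1 in 'FL', ST4 in 'BR') and is otherwise assumed, and the function name and data layout are hypothetical.

```python
AUTHORITY_LEVELS = {"ST1": 4, "ST2": 2, "ST3": 2, "ST4": 1}  # from the FIG. 9 example
TERMINAL_ZONES = {"ST1": "FL", "ST2": "FR", "ST3": "BL", "ST4": "BR"}  # partly assumed
REFERENCE_LEVEL = 2

def process_utterances(utterances):
    """utterances: list of (zone, command) pairs, one per separated voice signal.
    Returns (executed, rejected) lists of commands."""
    zone_to_terminal = {zone: tid for tid, zone in TERMINAL_ZONES.items()}
    executed, rejected = [], []
    for zone, command in utterances:
        terminal_id = zone_to_terminal.get(zone)       # S130: match by zone
        level = AUTHORITY_LEVELS.get(terminal_id, 0)   # S140: read authority level
        if level >= REFERENCE_LEVEL:
            executed.append(command)
        else:
            rejected.append(command)
    return executed, rejected

executed, rejected = process_utterances([
    ("FL", "Open the door"),
    ("BL", "Play music"),
    ("BR", "Turn off the engine"),
])
```

Under these assumptions, 'Open the door' and 'Play music' end up in `executed`, while 'Turn off the engine' ends up in `rejected`, which matches the outcome described for the first and fourth speakers.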

Although the embodiments have been described above with reference to a limited set of examples and drawings, a person of ordinary skill in the art can make various modifications and variations based on the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or if components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims that follow.

100: voice processing device
110: microphone
120: voice processing circuit
130: memory
140: communication circuit
150: positioning circuit
151: antenna
160: speaker
200: vehicle
SPK1 to SPK4: speakers
ST1 to ST4: speaker terminals

Claims

A voice processing device comprising:
a microphone configured to generate a voice signal in response to voices uttered by a plurality of speakers;
a voice processing circuit configured to generate a separated voice signal associated with each of the voices by separating the voice signal based on the sound source position of each of the voices;
a positioning circuit configured to measure terminal positions of speaker terminals of the speakers; and
a memory configured to store authority level information indicating an authority level for each of the speaker terminals,
wherein the voice processing circuit is configured to:
determine a speaker terminal having a terminal position corresponding to the sound source position of the separated voice signal; and
process the separated voice signal according to the authority level corresponding to the determined speaker terminal, with reference to the authority level information.

The voice processing device of claim 1,
wherein the microphone comprises a plurality of microphones arranged to form an array.

The voice processing device of claim 2, wherein the voice processing circuit is configured to:
determine the sound source position of each of the voices based on a time delay between a plurality of voice signals generated by the plurality of microphones; and
generate the separated voice signal based on the determined sound source position.

The voice processing device of claim 2, wherein the voice processing circuit is configured to:
generate sound source position information indicating the sound source position of each of the voices based on a time delay between a plurality of voice signals generated by the plurality of microphones, and store in the memory, matched to each other, the sound source position information for a voice and the separated voice signal for that voice.

The voice processing device of claim 1, wherein the positioning circuit is configured to:
exchange radio signals with each of the speaker terminals and determine the terminal position of each of the speaker terminals according to the result of the exchange.

The voice processing device of claim 1,
wherein the voice processing device is installed in a vehicle, and
wherein the processing of the separated voice signal by the voice processing circuit comprises
recognizing a command for controlling the vehicle from the separated voice signal and determining an operation command corresponding to the recognized command.

The voice processing device of claim 1, wherein the voice processing circuit is configured to:
process the separated voice signal if the authority level corresponding to the determined speaker terminal is equal to or higher than a reference level; and
not process the separated voice signal if the authority level corresponding to the determined speaker terminal is lower than the reference level.