KR102223653B1 - Apparatus and method for processing voice signal and terminal

Info

Publication number: KR102223653B1
Application number: KR1020160076806A
Authority: KR
Inventors: 이민규; 김상훈; 김영익; 김동현; 최무열
Original assignee: 한국전자통신연구원
Priority date: 2015-07-10
Filing date: 2016-06-20
Publication date: 2021-03-05
Also published as: KR20170007114A

Abstract

A voice signal processing apparatus according to an embodiment of the present invention may include: an input unit that receives a user's voice signal; a sensing unit that senses motion caused by the user's utterance and thereby detects an auxiliary signal for identifying the utterance section of the user's voice signal; a switch that receives, from the user, information on at least one of a selection of an operation mode and a selection of how protocols are applied to the voice signal and the auxiliary signal; and a signal processing unit that, when the selected operation mode is a first operation mode, transmits the voice signal to an external terminal using a first protocol and, when the selected operation mode is a second operation mode, either transmits the voice signal and the auxiliary signal to the external terminal using the first protocol or transmits each of the voice signal and the auxiliary signal to the external terminal using a different one of the first protocol and a second protocol.

Description

APPARATUS AND METHOD FOR PROCESSING VOICE SIGNAL AND TERMINAL

The present invention relates to a voice signal processing apparatus and method, and to a terminal.

In conventional devices that provide a voice recognition service, such as smartphones and PCs, a PC microphone, a smartphone microphone, or a Bluetooth headset has been used to deliver the speaker's voice to the voice recognition terminal. Among these, Bluetooth headsets are widely used because they are worn on the ear alone, without a separate cable, which is convenient for the user.

However, such microphones and Bluetooth headsets are vulnerable to other people's voices and to ambient noise in addition to the speaker's voice. Some devices perform their own signal processing to remove ambient noise before delivering the signal to the voice recognition terminal, but they can deliver only single-channel (mono) information whose signal processing is already complete, so no additional post-processing can be performed in software. In addition, the voice of the other party to a call enters the user's microphone as it is, which causes the voice recognition function to malfunction.

An object of embodiments of the present invention is to provide a voice signal processing apparatus and method, and a terminal, capable of delivering an auxiliary signal for voice recognition processing to an external terminal without additional hardware changes.

Another object of embodiments of the present invention is to provide a voice signal processing apparatus and method, and a terminal, capable of improving the accuracy of voice recognition.

The technical problems addressed by the present invention are not limited to those mentioned above, and other technical problems not mentioned here will be clearly understood by those skilled in the art from the description below.

A voice signal processing apparatus according to an embodiment of the present invention may include: an input unit that receives a user's voice signal; a sensing unit that senses motion caused by the user's utterance and thereby detects an auxiliary signal for identifying the utterance section of the user's voice signal; a switch that receives, from the user, information on at least one of a selection of an operation mode and a selection of how protocols are applied to the voice signal and the auxiliary signal; and a signal processing unit that, when the selected operation mode is a first operation mode, transmits the voice signal to an external terminal using a first protocol and, when the selected operation mode is a second operation mode, either transmits the voice signal and the auxiliary signal to the external terminal using the first protocol or transmits each of the voice signal and the auxiliary signal to the external terminal using a different one of the first protocol and a second protocol.

In one embodiment, in the second operation mode, the signal processing unit may transmit the voice signal to the external terminal based on the Hands-Free Profile (HFP) and transmit the auxiliary signal to the external terminal based on Bluetooth Low Energy (BLE).

In one embodiment, in the second operation mode, the signal processing unit may integrate the voice signal and the auxiliary signal to generate an integrated signal and transmit the integrated signal to the external terminal based on the Hands-Free Profile.

In one embodiment, the sensing unit may include at least one of an in-ear microphone, a bone conduction microphone, a motion sensor, and a gyro sensor.

In one embodiment, when the sensing unit is the in-ear microphone or the bone conduction microphone and the magnitude of the auxiliary signal is greater than or equal to a reference value, the signal processing unit may transmit the auxiliary signal to the external terminal.

In one embodiment, the apparatus may further include a communication unit that transmits the voice signal or the auxiliary signal to the external terminal, and the communication unit may include a Bluetooth communication module.

In one embodiment, the selection of the operation mode is a selection of either the first operation mode or the second operation mode. The selection of how protocols are applied to the voice signal and the auxiliary signal is, for operation in the second operation mode, a selection between transmitting the voice signal and the auxiliary signal to the external terminal using a selected one of the first protocol and the second protocol, and transmitting each of the voice signal and the auxiliary signal to the external terminal using a different one of the first protocol and the second protocol.

A voice signal processing method according to an embodiment of the present invention includes: receiving a user's voice signal; sensing motion caused by the user's utterance and thereby detecting an auxiliary signal for identifying the utterance section of the user's voice signal; receiving information on at least one of an operation mode selected by the user and a method of applying protocols to the voice signal and the auxiliary signal; and, when the input operation mode is a first operation mode, transmitting the voice signal to an external terminal using a first protocol and, when the input operation mode is a second operation mode, either transmitting the voice signal and the auxiliary signal to the external terminal using the first protocol or transmitting each of the voice signal and the auxiliary signal to the external terminal using a different one of the first protocol and a second protocol.

In one embodiment, the step of transmitting the voice signal to the external terminal when the first operation mode is selected, and transmitting the voice signal and the auxiliary signal to the external terminal using the same protocol or different protocols when the second operation mode is selected, may include, in the second operation mode, transmitting the voice signal to the external terminal based on the Hands-Free Profile (HFP) and transmitting the auxiliary signal to the external terminal based on Bluetooth Low Energy (BLE).

In one embodiment, the step of transmitting at least one of the voice signal and the auxiliary signal to the external terminal based on the input operation mode may include, in the second operation mode, transmitting the voice signal to the external terminal based on the Hands-Free Profile (HFP) and transmitting the auxiliary signal to the external terminal based on Bluetooth Low Energy (BLE).

According to the voice signal processing apparatus and method and the terminal according to embodiments of the present invention, an auxiliary signal for voice recognition processing can be delivered to an external terminal without additional hardware changes.

According to the voice signal processing apparatus and method and the terminal according to embodiments of the present invention, the accuracy of voice recognition can be improved.

FIG. 1 shows a voice recognition processing system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing a voice signal processing apparatus according to an embodiment of the present invention.
FIG. 3 is a flowchart showing a voice signal processing method according to an embodiment of the present invention.
FIG. 4 is a block diagram showing a terminal according to an embodiment of the present invention.
FIG. 5 is a diagram for explaining the voice recognition processing operation of a terminal according to an embodiment of the present invention.

Hereinafter, some embodiments of the present invention are described in detail with reference to exemplary drawings. In assigning reference numerals to the elements of each drawing, note that the same elements are given the same numerals wherever possible, even when they appear in different drawings. In addition, in describing the embodiments of the present invention, detailed descriptions of related known configurations or functions are omitted when they are judged to hinder understanding of the embodiments.

In describing the elements of the embodiments of the present invention, terms such as first, second, A, B, (a), and (b) may be used. These terms serve only to distinguish one element from another and do not limit the nature, sequence, or order of the elements. In addition, unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art, and are not to be interpreted in an idealized or excessively formal sense unless explicitly so defined in the present application.

FIG. 1 shows a voice recognition processing system according to an embodiment of the present invention. FIG. 2 is a block diagram showing a voice signal processing apparatus according to an embodiment of the present invention.

Referring to FIG. 1, a voice recognition processing system 1000 according to an embodiment of the present invention may recognize and process a user's voice to provide functions such as translation into another language and device control. To this end, the voice recognition processing system 1000 may include a voice signal processing apparatus 100 and a terminal 200. Although FIG. 1 shows the voice signal processing apparatus 100, which receives the user's voice, and the terminal 200, which processes voice recognition, as functionally separate, the invention is not limited to this arrangement, and the voice signal processing apparatus 100 and the terminal 200 may be integrated into a single electronic device.

The voice signal processing apparatus 100 may deliver a voice signal input from the user to the terminal 200. For example, the voice signal processing apparatus 100 may be a Bluetooth hands-free device and may have a form that is easy to wear on the user's ear. In addition, the voice signal processing apparatus 100 may deliver to the terminal 200 an auxiliary signal used to identify the voice section for voice recognition processing of the voice signal.

The voice signal processing apparatus 100 may deliver the voice signal and/or the auxiliary signal to the terminal 200 according to an operation mode. For example, the operation mode may be set by the user. In the first operation mode, the voice signal processing apparatus 100 may transmit only the voice signal to the terminal 200. In the second operation mode, it may transmit both the voice signal and the auxiliary signal to the terminal 200, using either the same protocol or different protocols for the two signals. For example, when the voice signal processing apparatus 100 transmits the voice signal and the auxiliary signal using the same protocol, it may integrate the two signals to generate an integrated signal and transmit the generated integrated signal to the terminal 200. The protocols may include the Bluetooth Hands-Free Profile (HFP) and the Bluetooth Low Energy (BLE) protocol.

As described above, the voice signal processing apparatus 100 may transmit the voice signal and the auxiliary signal to the terminal 200 through the same protocol or through different protocols, depending on the operation mode. When it transmits them through the same protocol, it may integrate the voice signal and the auxiliary signal and transmit them to the terminal 200 based on the protocol already used to carry voice signals (for example, the Bluetooth Hands-Free Profile (HFP)). Accordingly, an auxiliary signal for voice recognition processing can be delivered to the terminal 200 without additional hardware changes, and the hands-free operation of the voice signal processing apparatus 100 need not be restricted in any way. The voice signal processing apparatus 100 is described in more detail with reference to FIG. 2.

Referring to FIGS. 1 and 2, the voice signal processing apparatus 100 may include an input unit 110, a sensing unit 120, a signal processing unit 130, a communication unit 140, a switch 150, and an output unit 160.

The input unit 110 may receive a voice signal from the user. For example, the input unit 110 may be a mono microphone, but is not limited thereto.

The sensing unit 120 may detect an auxiliary signal. Here, the auxiliary signal may mean a signal used to identify the voice section for voice recognition processing of the user's voice signal. For example, the sensing unit 120 may include at least one of an in-ear microphone, a bone conduction microphone, a motion sensor, and a gyro sensor.

For example, when the sensing unit 120 is an in-ear microphone or a bone conduction microphone, the sensing unit 120 may be placed inside the user's ear or near the temple and may sense the user's voice signal when the user speaks. In this respect, the auxiliary signal may mean the user's voice signal sensed through the sensing unit 120. That is, unlike the input unit 110, the sensing unit 120 receives a voice signal that is relatively little affected by ambient noise, so it can be useful for identifying the voice section of the user's voice signal.

Also, for example, when the sensing unit 120 is a motion sensor or a gyro sensor, it may sense the motion that occurs at the sensing unit 120 when the user speaks. In this respect, the auxiliary signal may mean the user's motion sensed through the sensing unit 120. That is, because the sensing unit 120 senses motion caused by the user's utterance, it can be useful for identifying the voice section of the user's voice signal.

The signal processing unit 130 may deliver the voice signal and/or the auxiliary signal to the terminal 200 according to the operation mode. For example, the operation mode may be set by the user and may include a first operation mode and a second operation mode. The signal processing unit 130 may deliver the auxiliary signal to the terminal 200 when the auxiliary signal is at or above a reference level.
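
The reference-level condition above can be illustrated with a minimal sketch. The description does not say how the level of the auxiliary signal is measured or how the signal is framed, so the root-mean-square measure, the frame representation, and the names `frame_level` and `gate_auxiliary` are assumptions made only for illustration.

```python
import math
from typing import Iterable, Iterator, Sequence


def frame_level(frame: Sequence[float]) -> float:
    """Root-mean-square amplitude of one frame (an assumed level measure)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def gate_auxiliary(
    frames: Iterable[Sequence[float]], reference_level: float
) -> Iterator[Sequence[float]]:
    """Yield only the auxiliary frames whose level is at or above the reference."""
    for frame in frames:
        if frame_level(frame) >= reference_level:
            yield frame
```

Under these assumptions, frames below the reference level are simply not forwarded, which is one way to read "deliver the auxiliary signal when it is at or above a reference level".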

In the first operation mode, the signal processing unit 130 may transmit only the voice signal to the terminal 200 based on the Bluetooth Hands-Free Profile (HFP). In the second operation mode, the signal processing unit 130 may transmit both the voice signal and the auxiliary signal to the terminal 200, using either the same protocol or different protocols for the two signals. The protocols may include the Bluetooth Hands-Free Profile (HFP) and the Bluetooth Low Energy (BLE) protocol.

For example, when the signal processing unit 130 transmits the voice signal and the auxiliary signal to the terminal 200 using different protocols, it may transmit the voice signal based on the Bluetooth Hands-Free Profile (HFP) and the auxiliary signal based on Bluetooth Low Energy (BLE). Also, for example, when the signal processing unit 130 transmits the voice signal and the auxiliary signal using the same protocol, it may integrate the two signals to generate an integrated signal and transmit the generated integrated signal to the terminal 200 based on the Bluetooth Hands-Free Profile (HFP). For example, the integrated signal may take a form in which the voice signal and the auxiliary signal are transmitted to the terminal 200 alternately at predetermined time intervals.
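
One possible reading of the alternating form of the integrated signal is sketched below. The description states only that the two signals alternate at predetermined time intervals; the fixed block size, the equal lengths of the two sample sequences, and the function names `integrate` and `separate` are assumptions for illustration, not a format defined by the patent or by HFP.

```python
from typing import List, Sequence, Tuple


def integrate(voice: Sequence[int], aux: Sequence[int], block: int) -> List[int]:
    """Alternate fixed-size blocks: voice block, auxiliary block, voice block, ..."""
    if block <= 0:
        raise ValueError("block must be positive")
    if len(voice) != len(aux) or len(voice) % block != 0:
        raise ValueError("signals must have equal length, a multiple of block")
    out: List[int] = []
    for start in range(0, len(voice), block):
        out.extend(voice[start:start + block])
        out.extend(aux[start:start + block])
    return out


def separate(integrated: Sequence[int], block: int) -> Tuple[List[int], List[int]]:
    """Inverse of integrate(): split the stream back into voice and auxiliary."""
    voice: List[int] = []
    aux: List[int] = []
    for start in range(0, len(integrated), 2 * block):
        voice.extend(integrated[start:start + block])
        aux.extend(integrated[start + block:start + 2 * block])
    return voice, aux
```

With this layout the receiving terminal needs to know only the block size to recover both signals from the single stream.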

The communication unit 140 may deliver the voice signal and/or the auxiliary signal to the terminal 200 under the control of the signal processing unit 130. In addition, the communication unit 140 may receive voice/sound output data from the terminal 200. The received data may be output to the output unit 160 through the signal processing unit 130. For example, the communication unit 140 may include a Bluetooth communication module.

The switch 150 may receive a selection of the first operation mode or the second operation mode from the user. In addition, the switch 150 may receive from the user a selection of whether, in the second operation mode, the voice signal and the auxiliary signal are transmitted to the terminal 200 using the same protocol (single) or different protocols (individual).

The output unit 160 may output data/signals delivered from the signal processing unit 130. For example, the output unit 160 may be an earphone, but is not limited thereto.

Referring again to FIG. 1, the terminal 200 may perform voice recognition processing on the voice signal delivered from the voice signal processing apparatus 100. The terminal 200 may use the auxiliary signal delivered from the voice signal processing apparatus 100 to identify the user's voice section contained in the voice signal, and may process voice recognition using the identified voice section.

Accordingly, the terminal 200 can identify the user's voice section more accurately, and the accuracy of voice recognition can be improved. The operation of the terminal 200 is described in more detail below with reference to FIGS. 4 and 5.

FIG. 3 is a flowchart showing a voice signal processing method according to an embodiment of the present invention.

Referring to FIG. 3, a voice signal processing method according to an embodiment of the present invention may include: receiving a user's voice signal (S110); detecting an auxiliary signal (S120); receiving a selection of an operation mode (S130); transmitting the voice signal to an external terminal when the first operation mode is selected (S140); receiving a selection of a transmission scheme when the second operation mode is selected (S150); transmitting the voice signal and the auxiliary signal to the terminal 200 using different protocols when individual transmission is selected (S160); and, when single transmission is selected, integrating the voice signal and the auxiliary signal to generate an integrated signal and transmitting the integrated signal to the external terminal (S170).

Steps S110 to S170 are described in more detail below with reference to FIGS. 1 and 2.

In step S110, the input unit 110 may receive a voice signal from the user. For example, the input unit 110 may be a mono microphone, but is not limited thereto.

In step S120, the sensing unit 120 may detect an auxiliary signal. Here, the auxiliary signal may mean a signal used to identify the voice section for voice recognition processing of the user's voice signal. For example, the sensing unit 120 may include at least one of an in-ear microphone, a bone conduction microphone, a motion sensor, and a gyro sensor. Step S120 may be performed at the same time as step S110.

In step S130, the switch 150 may receive a selection of the first operation mode or the second operation mode from the user.

In step S140, in the first operation mode, the signal processing unit 130 may transmit only the voice signal to the terminal 200 based on the Bluetooth Hands-Free Profile (HFP).

In step S150, the switch 150 may receive from the user a selection of whether, in the second operation mode, the voice signal and the auxiliary signal are transmitted to the terminal 200 using the same protocol (single) or different protocols (individual).

In step S160, the signal processing unit 130 may transmit the voice signal to the terminal 200 based on the Bluetooth Hands-Free Profile (HFP) and transmit the auxiliary signal to the terminal 200 based on Bluetooth Low Energy (BLE).

In step S170, the signal processing unit 130 may integrate the voice signal and the auxiliary signal to generate an integrated signal and transmit the generated integrated signal to the terminal 200 based on the Bluetooth Hands-Free Profile (HFP).
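
The branching among steps S130 to S170 can be summarized in a short sketch. It models only the decision of which payload goes over which protocol and performs no Bluetooth communication; the names `Mode`, `Scheme`, and `plan_transmission` and the string labels are hypothetical and do not come from the patent or from any Bluetooth library.

```python
from enum import Enum
from typing import List, Optional, Tuple

HFP = "HFP"  # Bluetooth Hands-Free Profile
BLE = "BLE"  # Bluetooth Low Energy


class Mode(Enum):
    FIRST = 1   # voice signal only
    SECOND = 2  # voice signal and auxiliary signal


class Scheme(Enum):
    SINGLE = "single"          # same protocol for both signals
    INDIVIDUAL = "individual"  # a different protocol per signal


def plan_transmission(
    mode: Mode, scheme: Optional[Scheme] = None
) -> List[Tuple[str, str]]:
    """Return (payload, protocol) pairs corresponding to S140, S160, or S170."""
    if mode is Mode.FIRST:
        return [("voice", HFP)]                      # S140
    if scheme is Scheme.INDIVIDUAL:
        return [("voice", HFP), ("auxiliary", BLE)]  # S160
    if scheme is Scheme.SINGLE:
        return [("integrated", HFP)]                 # S170
    raise ValueError("the second operation mode needs a transmission scheme (S150)")
```

For instance, `plan_transmission(Mode.SECOND, Scheme.INDIVIDUAL)` yields the voice signal over HFP and the auxiliary signal over BLE, matching step S160.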

FIG. 4 is a block diagram showing a terminal according to an embodiment of the present invention. FIG. 5 is a diagram for explaining the voice recognition processing operation of a terminal according to an embodiment of the present invention.

Referring to FIG. 4, the terminal 200 may include a communication unit 210 and a voice recognition processing unit 220.

The communication unit 210 may receive the voice signal and the auxiliary signal from the voice signal processing apparatus 100. In addition, the communication unit 210 may transmit voice/sound output data to the voice signal processing apparatus 100. For example, the communication unit 210 may include a Bluetooth communication module.

The speech recognition processing unit 220 may identify the user's voice section included in the voice signal using the auxiliary signal, and perform speech recognition processing using the identified voice section.

Referring to FIG. 5, the speech recognition processing unit 220 may identify a voice section of the voice signal using the auxiliary signal. For example, the speech recognition processing unit 220 may identify the section (b), which corresponds to a section in which the level of the auxiliary signal is equal to or higher than a reference level, as a voice section of the voice signal, and perform speech recognition processing using the identified voice section. In contrast, the speech recognition processing unit 220 may identify the sections (a, c), which correspond to sections in which the level of the auxiliary signal is below the reference level, as not being voice sections of the voice signal.
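
The reference-level comparison described with reference to FIG. 5 can be sketched as follows. This Python sketch is illustrative only and is not part of the disclosed embodiment: the function names `find_voice_sections` and `extract_voice` are hypothetical, and the sketch assumes that the voice signal and the auxiliary signal are sampled at the same rate and are time-aligned, which the description does not specify.

```python
def find_voice_sections(aux_levels, reference_level):
    """Return (start, end) index pairs, end exclusive, where the
    auxiliary signal level is at or above the reference level."""
    sections = []
    start = None
    for i, level in enumerate(aux_levels):
        if level >= reference_level:
            if start is None:
                start = i  # a voice section such as (b) begins here
        elif start is not None:
            sections.append((start, i))  # level fell below the reference
            start = None
    if start is not None:
        # The signal ended while still inside a voice section.
        sections.append((start, len(aux_levels)))
    return sections


def extract_voice(voice_samples, aux_levels, reference_level):
    """Keep only the voice samples that fall inside an identified section."""
    return [
        voice_samples[s:e]
        for s, e in find_voice_sections(aux_levels, reference_level)
    ]
```

With auxiliary levels `[0.1, 0.2, 0.9, 0.8, 0.7, 0.1]` and a reference level of `0.5`, `find_voice_sections` returns `[(2, 5)]`: the first two samples and the last sample play the role of sections (a) and (c), and the middle three play the role of section (b).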

As described above, the auxiliary signal is either a voice signal that is relatively little affected by ambient noise or a signal obtained by sensing a motion caused by the user's utterance. It can therefore be used to accurately determine the voice section of the voice signal, and as a result the accuracy of speech recognition can be improved.

The above description merely illustrates the technical idea of the present invention, and those of ordinary skill in the art to which the present invention pertains will be able to make various modifications and variations without departing from the essential characteristics of the present invention.

Accordingly, the embodiments disclosed in the present invention are intended to explain, not to limit, the technical idea of the present invention, and the scope of the technical idea of the present invention is not limited by these embodiments. The scope of protection of the present invention should be interpreted according to the following claims, and all technical ideas within a scope equivalent thereto should be construed as falling within the scope of the present invention.

1000: speech recognition processing system
100: voice signal processing apparatus
110: input unit
120: sensing unit
130: signal processing unit
140: communication unit
150: switch
160: output unit
200: terminal
210: communication unit
220: speech recognition processing unit

Claims

A voice signal processing apparatus comprising:
an input unit configured to receive a user's voice signal;
a sensing unit configured to sense a motion caused by the user's utterance and thereby detect an auxiliary signal for identifying an utterance section of the user's voice signal;
a switch configured to receive, from the user, information on at least one of a selection of an operation mode and a selection of a protocol application method for the voice signal and the auxiliary signal; and
a signal processing unit configured to transmit the voice signal to an external terminal using a first protocol when the selected operation mode is a first operation mode, and, when the selected operation mode is a second operation mode, either to transmit the voice signal and the auxiliary signal to the external terminal using the first protocol, or to transmit the voice signal and the auxiliary signal to the external terminal using, for each signal, a different one of the first protocol and a second protocol.

delete

The apparatus of claim 1,
wherein, in the second operation mode, the signal processing unit transmits the voice signal to the external terminal based on a Hands-Free Profile (HFP) and transmits the auxiliary signal to the external terminal based on Bluetooth Low Energy (BLE).

The apparatus of claim 1,
wherein, in the second operation mode, the signal processing unit generates an integrated signal by integrating the voice signal and the auxiliary signal, and transmits the integrated signal to the external terminal based on a Hands-Free Profile (HFP).

The apparatus of claim 1,
wherein the sensing unit includes at least one of an in-ear microphone, a bone conduction microphone, a motion sensor, and a gyro sensor.

The apparatus of claim 5,
wherein, when the sensing unit is the in-ear microphone or the bone conduction microphone and the auxiliary signal has a magnitude equal to or greater than a reference value, the signal processing unit transmits the auxiliary signal to the external terminal.

The apparatus of claim 1,
further comprising a communication unit configured to transmit the voice signal or the auxiliary signal to the external terminal, wherein the communication unit includes a Bluetooth communication module.

delete

The apparatus of claim 1,
wherein the selection of the operation mode is a selection of one of the first operation mode and the second operation mode, and
the selection of the protocol application method for the voice signal and the auxiliary signal is a selection, when operating in the second operation mode, between transmitting the voice signal and the auxiliary signal to the external terminal using one of the first protocol and the second protocol, and transmitting the voice signal and the auxiliary signal to the external terminal using, for each signal, a different one of the first protocol and the second protocol.

A voice signal processing method comprising:
receiving a user's voice signal;
sensing a motion caused by the user's utterance and thereby detecting an auxiliary signal for identifying an utterance section of the user's voice signal;
receiving information on at least one of an operation mode selected by the user and a protocol application method for the voice signal and the auxiliary signal; and
transmitting the voice signal to an external terminal using a first protocol when the input operation mode is a first operation mode, and, when the input operation mode is a second operation mode, either transmitting the voice signal and the auxiliary signal to the external terminal using the first protocol, or transmitting the voice signal and the auxiliary signal to the external terminal using, for each signal, a different one of the first protocol and a second protocol.

delete

The method of claim 10,
wherein the transmitting of at least one of the voice signal and the auxiliary signal to the external terminal based on the input operation mode comprises, in the second operation mode, transmitting the voice signal to the external terminal based on a Hands-Free Profile (HFP) and transmitting the auxiliary signal to the external terminal based on Bluetooth Low Energy (BLE).

The method of claim 10,
wherein the transmitting of at least one of the voice signal and the auxiliary signal to the external terminal based on the input operation mode comprises, in the second operation mode, generating an integrated signal by integrating the voice signal and the auxiliary signal, and transmitting the integrated signal to the external terminal based on a Hands-Free Profile (HFP).

delete