KR20210056183A - Method, computer program and system for amplification of speech

Info

Publication number: KR20210056183A
Application number: KR1020190143002A
Authority: KR
Inventors: 홍성화; 문일준; 설혜윤
Original assignee: 사회복지법인 삼성생명공익재단; 성균관대학교산학협력단
Priority date: 2019-11-08
Filing date: 2019-11-08
Publication date: 2021-05-18
Also published as: KR102301149B1

Abstract

According to an embodiment of the present invention, a method of selectively amplifying a voice of a specific speaker includes the steps of: identifying a target voice to be heard by a user from among a plurality of candidate voices based on user's brain waves while reproducing the plurality of candidate voices in time series; extracting at least one characteristic value of the target voice; checking whether the target voice is included in an input sound based on the characteristic value, and when the target voice is included, amplifying only the target voice to generate an amplified voice; and outputting the amplified voice. According to the present invention, hearing loss rehabilitation treatment can be better performed by amplifying and providing only the target voice which a user who is a hearing loss patient wants to hear.

Description

Method of selective amplification of voice, computer program and system {METHOD, COMPUTER PROGRAM AND SYSTEM FOR AMPLIFICATION OF SPEECH}

The present invention relates to a method, a computer program, and a system for selectively amplifying a voice, which identify a target voice that a user wants to hear from among a plurality of candidate voices based on the user's brain waves and amplify that voice.

The number of patients with hearing loss is steadily increasing owing to the growing use of personal listening devices such as earphones and the rapid aging of the population.

In age-related (senile) hearing loss, damage to the peripheral auditory organs first makes high-frequency sounds hard to hear. Second, damage to the central auditory organs makes it hard to understand a conversation partner's speech in spaces with background noise or reverberation, and degraded temporal processing makes fast speech hard to understand. In addition, the aging of cognitive functions such as working memory makes it difficult to understand long sentences or to follow a conversation among multiple speakers. An effective rehabilitation method that takes patients with hearing loss, and especially those with age-related hearing loss, into account is therefore needed.

The background art described above is technical information that the inventors possessed for, or acquired in the course of, deriving the present invention, and is not necessarily prior art disclosed to the general public before the filing of the present application.

An object of the present invention is to allow rehabilitation treatment for hearing loss to be performed more effectively by amplifying and providing only the target voice that a user with hearing loss wants to hear.

A method of selectively amplifying the voice of a specific speaker according to an embodiment of the present invention may include: identifying, while reproducing a plurality of candidate voices in time series, a target voice that a user wants to hear from among the plurality of candidate voices based on the user's brain waves; extracting at least one characteristic value of the target voice; checking, based on the characteristic value, whether the target voice is included in an input sound and, if the target voice is included, generating an amplified voice by amplifying only the target voice; and outputting the amplified voice.

In the step of identifying the target voice, a candidate voice for which at least one physical quantity of the brain waves satisfies a predetermined condition while the plurality of candidate voices are being listened to may be identified as the target voice. The physical quantity may include at least one of the frequency and the amplitude of the brain waves. The predetermined condition may include at least one of the following: the voice is the one, among the plurality of candidate voices, for which the frequency of the user's brain waves is highest; and the voice is the one, among the plurality of candidate voices, for which the amplitude of the user's brain waves is smallest.
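The selection rule above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes one EEG segment is recorded per candidate voice, takes the "frequency" of a segment to be its strongest non-DC spectral component, and takes the "amplitude" to be its peak deviation from the mean. The function names and the synthetic sinusoidal segments are hypothetical.

```python
import numpy as np


def dominant_frequency(eeg, fs):
    """Frequency (Hz) of the strongest non-DC component of an EEG segment."""
    eeg = np.asarray(eeg, dtype=float)
    spectrum = np.abs(np.fft.rfft(eeg - eeg.mean()))
    freqs = np.fft.rfftfreq(len(eeg), d=1.0 / fs)
    return float(freqs[np.argmax(spectrum)])


def peak_amplitude(eeg):
    """Largest absolute deviation of the segment from its mean."""
    eeg = np.asarray(eeg, dtype=float)
    return float(np.max(np.abs(eeg - eeg.mean())))


def identify_target(eeg_segments, fs, criterion="frequency"):
    """Return the index of the candidate voice chosen as the target.

    eeg_segments[i] is the EEG recorded while candidate voice i was played.
    "frequency": pick the voice with the highest EEG frequency.
    "amplitude": pick the voice with the smallest EEG amplitude.
    """
    if criterion == "frequency":
        return int(np.argmax([dominant_frequency(s, fs) for s in eeg_segments]))
    if criterion == "amplitude":
        return int(np.argmin([peak_amplitude(s) for s in eeg_segments]))
    raise ValueError("criterion must be 'frequency' or 'amplitude'")


# Synthetic stand-ins for the EEG recorded during four candidate voices.
fs = 256
t = np.arange(fs) / fs
segments = [a * np.sin(2 * np.pi * f * t)
            for f, a in [(8, 3.0), (10, 2.0), (20, 1.0), (6, 4.0)]]
print(identify_target(segments, fs, "frequency"))
print(identify_target(segments, fs, "amplitude"))
```

In this toy data the third segment has both the highest frequency and the smallest amplitude, so both criteria agree; with real EEG the two conditions could disagree, and the text leaves open how they would be combined.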

The step of identifying the target voice may include determining the target voice that the user wants to hear by using a trained artificial neural network. The artificial neural network may be a neural network trained, on training data including brain wave data labeled with whether the user wishes to listen, to output whether the user wishes to listen in response to input brain wave data.
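As a rough illustration of learning from labeled brain wave data, the sketch below trains a single logistic unit, used here as the simplest possible stand-in for the artificial neural network. The text does not specify a network architecture or input features; the two-element feature vector (EEG frequency, EEG amplitude), the training values, and all function names are assumptions made for the example.

```python
import numpy as np


def train_listen_classifier(features, labels, lr=0.5, steps=500):
    """Fit a single logistic unit on EEG features labeled 1 (wants to listen) or 0."""
    x = np.asarray(features, dtype=float)
    y = np.asarray(labels, dtype=float)
    mean, std = x.mean(axis=0), x.std(axis=0)
    xn = (x - mean) / std                      # standardize each feature
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(xn @ w + b)))  # predicted probability
        grad = p - y                              # gradient of the log loss
        w -= lr * (xn.T @ grad) / len(y)
        b -= lr * grad.mean()
    return {"w": w, "b": b, "mean": mean, "std": std}


def wants_to_listen(model, feature_vector):
    """Return True if the model predicts that the user wishes to listen."""
    xn = (np.asarray(feature_vector, dtype=float) - model["mean"]) / model["std"]
    p = 1.0 / (1.0 + np.exp(-(xn @ model["w"] + model["b"])))
    return bool(p > 0.5)


# Hypothetical labeled data: [EEG frequency in Hz, EEG amplitude in arbitrary units].
train_x = [[12.0, 1.0], [13.0, 1.2], [11.5, 0.9], [8.0, 3.0], [7.5, 2.8], [9.0, 3.2]]
train_y = [1, 1, 1, 0, 0, 0]
model = train_listen_classifier(train_x, train_y)
print(wants_to_listen(model, [12.5, 1.0]))
print(wants_to_listen(model, [8.0, 3.0]))
```

A practical system would likely use a deeper network and richer EEG features; the point here is only the training-on-labels and predict-on-input structure.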

The at least one characteristic value may include at least one of the gender of the speaker of the target voice, the pitch of the target voice, the frequency of the target voice, and the speech pattern of the target voice.
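Two of these characteristic values, pitch and frequency, could be extracted roughly as sketched below. The text does not say how the values are computed; autocorrelation pitch estimation and an FFT peak are common techniques substituted here as assumptions, and the function names and the 80 to 400 Hz search range are hypothetical.

```python
import numpy as np


def estimate_pitch(voice, fs, fmin=80.0, fmax=400.0):
    """Estimate the fundamental frequency (Hz) from the autocorrelation peak."""
    voice = np.asarray(voice, dtype=float)
    voice = voice - voice.mean()
    n = len(voice)
    autocorr = np.correlate(voice, voice, mode="full")[n - 1:]  # lags 0..n-1
    lag_min = int(fs / fmax)
    lag_max = int(fs / fmin)
    best_lag = lag_min + int(np.argmax(autocorr[lag_min:lag_max + 1]))
    return fs / best_lag


def spectral_peak(voice, fs):
    """Frequency (Hz) of the strongest non-DC spectral component."""
    voice = np.asarray(voice, dtype=float)
    spectrum = np.abs(np.fft.rfft(voice - voice.mean()))
    freqs = np.fft.rfftfreq(len(voice), d=1.0 / fs)
    return float(freqs[np.argmax(spectrum)])


def extract_features(voice, fs):
    """Collect characteristic values of a voice sample into a dictionary."""
    return {"pitch_hz": estimate_pitch(voice, fs),
            "peak_hz": spectral_peak(voice, fs)}


# Synthetic voiced sound: 200 Hz fundamental plus a weaker 400 Hz harmonic.
fs = 8000
t = np.arange(2048) / fs
voice = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 400 * t)
features = extract_features(voice, fs)
print(features)
```

Speaker gender and speech pattern are left out of the sketch because the text gives no indication of how they would be derived.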

The input sound is a sound input in real time and may include a plurality of utterances spoken by a plurality of speakers. The step of generating the amplified voice may include: checking an utterance characteristic value, which is a characteristic value of the voice of the speaker speaking at a first time point; checking whether the similarity between the utterance characteristic value and the characteristic value of the target voice exceeds a predetermined threshold similarity; and, if the threshold similarity is exceeded, adjusting at least one output characteristic value of the amplified voice.
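The threshold check can be sketched as below. The text does not define the similarity measure; the one used here (one minus the mean relative difference between feature vectors) and the threshold of 0.9 are assumptions chosen for illustration, as are the function names and the example feature values.

```python
import numpy as np


def feature_similarity(a, b, eps=1e-12):
    """Similarity as 1 minus the mean relative difference between two feature vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    scale = np.maximum(np.maximum(np.abs(a), np.abs(b)), eps)
    return float(1.0 - np.mean(np.abs(a - b) / scale))


def is_target_speaker(utterance_features, target_features, threshold=0.9):
    """True if the current utterance is similar enough to the stored target voice."""
    return feature_similarity(utterance_features, target_features) > threshold


# Hypothetical feature vectors: [pitch in Hz, speech-rate measure].
target = [205.0, 0.52]
print(is_target_speaker([200.0, 0.50], target))  # close to the target voice
print(is_target_speaker([110.0, 0.50], target))  # much lower pitch
```

When the check succeeds, the output characteristics would then be adjusted, for example by raising the gain of the speaker's frequency band.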

The step of adjusting at least one output characteristic value may include: identifying the frequency band to which the voice of the speaker speaking at the first time point belongs; and setting, for the input sound, the degree of amplification of sound in the identified frequency band higher than the degree of amplification of sound in bands other than the identified frequency band.

The step of generating the amplified voice may include: detecting a second time point at which the voice of the speaker speaking at the first time point ends; and setting, from the second time point, the degree of amplification of sound in the identified frequency band equal to the degree of amplification of sound in bands other than the identified frequency band.
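The band-dependent amplification between the first and second time points can be sketched with a simple frequency-domain gain. This is an assumption-laden illustration: the text does not say how gains are applied, and the frame-wise FFT approach, the function names, and the 20 dB boost are hypothetical. Passing `band=None` models the state from the second time point onward, where every band receives the same gain.

```python
import numpy as np


def band_gains(freqs, band, boost_db, base_db=0.0):
    """Linear gain per frequency bin: boost_db inside the band, base_db elsewhere."""
    gains_db = np.full(freqs.shape, base_db, dtype=float)
    if band is not None:
        low, high = band
        gains_db[(freqs >= low) & (freqs <= high)] = boost_db
    return 10.0 ** (gains_db / 20.0)


def amplify_frame(frame, fs, band, boost_db=12.0):
    """Amplify one audio frame, boosting only the identified frequency band."""
    frame = np.asarray(frame, dtype=float)
    spectrum = np.fft.rfft(frame)
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    return np.fft.irfft(spectrum * band_gains(freqs, band, boost_db), n=len(frame))


# Input sound with a target voice near 200 Hz and another speaker near 1000 Hz.
fs = 8000
t = np.arange(800) / fs
frame = np.sin(2 * np.pi * 200 * t) + np.sin(2 * np.pi * 1000 * t)

# Between the first and second time points: boost the target speaker's band.
during = amplify_frame(frame, fs, band=(150.0, 300.0), boost_db=20.0)
# From the second time point: all bands are treated equally again.
after = amplify_frame(frame, fs, band=None)
```

A real-time implementation would more likely use overlapping windowed frames or a filter bank to avoid artifacts at frame boundaries; the sketch only shows the gain logic.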

The step of outputting the amplified voice may include: outputting a canceling sound that cancels the input sound; and outputting the amplified voice. The canceling sound may be a sound having the same amplitude as the input sound and the opposite phase.
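A sound with equal amplitude and opposite phase is simply the negated waveform, so the idea can be sketched as below. This is an idealized model that assumes perfect time alignment and ignores processing latency and the acoustic path to the ear; the function names and signals are hypothetical.

```python
import numpy as np


def canceling_sound(input_sound):
    """Same amplitude, opposite phase: the sample-wise negation of the input."""
    return -np.asarray(input_sound, dtype=float)


def sound_at_ear(input_sound, amplified_voice):
    """Idealized sum of the ambient input, its canceling sound, and the amplified voice."""
    input_sound = np.asarray(input_sound, dtype=float)
    amplified_voice = np.asarray(amplified_voice, dtype=float)
    return input_sound + canceling_sound(input_sound) + amplified_voice


fs = 8000
t = np.arange(800) / fs
ambient = np.sin(2 * np.pi * 200 * t) + np.sin(2 * np.pi * 1000 * t)
amplified = 4.0 * np.sin(2 * np.pi * 200 * t)  # stand-in for the amplified target voice
heard = sound_at_ear(ambient, amplified)
```

In this idealized case the ambient sound cancels completely and only the amplified voice remains.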

A system for selectively amplifying the voice of a specific speaker according to an embodiment of the present invention may include: an EEG sensing device that senses a user's brain waves; a voice output unit that outputs a selectively amplified voice to the user's auditory organ; and a user terminal that generates the amplified voice based on the brain waves sensed by the EEG sensing device and provides it to the voice output unit. The user terminal may, while reproducing a plurality of candidate voices in time series through the voice output unit, identify a target voice that the user wants to hear from among the plurality of candidate voices based on the user's brain waves sensed by the EEG sensing device; extract at least one characteristic value of the target voice; check, based on the characteristic value, whether the target voice is included in an input sound and, if the target voice is included, generate an amplified voice by amplifying only the target voice; and control the voice output unit to output the amplified voice.

The user terminal may identify, as the target voice, a candidate voice for which at least one physical quantity of the brain waves satisfies a predetermined condition while the plurality of candidate voices are being listened to.

The physical quantity may include at least one of the frequency and the amplitude of the brain waves. The predetermined condition may include at least one of the following: the voice is the one, among the plurality of candidate voices, for which the frequency of the user's brain waves is highest; and the voice is the one, among the plurality of candidate voices, for which the amplitude of the user's brain waves is smallest.

The user terminal may determine the target voice that the user wants to hear by using a trained artificial neural network. The artificial neural network may be a neural network trained, on training data including brain wave data labeled with whether the user wishes to listen, to output whether the user wishes to listen in response to input brain wave data.

The input sound is a sound sensed in real time by either the user terminal or the voice output unit and may include a plurality of utterances spoken by a plurality of speakers. The user terminal may check an utterance characteristic value, which is a characteristic value of the voice of the speaker speaking at a first time point; check whether the similarity between the utterance characteristic value and the characteristic value of the target voice exceeds a predetermined threshold similarity; and, if the threshold similarity is exceeded, adjust at least one output characteristic value of the voice output unit.

The user terminal may identify the frequency band to which the voice of the speaker speaking at the first time point belongs, and set, for the input sound, the degree of amplification of sound in the identified frequency band higher than the degree of amplification of sound in bands other than the identified frequency band.

The user terminal may detect a second time point at which the voice of the speaker speaking at the first time point ends, and set, from the second time point, the degree of amplification of sound in the identified frequency band equal to the degree of amplification of sound in bands other than the identified frequency band.

The user terminal may control the voice output unit to output, together, a canceling sound that cancels the input sound and the amplified voice. The canceling sound may be a sound having the same amplitude as the input sound and the opposite phase.

According to the present invention, rehabilitation treatment for hearing loss can be performed more effectively by amplifying and providing only the target voice that a user with hearing loss wants to hear.

In addition, the present invention can improve the efficiency of hearing loss rehabilitation treatment by providing only the target voice that the user wants to hear while excluding the remaining voices.

FIG. 1 schematically shows the configuration of a system for selective amplification of voice according to an embodiment of the present invention.
FIG. 2 schematically shows the configuration of the voice output unit 100 according to an embodiment of the present invention.
FIG. 3 schematically shows the configuration of the user terminal 200 according to an embodiment of the present invention.
FIG. 4 is a diagram for explaining the process by which the control unit 212 identifies a target voice according to an embodiment of the present invention.
FIG. 5 is a diagram for explaining the process by which the control unit 212 identifies a target voice using a trained artificial neural network according to an optional embodiment of the present invention.
FIG. 6 is a diagram showing voices uttered by a plurality of speakers over time.
FIG. 7 is a diagram showing the frequency bands to which the voices of a plurality of speakers belong.
FIG. 8 is a diagram showing the degree of amplification for each frequency band of the input sound.
FIGS. 9 and 10 are flowcharts illustrating a method of selectively amplifying a voice performed by the user terminal 200 according to an embodiment of the present invention.

The detailed description of the present invention below refers to the accompanying drawings, which show, by way of example, specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. It should be understood that the various embodiments of the invention differ from one another but need not be mutually exclusive. For example, specific shapes, structures, and characteristics described herein may be changed from one embodiment to another without departing from the spirit and scope of the invention. It should also be understood that the position or arrangement of individual elements within each embodiment may be changed without departing from the spirit and scope of the invention. Accordingly, the detailed description below is not to be taken in a limiting sense, and the scope of the invention should be taken to encompass the scope of the claims and all equivalents thereof. Like reference numerals in the drawings denote the same or similar elements throughout the several aspects.

Hereinafter, various embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the invention pertains can readily practice it.

FIG. 1 is a diagram schematically showing the configuration of a system for selective amplification of voice according to an embodiment of the present invention. FIG. 2 is a diagram schematically showing the configuration of the voice output unit 100 according to an embodiment of the present invention. FIG. 3 is a diagram schematically showing the configuration of the user terminal 200 according to an embodiment of the present invention. The following description refers to FIGS. 1 to 3 together.

The system for selective amplification of voice according to an embodiment of the present invention may identify a target voice that the user wants to hear based on the user's brain waves, and amplify and provide only the identified target voice. To this end, the system may include a voice output unit 100, a user terminal 200, and an EEG sensing device 300.

The voice output unit 100 according to an embodiment of the present invention may refer to any of various means for outputting sound for the user to hear, based on a control signal generated by the user terminal 200 or by the voice output unit itself. To this end, the voice output unit 100 may include a communication unit 111, a control unit 112, a sound input unit 113, a sound output unit 114, and a memory 115.

The communication unit 111 according to an embodiment of the present invention may refer to a means by which the voice output unit 100 exchanges data with the user terminal 200 and/or the EEG sensing device 300. For example, the communication unit 111 may receive, from the user terminal 200, data on sound content to be output to the user's auditory organ through the sound output unit 114. The communication unit 111 may also provide data on the sound acquired by the sound input unit 113 to the user terminal 200. These are only examples, and the role of the communication unit 111 is not limited to them; for instance, the communication unit 111 may also provide the user terminal 200 with the user's operation signals for the voice output unit 100.

The communication unit 111 may be connected to the user terminal 200 by wire or wirelessly in order to exchange data. Because various known wired or wireless connection methods may be used, they are not enumerated here.

The control unit 112 according to an embodiment of the present invention may control the components of the voice output unit 100, namely the communication unit 111, the sound input unit 113, the sound output unit 114, and the memory 115, according to a predetermined algorithm. For example, the control unit 112 may transmit the input sound acquired by the sound input unit 113 to the user terminal 200 through the communication unit 111, and receive, through the communication unit 111, the amplified voice that the user terminal 200 generates from the received input sound. The control unit 112 may then output the received amplified voice as sound through the sound output unit 114.

The sound input unit 113 according to an embodiment of the present invention may refer to any of various means for converting sound around the user into an electrical signal; conversely, the sound output unit 114 may refer to any of various means for outputting an electrical signal as sound.

For example, the sound output unit 114 may output the amplified voice generated by the user terminal 200 as sound and provide it to the user's auditory organ. The sound output unit 114 may also output various kinds of sound content provided by the user terminal 200 as sound and provide it to the user's auditory organ.

In an optional embodiment in which the method of selective amplification of voice described herein is performed by the voice output unit 100, the sound output unit 114 may output, as sound, the amplified voice generated by the control unit 112.

The memory 115 according to an embodiment of the present invention may refer to a means for storing programs for the operations performed by the control unit 112.

In an optional embodiment in which the method of selective amplification of voice is performed by the voice output unit 100, the memory 115 may also refer to a means for storing the target voice that the user wants to hear and/or the characteristic values of the target voice.

The memory 115 is a computer-readable recording medium and may include random access memory (RAM), read-only memory (ROM), and a non-volatile mass storage device (permanent mass storage device) such as a disk drive.

The EEG sensing device 300 according to an embodiment of the present invention may refer to a means for sensing the user's brain waves and providing them to the user terminal 200 and/or the voice output unit 100.

The EEG sensing device 300 may be implemented using various known techniques. For example, it may be implemented in a non-invasive manner, as shown in FIG. 1, or in an invasive manner. It may also be implemented using a brain wave induction method or a brain wave recognition method. These are only examples, and the scope of the present invention is not limited to them; any means capable of sensing the user's brain waves may be used as the EEG sensing device 300.

The EEG sensing device 300 may be connected to the user terminal 200 and/or the voice output unit 100 by wire and/or wirelessly; the specific communication methods are not enumerated here.

The user terminal 200 according to an embodiment of the present invention may identify the target voice that the user wants to hear based on the user's brain waves sensed by the EEG sensing device 300, amplify only the identified target voice in the input sound, and provide it to the voice output unit 100. To this end, the user terminal 200 may include a communication unit 211, a control unit 212, a memory 213, and a display unit 214.

The communication unit 211 according to an embodiment of the present invention may refer to a means for exchanging data with the voice output unit 100 and the EEG sensing device 300 described above. For example, the communication unit 211 may receive sensed brain wave data from the EEG sensing device 300. The communication unit 211 may also provide the generated amplified voice to the voice output unit 100.

The communication unit 211 may be connected to the components of the system by wire or wirelessly in order to exchange data. Because various known wired or wireless connection methods may be used, they are not enumerated here.

The control unit 212 according to an embodiment of the present invention may identify the target voice that the user wants to hear based on the user's brain waves, amplify only the identified target voice in the input sound, and provide it to the voice output unit 100.

The control unit 212 may be configured to process the instructions of a computer program by performing basic arithmetic, logic, and input/output operations. Instructions may be provided to the control unit 212, which acts as the processor, by the memory 213 or the communication unit 211. For example, the control unit 212 may be configured to execute instructions received according to program code stored in a recording device such as the memory 213.

The memory 213 according to an embodiment of the present invention may refer to a means for storing programs for the operations performed by the control unit 212.

In an embodiment in which the method of selective amplification of voice is performed by the user terminal 200, the memory 213 may also refer to a means for storing the target voice that the user wants to hear and/or the characteristic values of the target voice.

The memory 213 is a computer-readable recording medium and may include random access memory (RAM), read-only memory (ROM), and a non-volatile mass storage device (permanent mass storage device) such as a disk drive.

The display unit 214 according to an embodiment of the present invention may refer to a means for displaying information that the user needs to check during the selective amplification of voice. For example, the display unit 214 may be a means for interfacing with a device in which input and output functions are integrated, such as a touch screen. This is only an example, and the scope of the present invention is not limited to it.

In other embodiments, the user terminal 200 may include more components than those described above. The user terminal 200 may be a portable terminal 201, 202, 203 or a personal computer 204.

본 발명에 따른 음성의 선택적 증폭 방법은 상술한 기재에서 선택적으로 기재된 바와 같이, 사용자 단말(200)에 의해 수행될 수도 있고, 음성 출력 유닛(100)에 의해 수행될 수도 있다. The selective amplification method of the voice according to the present invention may be performed by the user terminal 200 or may be performed by the voice output unit 100, as selectively described in the above description.

다만 이하에서는 설명의 편의를 위하여 음성의 선택적 증폭 방법이 사용자 단말(200)에 의해 수행됨을 전제로, 사용자 단말(200)의 제어부(212)의 동작을 중심으로 설명한다.However, in the following, for convenience of explanation, the operation of the controller 212 of the user terminal 200 will be mainly described on the premise that the selective amplification method of the voice is performed by the user terminal 200.

The controller 212 of the user terminal 200 according to an embodiment of the present invention may reproduce a plurality of candidate voices (that is, sound content corresponding to the plurality of candidate voices) in time series through the voice output unit 100 and, based on the user's brain waves, identify the target voice that the user wants to hear from among the reproduced candidate voices. The user's brain waves may be acquired by the brain wave detection device 300 and provided to the user terminal 200. In the present invention, 'voice' may refer either to a speaker's voice itself or to sound content based on that speaker's voice.

FIG. 4 is a diagram for explaining a process by which the controller 212 identifies a target voice according to an embodiment of the present invention.

For convenience of explanation, it is assumed that, as shown in FIG. 4, a plurality of candidate voices 410 including the voices 411, 412, 413, and 414 of four speakers have been output through the voice output unit 100, and that the user's brain waves 511, 512, 513, and 514 in response to each speaker's voice while listening to the candidate voices 410 are as illustrated.

The controller 212 according to an embodiment of the present invention may identify, as the target voice, a voice for which at least one physical quantity of the brain waves 510 satisfies a predetermined condition while the user listens to the plurality of candidate voices 410. For example, the controller 212 may identify the target voice based on whether the frequency and/or amplitude of the brain waves 510 satisfies a predetermined condition.

The controller 212 according to an embodiment of the present invention may identify, as the target voice, a voice among the plurality of candidate voices 410 that satisfies at least one of two conditions: being the voice for which the brain wave frequency is highest, and being the voice for which the brain wave amplitude is lowest. For example, in FIG. 4, the controller 212 may identify the voice 412 of speaker 2, which elicits the brain waves 512 with the highest frequency and the lowest amplitude among the voices 411, 412, 413, and 414 of the four speakers, as the target voice.
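The selection rule above can be illustrated with a minimal Python sketch. It is not the implementation of the invention: the `EegResponse` record, its field names, the numeric values, and the tie-breaking order (highest frequency first, then lowest amplitude) are all assumptions made for illustration, since the description only requires that at least one of the two conditions be satisfied.

```python
from dataclasses import dataclass


@dataclass
class EegResponse:
    """Hypothetical summary of the brain waves measured while one voice played."""
    voice_id: int         # reference numeral of the candidate voice, e.g. 411..414
    frequency_hz: float   # assumed dominant brain wave frequency
    amplitude_uv: float   # assumed brain wave amplitude


def identify_target_voice(responses):
    """Pick the voice with the highest EEG frequency; break ties by lowest amplitude."""
    best = max(responses, key=lambda r: (r.frequency_hz, -r.amplitude_uv))
    return best.voice_id


# Made-up measurements mirroring the FIG. 4 scenario (four candidate voices).
responses = [
    EegResponse(411, 9.0, 40.0),
    EegResponse(412, 21.0, 12.0),
    EegResponse(413, 11.0, 35.0),
    EegResponse(414, 8.0, 45.0),
]
target = identify_target_voice(responses)
```

With these made-up values, the response to voice 412 has both the highest frequency and the lowest amplitude, so it is the one selected.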

In an optional embodiment, the controller 212 may determine the target voice that the user wants to hear by using a trained artificial neural network. The artificial neural network is trained on training data that includes brain wave data labeled with whether the user wishes to listen, and may be a neural network trained to output whether the user wishes to listen in response to input brain wave data.

FIG. 5 is a diagram for explaining a process by which the controller 212 identifies a target voice using a trained artificial neural network according to an optional embodiment of the present invention. For convenience of explanation, the same assumptions as in FIG. 4 apply.

The controller 212 according to an embodiment of the present invention may input each of the brain waves 511, 512, 513, and 514 for the voices 411, 412, 413, and 414 of the four speakers into the trained artificial neural network 610 and check the target voice result 621, 622, 623, 624 for each of the brain waves 511, 512, 513, and 514.

Depending on how it was trained, the artificial neural network 610 may output whether a voice is the target voice 620 in a binary manner, as shown in FIG. 5, or may output the probability that a voice is the target voice. In an embodiment in which the artificial neural network 610 is trained to output the probability of being the target voice, the controller 212 may determine a voice whose probability satisfies a predetermined condition to be the target voice.
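For the probability-output embodiment, one possible "predetermined condition" is that the highest probability must exceed a fixed threshold. The sketch below assumes exactly that; the function name `select_by_probability`, the threshold of 0.5, and the probability values are hypothetical, and the neural network itself is not modeled.

```python
def select_by_probability(probabilities, threshold=0.5):
    """Return the voice id with the highest target probability, or None.

    `probabilities` maps a candidate voice id to the probability (assumed to
    come from a trained network) that the voice is the target voice.
    """
    if not probabilities:
        return None
    voice_id, probability = max(probabilities.items(), key=lambda item: item[1])
    return voice_id if probability > threshold else None


# Made-up network outputs for the four candidate voices of FIG. 5.
chosen = select_by_probability({411: 0.10, 412: 0.90, 413: 0.20, 414: 0.05})
```

Returning `None` when no probability clears the threshold leaves room for the caller to replay the candidates rather than amplify a poorly supported choice; that behavior is a design assumption, not something the description specifies.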

The controller 212 according to an embodiment of the present invention may extract at least one characteristic value of the target voice determined by the process described above. The characteristic value may be, for example, at least one of the gender of the speaker of the target voice, the pitch of the target voice, the frequency of the target voice, and the speech pattern of the target voice. These characteristic values are only examples; any method that can quantify and express the characteristics of the target voice may be used as the characteristic value extraction method of the present invention. In an optional embodiment, the controller 212 may extract the characteristic values of the target voice in the form of a multi-dimensional vector.
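As one example of quantifying a voice characteristic, pitch can be estimated from the autocorrelation of a short audio frame. The description does not name an extraction algorithm, so the autocorrelation approach, the function `estimate_pitch_hz`, the search range of 80 to 400 Hz, and the synthetic 200 Hz test tone are all assumptions chosen for this sketch.

```python
import math


def estimate_pitch_hz(samples, sample_rate, min_hz=80.0, max_hz=400.0):
    """Estimate the fundamental frequency from the autocorrelation peak."""
    min_lag = max(1, int(sample_rate / max_hz))
    max_lag = min(int(sample_rate / min_hz), len(samples) - 1)
    best_lag, best_score = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the frame at this lag.
        score = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    if best_lag is None:
        raise ValueError("signal too short for the requested pitch range")
    return sample_rate / best_lag


# Synthetic stand-in for a voiced frame: a pure 200 Hz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200.0 * n / sample_rate) for n in range(400)]
pitch = estimate_pitch_hz(tone, sample_rate)

# A characteristic "vector" could then combine pitch with other measured values.
feature_vector = [pitch]
```

Real speech is far less regular than a pure tone, so a practical extractor would need windowing, voicing detection, and more robust peak picking.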

The controller 212 according to an embodiment of the present invention may use the characteristic value of the target voice extracted by the process described above to check whether the target voice is included in the input sound. If the target voice is included in the input sound, the controller 212 may amplify only the target voice to generate an amplified voice.

The 'input sound' is sound that is input in real time and may be detected, for example, by the user terminal 200 or by the sound input unit 113 of the voice output unit 100. As shown in FIG. 6, the input sound may include speech uttered by a plurality of speakers. For example, the input sound may include voices uttered by three speakers as shown in FIG. 6, and may include voices S1_V1, S2_V1, S3_V1, and S1_V2 uttered in different time intervals.

To check whether the target voice is included in the input sound, the controller 212 according to an embodiment of the present invention may determine an utterance characteristic value, which is a characteristic value of the voice of the speaker who is speaking, at each of a plurality of time points. For example, the controller 212 may determine the utterance characteristic value of the voice S2_V1 at time point t1.

The controller 212 may then check whether the similarity between the determined utterance characteristic value and the characteristic value of the target voice exceeds a predetermined threshold similarity. If the similarity exceeds the threshold similarity, the controller 212 may adjust at least one output characteristic value of the amplified voice.
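The threshold comparison can be sketched with cosine similarity between two characteristic vectors. The description does not specify a similarity measure, so cosine similarity, the threshold of 0.9, and the names `cosine_similarity` and `is_target_speech` are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_target_speech(utterance_vec, target_vec, threshold=0.9):
    """True when the utterance is similar enough to the stored target voice."""
    return cosine_similarity(utterance_vec, target_vec) > threshold


# Hypothetical characteristic vectors: the first utterance matches the target.
target_vec = [1.0, 2.0, 3.0]
match = is_target_speech([1.0, 2.0, 3.0], target_vec)
mismatch = is_target_speech([3.0, -1.0, 0.0], target_vec)
```

Cosine similarity ignores vector magnitude, which may or may not be desirable depending on which characteristics are stored; a distance-based measure would be an equally plausible choice.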

For example, if the voice 412 of speaker 2 has been identified as the user's target voice by the process described with reference to FIGS. 4 and 5, and speaker 2 speaks between the first time point t1 and the second time point t2, the controller 212 may determine that the similarity between the voice 412 and the voice S2_V1 exceeds the threshold similarity. Because the determined similarity exceeds the threshold similarity, the controller 212 may adjust at least one output characteristic value of the amplified voice.

Looking more closely at how the controller 212 according to an embodiment of the present invention adjusts the output characteristic value of the amplified voice, the controller 212 may identify the frequency band to which the voice of the speaker speaking at the first time point t1 belongs. For example, as shown in FIG. 7, the controller 212 may identify the frequency band S2_f (that is, f1 to f2) to which the voice of speaker 2, the speaker speaking at the first time point t1, belongs. The controller 212 may also identify the frequency bands of speakers speaking at other time points. For example, the controller 212 may identify the frequency band S1_f to which the voice of speaker 1 belongs and the frequency band S3_f to which the voice of speaker 3 belongs.

Next, as shown in FIG. 8, the controller 212 may set the amplification level for sound in the identified frequency band S2_f of the input sound higher than the amplification level for sound in bands other than the identified frequency band S2_f.

The controller 212 according to an embodiment of the present invention may apply this higher amplification level to the identified frequency band S2_f only between the first time point t1 and the second time point t2. Here, the first time point t1 may be the time at which a speaker (in particular, the speaker of the target voice) starts speaking, and the second time point t2 may be the time at which the voice (or utterance) of the speaker who started speaking at the first time point t1 ends.

From the second time point t2 onward, the controller 212 according to an embodiment of the present invention may set the amplification level for sound in the identified frequency band S2_f equal to the amplification level for sound in the other bands.
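The band-selective, time-limited gain described above can be summarized as a simple gain function. This is only a sketch: the function name `gain_for`, the gain values 4.0 and 1.0, and the example band and time window are hypothetical, and the description does not state how the gain would be applied to the signal (for example, through a filter bank).

```python
def gain_for(freq_hz, time_s, band, window, boosted_gain=4.0, base_gain=1.0):
    """Gain for a frequency component at a given time.

    `band` is (f1, f2), the frequency band of the target speaker's voice.
    `window` is (t1, t2), the start and end of the target speaker's utterance.
    The boosted gain applies only inside the band and within [t1, t2);
    from t2 onward every band receives the same base gain.
    """
    f1, f2 = band
    t1, t2 = window
    in_band = f1 <= freq_hz <= f2
    in_window = t1 <= time_s < t2
    return boosted_gain if (in_band and in_window) else base_gain


band_s2 = (100.0, 300.0)   # assumed f1..f2 for speaker 2
utterance = (1.0, 2.0)     # assumed t1..t2 in seconds

inside = gain_for(200.0, 1.5, band_s2, utterance)       # in band, during utterance
other_band = gain_for(500.0, 1.5, band_s2, utterance)   # outside the band
after_end = gain_for(200.0, 2.0, band_s2, utterance)    # at t2, gains are equal again
```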

The controller 212 according to an embodiment of the present invention may output the amplified voice generated by the process described above. For example, the controller 212 may output the generated amplified voice through the voice output unit 100.

In an optional embodiment, the controller 212 may output the amplified voice together with a canceling sound that cancels the input sound. The canceling sound may be a sound with the same amplitude as the input sound and the opposite phase. In this way, the present invention can deliver only the amplified voice clearly to the user.
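The canceling sound can be illustrated on sampled audio: inverting the sign of every sample yields a signal with the same amplitude and opposite phase, and summing it with the original gives silence. The helper names `canceling_signal` and `mix` and the sample values are assumptions for this sketch, which ignores the latency and acoustic-path effects a real device would have to handle.

```python
def canceling_signal(samples):
    """Same amplitude, opposite phase: negate every sample."""
    return [-s for s in samples]


def mix(*signals):
    """Sum equal-length signals sample by sample."""
    return [sum(values) for values in zip(*signals)]


input_sound = [0.5, -0.25, 0.125]   # made-up ambient input samples
amplified = [1.0, -0.5, 0.25]       # made-up amplified target voice

# The input and its canceling sound add up to silence ...
residual = mix(input_sound, canceling_signal(input_sound))
# ... so what reaches the listener is the amplified voice alone.
heard = mix(input_sound, canceling_signal(input_sound), amplified)
```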

FIGS. 9 and 10 are flowcharts for explaining a method of selectively amplifying a voice performed by the user terminal 200 according to an embodiment of the present invention. Descriptions that overlap with those given for FIGS. 1 to 8 are omitted below, and the method is described with reference to FIGS. 1 to 8.

The user terminal 200 according to an embodiment of the present invention may reproduce a plurality of candidate voices (that is, sound content corresponding to the plurality of candidate voices) in time series through the voice output unit 100 and, based on the user's brain waves, identify the target voice that the user wants to hear from among the reproduced candidate voices (S910). The user's brain waves may be acquired by the brain wave detection device 300 and provided to the user terminal 200. In the present invention, 'voice' may refer either to a speaker's voice itself or to sound content based on that speaker's voice.

Referring again to FIG. 4, the process by which the user terminal 200 according to an embodiment of the present invention identifies the target voice is described. For convenience of explanation, it is assumed that, as shown in FIG. 4, a plurality of candidate voices 410 including the voices 411, 412, 413, and 414 of four speakers have been output through the voice output unit 100, and that the user's brain waves 511, 512, 513, and 514 in response to each speaker's voice while listening to the candidate voices 410 are as illustrated.

The user terminal 200 according to an embodiment of the present invention may identify, as the target voice, a voice for which at least one physical quantity of the brain waves 510 satisfies a predetermined condition while the user listens to the plurality of candidate voices 410. For example, the user terminal 200 may identify the target voice based on whether the frequency and/or amplitude of the brain waves 510 satisfies a predetermined condition.

The user terminal 200 according to an embodiment of the present invention may identify, as the target voice, a voice among the plurality of candidate voices 410 that satisfies at least one of two conditions: being the voice for which the brain wave frequency is highest, and being the voice for which the brain wave amplitude is lowest. For example, in FIG. 4, the user terminal 200 may identify the voice 412 of speaker 2, which elicits the brain waves 512 with the highest frequency and the lowest amplitude among the voices 411, 412, 413, and 414 of the four speakers, as the target voice.

In an optional embodiment, the user terminal 200 may determine the target voice that the user wants to hear by using a trained artificial neural network. The artificial neural network is trained on training data that includes brain wave data labeled with whether the user wishes to listen, and may be a neural network trained to output whether the user wishes to listen in response to input brain wave data.

FIG. 5 is a diagram for explaining a process by which the user terminal 200 identifies a target voice using a trained artificial neural network according to an optional embodiment of the present invention. For convenience of explanation, the same assumptions as in FIG. 4 apply.

The user terminal 200 according to an embodiment of the present invention may input each of the brain waves 511, 512, 513, and 514 for the voices 411, 412, 413, and 414 of the four speakers into the trained artificial neural network 610 and check the target voice result 621, 622, 623, 624 for each of the brain waves 511, 512, 513, and 514.

Depending on how it was trained, the artificial neural network 610 may output whether a voice is the target voice 620 in a binary manner, as shown in FIG. 5, or may output the probability that a voice is the target voice. In an embodiment in which the artificial neural network 610 is trained to output the probability of being the target voice, the user terminal 200 may determine a voice whose probability satisfies a predetermined condition to be the target voice.

The user terminal 200 according to an embodiment of the present invention may extract at least one characteristic value of the target voice determined by the process described above (S920). The characteristic value may be, for example, at least one of the gender of the speaker of the target voice, the pitch of the target voice, the frequency of the target voice, and the speech pattern of the target voice. These characteristic values are only examples; any method that can quantify and express the characteristics of the target voice may be used as the characteristic value extraction method of the present invention. In an optional embodiment, the user terminal 200 may extract the characteristic values of the target voice in the form of a multi-dimensional vector.

The user terminal 200 according to an embodiment of the present invention may use the characteristic value of the target voice extracted by the process described above to check whether the target voice is included in the input sound. If the target voice is included in the input sound, the user terminal 200 may amplify only the target voice to generate an amplified voice (S930).

To check whether the target voice is included in the input sound, the user terminal 200 according to an embodiment of the present invention may determine an utterance characteristic value, which is a characteristic value of the voice of the speaker who is speaking, at each of a plurality of time points (S931). For example, the user terminal 200 may determine the utterance characteristic value of the voice S2_V1 at time point t1.

The user terminal 200 may then check whether the similarity between the determined utterance characteristic value and the characteristic value of the target voice exceeds a predetermined threshold similarity (S932). If the similarity exceeds the threshold similarity, the user terminal 200 may adjust at least one output characteristic value of the amplified voice (S933).

For example, if the voice 412 of speaker 2 has been identified as the user's target voice by the process described with reference to FIGS. 4 and 5, and speaker 2 speaks between the first time point t1 and the second time point t2, the user terminal 200 may determine that the similarity between the voice 412 and the voice S2_V1 exceeds the threshold similarity. Because the determined similarity exceeds the threshold similarity, the user terminal 200 may adjust at least one output characteristic value of the amplified voice.

Looking more closely at how the user terminal 200 according to an embodiment of the present invention adjusts the output characteristic value of the amplified voice, the user terminal 200 may identify the frequency band to which the voice of the speaker speaking at the first time point t1 belongs. For example, as shown in FIG. 7, the user terminal 200 may identify the frequency band S2_f (that is, f1 to f2) to which the voice of speaker 2, the speaker speaking at the first time point t1, belongs. The user terminal 200 may also identify the frequency bands of speakers speaking at other time points. For example, the user terminal 200 may identify the frequency band S1_f to which the voice of speaker 1 belongs and the frequency band S3_f to which the voice of speaker 3 belongs.

Next, as shown in FIG. 8, the user terminal 200 may set the amplification level for sound in the identified frequency band S2_f of the input sound higher than the amplification level for sound in bands other than the identified frequency band S2_f.

The user terminal 200 according to an embodiment of the present invention may apply this higher amplification level to the identified frequency band S2_f only between the first time point t1 and the second time point t2. Here, the first time point t1 may be the time at which a speaker (in particular, the speaker of the target voice) starts speaking, and the second time point t2 may be the time at which the voice (or utterance) of the speaker who started speaking at the first time point t1 ends.

From the second time point t2 onward, the user terminal 200 according to an embodiment of the present invention may set the amplification level for sound in the identified frequency band S2_f equal to the amplification level for sound in the other bands.

The user terminal 200 according to an embodiment of the present invention may output the amplified voice generated by the process described above (S940). For example, the user terminal 200 may output the generated amplified voice through the voice output unit 100.
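The order of steps S910 to S940 can be summarized in a compact sketch. It is heavily simplified and rests on several assumptions: each voice is reduced to a single scalar characteristic (a pitch in Hz), the EEG evidence is reduced to one score per candidate, similarity is a relative pitch difference, and gain is applied to whole segments rather than to a frequency band. The function `run_pipeline` and all values are hypothetical.

```python
def run_pipeline(eeg_scores, voice_pitch_hz, input_segments, threshold=0.9, boost=2.0):
    """Return (target voice id, processed segments) for a list of input segments.

    eeg_scores:      voice id -> assumed EEG-derived preference score
    voice_pitch_hz:  voice id -> assumed pitch of that candidate voice
    input_segments:  list of (measured pitch in Hz, list of samples)
    """
    # S910: identify the target voice from the brain wave evidence.
    target_id = max(eeg_scores, key=eeg_scores.get)
    # S920: extract (here: look up) the characteristic value of the target voice.
    target_pitch = voice_pitch_hz[target_id]

    output = []
    for pitch, samples in input_segments:
        # S931/S932: compare the utterance characteristic with the target's.
        scale = max(abs(pitch), abs(target_pitch), 1e-12)
        similarity = 1.0 - abs(pitch - target_pitch) / scale
        # S933: raise the gain only for segments that match the target voice.
        gain = boost if similarity > threshold else 1.0
        output.append([gain * s for s in samples])
    # S940: the caller would send `output` to the voice output unit.
    return target_id, output


target_id, processed = run_pipeline(
    eeg_scores={411: 0.2, 412: 0.9},
    voice_pitch_hz={411: 120.0, 412: 210.0},
    input_segments=[(118.0, [0.5, 0.25]), (208.0, [0.5, 0.25])],
)
```

In this made-up run, only the second segment, whose pitch is close to that of voice 412, is amplified; the first passes through unchanged.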

In an optional embodiment, the user terminal 200 may output the amplified voice together with a canceling sound that cancels the input sound. The canceling sound may be a sound with the same amplitude as the input sound and the opposite phase. In this way, the present invention can deliver only the amplified voice clearly to the user.

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the devices and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and one or more software applications that execute on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, the description sometimes refers to a single processing device, but a person of ordinary skill in the art will appreciate that a processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

소프트웨어는 컴퓨터 프로그램(computer program), 코드(code), 명령(instruction), 또는 이들 중 하나 이상의 조합을 포함할 수 있으며, 원하는 대로 동작하도록 처리 장치를 구성하거나 독립적으로 또는 결합적으로(collectively) 처리 장치를 명령할 수 있다. 소프트웨어 및/또는 데이터는, 처리 장치에 의하여 해석되거나 처리 장치에 명령 또는 데이터를 제공하기 위하여, 어떤 유형의 기계, 구성요소(component), 물리적 장치, 가상 장치(virtual equipment), 컴퓨터 저장 매체 또는 장치, 또는 전송되는 신호 파(signal wave)에 영구적으로, 또는 일시적으로 구체화(embody)될 수 있다. 소프트웨어는 네트워크로 연결된 컴퓨터 시스템 상에 분산되어서, 분산된 방법으로 저장되거나 실행될 수도 있다. 소프트웨어 및 데이터는 하나 이상의 컴퓨터 판독 가능 기록 매체에 저장될 수 있다.The software may include a computer program, code, instructions, or a combination of one or more of these, configuring the processing unit to behave as desired or processed independently or collectively. You can command the device. Software and/or data may be interpreted by a processing device or, to provide instructions or data to a processing device, of any type of machine, component, physical device, virtual equipment, computer storage medium or device. , Or may be permanently or temporarily embodyed in a transmitted signal wave. The software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

실시예에 따른 방법은 다양한 컴퓨터 수단을 통하여 수행될 수 있는 프로그램 명령 형태로 구현되어 컴퓨터 판독 가능 매체에 기록될 수 있다. 컴퓨터 판독 가능 매체는 프로그램 명령, 데이터 파일, 데이터 구조 등을 단독으로 또는 조합하여 포함할 수 있다. 매체에 기록되는 프로그램 명령은 실시예를 위하여 특별히 설계되고 구성된 것들이거나 컴퓨터 소프트웨어 당업자에게 공지되어 사용 가능한 것일 수도 있다. 컴퓨터 판독 가능 기록 매체의 예에는 하드 디스크, 플로피 디스크 및 자기 테이프와 같은 자기 매체(magnetic media), CD-ROM, DVD와 같은 광기록 매체(optical media), 플롭티컬 디스크(floptical disk)와 같은 자기-광 매체(magneto-optical media), 및 롬(ROM), 램(RAM), 플래시 메모리 등과 같은 프로그램 명령을 저장하고 수행하도록 특별히 구성된 하드웨어 장치가 포함된다. 프로그램 명령의 예에는 컴파일러에 의해 만들어지는 것과 같은 기계어 코드뿐만 아니라 인터프리터 등을 사용해서 컴퓨터에 의해서 실행될 수 있는 고급 언어 코드를 포함한다. 상기된 하드웨어 장치는 실시예의 동작을 수행하기 위해 하나 이상의 소프트웨어 모듈로서 작동하도록 구성될 수 있으며, 그 역도 마찬가지이다.The method according to the embodiment may be implemented in the form of program instructions that can be executed through various computer means and recorded in a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and usable to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes, optical media such as CD-ROMs and DVDs, and magnetic media such as floptical disks. -A hardware device specially configured to store and execute program instructions such as magneto-optical media, and ROM, RAM, flash memory, and the like. Examples of program instructions include not only machine language codes such as those produced by a compiler, but also high-level language codes that can be executed by a computer using an interpreter or the like. The hardware device described above may be configured to operate as one or more software modules to perform the operation of the embodiment, and vice versa.

Although the embodiments have been described above with reference to a limited set of embodiments and drawings, those of ordinary skill in the art can make various modifications and variations from the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims that follow.

100: voice output unit
111: communication unit
112: control unit
113: sound input unit
114: sound output unit
115: memory
200: user terminal
211: communication unit
212: control unit
213: memory
214: display unit
300: EEG sensing device

Claims

A method of selectively amplifying the voice of a specific speaker, the method comprising:
reproducing a plurality of candidate voices in time series and identifying, based on a user's brain waves, a target voice that the user wants to hear from among the plurality of candidate voices;
extracting at least one characteristic value of the target voice;
checking, based on the characteristic value, whether the target voice is included in an input sound and, if the target voice is included, generating an amplified voice by amplifying only the target voice; and
outputting the amplified voice.

The method of claim 1,
wherein identifying the target voice comprises
identifying, as the target voice, a candidate voice for which at least one physical quantity of the brain waves, measured while the user listens to the plurality of candidate voices, satisfies a predetermined condition.

The method of claim 2,
wherein the physical quantity
includes at least one of a frequency of the brain waves and an amplitude of the brain waves, and
wherein the predetermined condition includes at least one of:
a condition that the frequency of the user's EEG is the highest among the plurality of candidate voices, and
a condition that the amplitude of the user's EEG is the lowest among the plurality of candidate voices.
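
As a non-limiting illustration, the selection rule of this claim can be sketched as follows. The sketch assumes that one summary of the user's EEG has already been computed for each candidate voice; the names `EegSummary`, `dominant_frequency_hz`, `mean_amplitude_uv`, and `pick_target` are hypothetical and do not appear in the specification, which does not say how frequency and amplitude are measured.

```python
from dataclasses import dataclass


@dataclass
class EegSummary:
    """Hypothetical per-candidate EEG summary (field names are illustrative)."""
    candidate_id: str
    dominant_frequency_hz: float  # assumed frequency measure of the EEG
    mean_amplitude_uv: float      # assumed amplitude measure of the EEG


def pick_target(summaries, use_frequency=True):
    """Return the id of the candidate voice that satisfies the condition.

    use_frequency=True  -> candidate with the highest EEG frequency
    use_frequency=False -> candidate with the lowest EEG amplitude
    """
    if not summaries:
        raise ValueError("no candidate voices were evaluated")
    if use_frequency:
        best = max(summaries, key=lambda s: s.dominant_frequency_hz)
    else:
        best = min(summaries, key=lambda s: s.mean_amplitude_uv)
    return best.candidate_id
```

Either condition can be used alone, which matches the "at least one of" wording; combining both would need a tie-breaking rule that the claim leaves open.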

The method of claim 1,
wherein identifying the target voice comprises
determining, using a trained artificial neural network, the target voice that the user wants to hear, and
wherein the artificial neural network is
a neural network trained, on training data including EEG data labeled with whether the user wants to listen, to output whether the user wants to listen in response to input EEG data.

The method of claim 1,
wherein the at least one characteristic value
comprises at least one of a gender of the speaker of the target voice, a pitch of the target voice, a frequency of the target voice, and a speech pattern of the target voice.

The method of claim 1,
wherein the input sound is
sound input in real time that includes a plurality of spoken voices uttered by a plurality of speakers, and
wherein generating the amplified voice comprises:
checking a speech characteristic value, which is a characteristic value of the voice of a speaker speaking at a first point in time;
checking whether a similarity between the speech characteristic value and the characteristic value of the target voice exceeds a predetermined threshold similarity; and
adjusting at least one output characteristic value of the amplified voice when the similarity exceeds the predetermined threshold similarity.
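
As a non-limiting illustration, the threshold comparison of this claim can be sketched as follows. The claim does not name a similarity measure, so cosine similarity over numeric feature vectors is an assumption made here, as are the function names and the default threshold of 0.9.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (assumed measure)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def matches_target(speech_features, target_features, threshold=0.9):
    """True when the similarity exceeds the threshold (threshold is hypothetical)."""
    return cosine_similarity(speech_features, target_features) > threshold
```

The comparison is strict (`>`), following the claim's wording that the similarity "exceeds" the threshold.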

The method of claim 6,
wherein adjusting at least one output characteristic value comprises:
identifying a frequency band to which the voice of the speaker speaking at the first point in time belongs; and
setting, in the input sound, the amplification degree of sound in the identified frequency band higher than the amplification degree of sound in bands other than the identified frequency band.

The method of claim 7,
wherein generating the amplified voice further comprises:
detecting a second point in time, which is the point in time at which the voice of the speaker speaking at the first point in time ends; and
setting, from the second point in time, the amplification degree of sound in the identified frequency band equal to the amplification degree of sound in bands other than the identified frequency band.
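
As a non-limiting illustration, the band-wise amplification between the first and second points in time can be sketched as follows. The band layout, the gain values, and the names `find_band` and `band_gains` are hypothetical; the claims only require that the identified band receive a higher amplification degree while the speaker is talking and an equal one afterwards.

```python
def find_band(bands_hz, frequency_hz):
    """Return the (low, high) band that contains frequency_hz, or None."""
    for low, high in bands_hz:
        if low <= frequency_hz < high:
            return (low, high)
    return None


def band_gains(bands_hz, target_band, speaking, boost=4.0, base=1.0):
    """Return one gain per band.

    While the matched speaker is talking (first point in time onward), the
    identified band gets `boost`; once the speech has ended (second point in
    time), every band gets the same `base` gain again.
    """
    gains = []
    for band in bands_hz:
        if speaking and band == target_band:
            gains.append(boost)
        else:
            gains.append(base)
    return gains
```

For example, with bands of 0 to 250 Hz, 250 to 1000 Hz, and 1000 to 4000 Hz, a voice identified at 440 Hz falls in the middle band, so only that band is boosted while `speaking` is true.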

The method of claim 1,
wherein outputting the amplified voice comprises:
outputting a cancellation sound that cancels the input sound; and
outputting the amplified voice.

The method of claim 9,
wherein the cancellation sound is
a sound having the same amplitude as the input sound and the opposite phase.
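
As a non-limiting illustration, a cancellation sound with the same amplitude and opposite phase can be sketched on sampled audio by negating each sample. The sketch assumes ideal, perfectly time-aligned signals represented as lists of floats; the function names are hypothetical, and a real device would also have to handle latency and the acoustic path.

```python
def cancellation_signal(input_samples):
    """Same amplitude, opposite phase: negate every sample."""
    return [-s for s in input_samples]


def mix_output(input_samples, amplified_voice):
    """Output signal = cancellation sound + amplified voice, sample by sample."""
    if len(input_samples) != len(amplified_voice):
        raise ValueError("signals must have the same number of samples")
    anti = cancellation_signal(input_samples)
    return [a + v for a, v in zip(anti, amplified_voice)]
```

Under these idealized assumptions, the ambient input sound and the output add up at the ear to the amplified voice alone.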

A computer program stored in a medium to execute, using a computer,
the method of any one of claims 1 to 10.

A voice selective amplification system for selectively amplifying the voice of a specific speaker, the system comprising:
an EEG sensing device for detecting a user's EEG;
a voice output unit for selectively outputting an amplified voice to the user's auditory organ; and
a user terminal that generates the amplified voice based on the EEG detected by the EEG sensing device and provides the amplified voice to the voice output unit,
wherein the user terminal:
identifies, while reproducing a plurality of candidate voices in time series through the voice output unit, a target voice that the user wants to hear from among the plurality of candidate voices, based on the user's EEG detected by the EEG sensing device;
extracts at least one characteristic value of the target voice;
checks, based on the characteristic value, whether the target voice is included in an input sound and, if the target voice is included, amplifies only the target voice to generate the amplified voice; and
controls the voice output unit to output the amplified voice.

The system of claim 12,
wherein the user terminal
identifies, as the target voice, a candidate voice for which at least one physical quantity of the brain waves, measured while the user listens to the plurality of candidate voices, satisfies a predetermined condition.

The system of claim 13,
wherein the physical quantity
includes at least one of a frequency of the brain waves and an amplitude of the brain waves, and
wherein the predetermined condition includes at least one of:
a condition that the frequency of the user's EEG is the highest among the plurality of candidate voices, and
a condition that the amplitude of the user's EEG is the lowest among the plurality of candidate voices.

The system of claim 12,
wherein the user terminal
determines, using a trained artificial neural network, the target voice that the user wants to hear, and
wherein the artificial neural network is
a neural network trained, on training data including EEG data labeled with whether the user wants to listen, to output whether the user wants to listen in response to input EEG data.

The system of claim 12,
wherein the at least one characteristic value
comprises at least one of a gender of the speaker of the target voice, a pitch of the target voice, a frequency of the target voice, and a speech pattern of the target voice.

The system of claim 12,
wherein the input sound is
sound sensed in real time by either the user terminal or the voice output unit and includes a plurality of spoken voices uttered by a plurality of speakers, and
wherein the user terminal
checks a speech characteristic value, which is a characteristic value of the voice of a speaker speaking at a first point in time, checks whether a similarity between the speech characteristic value and the characteristic value of the target voice exceeds a predetermined threshold similarity, and adjusts at least one output characteristic value of the voice output unit when the similarity exceeds the predetermined threshold similarity.

The system of claim 17,
wherein the user terminal
identifies a frequency band to which the voice of the speaker speaking at the first point in time belongs, and sets, in the input sound, the amplification degree of sound in the identified frequency band higher than the amplification degree of sound in bands other than the identified frequency band.

The system of claim 18,
wherein the user terminal
detects a second point in time, which is the point in time at which the voice of the speaker speaking at the first point in time ends, and sets, from the second point in time, the amplification degree of sound in the identified frequency band equal to the amplification degree of sound in bands other than the identified frequency band.

The system of claim 12,
wherein the user terminal
controls the voice output unit to output, together with the amplified voice, a cancellation sound that cancels the input sound.

The system of claim 20,
wherein the cancellation sound is
a sound having the same amplitude as the input sound and the opposite phase.