KR20210010404A

KR20210010404A - Apparatus for adjusting output sound source and method thereof

Info

Publication number: KR20210010404A
Application number: KR1020200088942A
Authority: KR
Inventors: 안강헌; 김현재
Original assignee: 충남대학교산학협력단
Priority date: 2019-07-17
Filing date: 2020-07-17
Publication date: 2021-01-27
Also published as: KR102316509B1

Abstract

The present invention provides a method of operating a sound source output control device operated by at least one processor. The method includes: collecting external noise for a certain period of time through one or more linked microphones and calculating a threshold from the average magnitude of the collected noise; when external noise collected in real time exceeds the threshold, preprocessing it to match the input of a sound analysis model trained on external noise; inputting the preprocessed external noise into the trained sound analysis model to determine whether a sound generated within a critical distance is a human voice or a vehicle driving sound; and, when the external noise is determined to be a human voice or a vehicle driving sound, outputting that external noise to the linked sound source output device. When sound source content is being output from the sound source output device, the content is output in combination with the human voice or the vehicle driving sound.

Description

Sound source output control device and its method {APPARATUS FOR ADJUSTING OUTPUT SOUND SOURCE AND METHOD THEREOF}

The present invention relates to a sound source output control device and method for automatically controlling sound source output based on external noise.

Recently, as various small devices have become widespread, people increasingly listen to music and watch video and DMB broadcasts in daily life. To use these devices, users wear earphones, headphones, and the like, which are built to block external noise so that the user can concentrate on the content.

However, because external noise cannot be blocked completely, users turn up the volume of their earphones or headphones, and sustaining this for a long time can lead to hearing loss.

In addition, a user who is shut off from external noise and focused on the content may miss or ignore surrounding situations that must be noticed, which can cause accidents and the resulting social losses.

To address this, techniques are being studied that adjust the earphone volume in response to a sudden loud external noise while the user is wearing earphones. However, by the time a loud noise occurs, an accident has most likely already happened, so the user has no time to respond to the situation.

Accordingly, there is a need for a technique that accurately distinguishes the noises occurring in the surroundings and either controls the volume of the earphones worn by the user or outputs the external noise through the earphones, so that the user can recognize the situation and has time to respond.

The problem to be solved is to provide a technique that, while sound source content is being output, recognizes a human voice or a vehicle driving sound among external noise and outputs the sound source content in combination with the recognized voice or driving sound.

According to an embodiment of the present invention, a method of operating a sound source output control device operated by at least one processor includes: collecting external noise for a certain period of time through one or more linked microphones and calculating a threshold from the average magnitude of the collected external noise; when external noise collected in real time has a value greater than the threshold, preprocessing the external noise to match the input of a trained sound analysis model; inputting the preprocessed external noise into the trained sound analysis model to determine whether a sound generated within a critical distance is a human voice or a vehicle driving sound; and, when the external noise is determined to be a human voice or a vehicle driving sound, outputting that external noise to the linked sound source output device. When sound source content is being output from the sound source output device, the sound source content is output in combination with the human voice or the vehicle driving sound.

The method may further include: collecting the voices of people speaking at various positions and the driving sounds of vehicles in motion; building a data set by selecting, from the human voices or vehicle driving sounds, only sounds generated within the critical distance based on the distance characteristics of the sound signal; and using the data set to train a first sound analysis model that identifies the voice of a person speaking within the critical distance and a second sound analysis model that identifies the driving sound of a vehicle within the critical distance.

The microphone may be a polyhedral microphone in which a microphone sensor is attached to the center of each face and each microphone sensor picks up sound.

In the training step, when external noise is picked up through a plurality of microphones, only sounds within the critical distance may be classified based on the differences in arrival time at each microphone from the position where the sound is generated, and the model may be trained to determine whether a sound so classified is a human voice or a vehicle driving sound.

In the training step, as in the following equation, when the source sound signal S_o(t') has propagated for a time t, the dispersion characteristic of sound by distance in the frequency domain, G_d(w), may be calculated from the distance-dependent dispersion characteristic g_d of the sound signal S(t), and the model may be trained to select only sounds generated within the critical distance based on the calculated dispersion characteristic.

Here, S_o(t') is the value set as the signal value of the sound source, S_o(w) is the source sound signal in the frequency domain, and S(w) is the sound signal S(t) in the frequency domain.

In the step of calculating the threshold, the collected external noise may be converted into digital signals, the average magnitude may be calculated from the converted digital signals, and the threshold may be set to the calculated average magnitude plus an error range.
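The threshold step above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `estimate_threshold`, the use of mean absolute amplitude as the "average magnitude", and the additive `error_margin` default are all hypothetical, since the text does not say how the error range is chosen.

```python
from statistics import fmean


def estimate_threshold(calibration_samples, error_margin=0.05):
    """Return the mean absolute amplitude of the calibration window plus a margin.

    calibration_samples: digitised noise samples collected for a fixed period.
    error_margin: additive error range (hypothetical value, not from the source).
    """
    mean_level = fmean(abs(s) for s in calibration_samples)
    return mean_level + error_margin


# Example: four digitised samples collected during calibration.
threshold = estimate_threshold([0.1, -0.1, 0.2, -0.2], error_margin=0.05)
```

An additive margin keeps the sketch close to the wording "average magnitude plus an error range"; a margin derived from the spread of the calibration samples would be an equally plausible reading.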

In the step of outputting to the sound source output device, an external sound volume value may be set based on the recognized human voice or vehicle driving sound, and the volume of the sound source content may be lowered based on that external sound volume value.

According to another embodiment of the present invention, a program executed by a computing device and stored in a computer-readable storage medium includes instructions for executing the steps of: collecting external noise for a certain period of time through one or more linked microphones and calculating a threshold from the average magnitude of the collected external noise; discarding external noise collected in real time if its value is smaller than the threshold and, if its value is greater than the threshold, preprocessing it to match the input of a trained sound analysis model; inputting the preprocessed external noise into the trained sound analysis model to determine whether a sound generated within a critical distance is a human voice or a vehicle driving sound; and, when the external noise is determined to be a human voice or a vehicle driving sound, outputting that external noise to the linked sound source output device.
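The discard, preprocess, and classify sequence described above can be sketched as a single per-frame function. Everything named here is an assumption for illustration: `preprocess` (peak normalisation plus padding to a fixed length), `process_frame`, the label strings, and the `classify` callable, which stands in for the trained sound analysis model.

```python
from statistics import fmean


def preprocess(frame, input_length=1024):
    """Hypothetical preprocessing: peak-normalise, then trim or zero-pad to input_length."""
    peak = max((abs(s) for s in frame), default=0.0)
    scaled = [s / peak for s in frame] if peak > 0 else list(frame)
    scaled = scaled[:input_length]
    return scaled + [0.0] * (input_length - len(scaled))


def process_frame(frame, threshold, classify):
    """Return the frame if it should be sent to the output device, otherwise None.

    classify: placeholder for the trained model; takes a preprocessed frame and
    returns a label such as "voice", "vehicle", or anything else.
    """
    # Frames at or below the threshold are discarded (the source only specifies
    # "smaller than"; treating equality as a discard is an assumption).
    if not frame or fmean(abs(s) for s in frame) <= threshold:
        return None
    label = classify(preprocess(frame))
    return frame if label in ("voice", "vehicle") else None
```

Gating on the threshold before preprocessing means the model only runs on frames louder than the ambient baseline.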

According to the embodiment, the voice of a person speaking near the user, or the driving sound of a vehicle approaching the user, is recognized and provided to a user who is listening to sound source content with external noise blocked, so that the user can easily notice changes in the surroundings and possible accidents can be prevented.

In addition, according to the embodiment, the volume of the sound source output is controlled automatically, so that the user can communicate without directly controlling the sound source content in use when wishing to talk with nearby people or to pay attention to a situation in which noise occurs, which provides convenience to the user.

FIG. 1 is a block diagram showing a system including a sound source output control device according to an embodiment of the present invention.
FIG. 2 is an exemplary diagram for explaining a method of analyzing sounds within a critical distance using a tetrahedral microphone according to an embodiment of the present invention.
FIG. 3 is an exemplary diagram showing the frequency domain as a function of distance according to an embodiment of the present invention.
FIG. 4 is a block diagram showing a sound source output control device according to an embodiment of the present invention.
FIG. 5 is an exemplary diagram showing a sound analysis model according to an embodiment of the present invention.
FIG. 6 is a flowchart showing a method of controlling sound source output according to external sound according to an embodiment of the present invention.
FIG. 7 is a hardware configuration diagram of a computing device according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art can easily carry them out. However, the present invention may be implemented in various different forms and is not limited to the embodiments described herein. In the drawings, parts unrelated to the description are omitted in order to describe the present invention clearly, and similar reference numerals are assigned to similar parts throughout the specification.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. In addition, terms such as "... unit", "... device", and "module" used in the specification refer to a unit that processes at least one function or operation, which may be implemented in hardware, software, or a combination of hardware and software.

In the specification, external noise includes all sounds generated externally, such as human voices or car sounds.

In the specification, the sound source output device refers to earphones, a headset, or the like that can be attached to and detached from the user's body and that outputs sound source content. The embodiments below assume that the user is wearing the sound source output device.

In the specification, the sound source output control device is mounted on a terminal that provides sound source content to the sound source output device, but it is not limited to this and may be built into the sound source output device itself, such as earphones or a headset. In this way the sound source output control device can control the volume, playback, or stopping of the sound source content output from the sound source output device. The type of earphones or headset, whether the connection is wired or wireless, and the pairing method are not limited.

In the specification, a human voice or a vehicle driving sound means a sound generated within the critical distance, which represents a short distance from the user, or a sound traveling toward the user. It is not limited to these and may also include sounds from various means of transportation, such as bicycle horn sounds, bicycle riding sounds, motorcycles, and electric scooters.

In the specification, "sound source" and "sound" are used with the same meaning, and sound source content means content that uses sound, including music, songs, talks, lectures, and the like.

FIG. 1 is a block diagram showing a system including a sound source output control device according to an embodiment of the present invention.

As shown in FIG. 1, in the system the sound source output control device 100 is linked with one or more microphones 10 and collects external noise through the microphones 10. The sound source output control device 100 analyzes the external noise and, when it recognizes a human voice or a vehicle driving sound at short range, outputs that sound through the linked sound source output device 20. The sound source output device 20 is a device that outputs sound in close contact with the ear, such as earphones or a headset, and represents a pair of output devices connected by wire or wirelessly.

Here, the sound source output control device 100 may use one or more trained artificial neural networks to determine whether the external noise picked up by the microphone 10 is a human voice or a vehicle driving sound generated within the critical distance.

When the sound source output control device 100 recognizes, from the external noise, a human voice or a vehicle driving sound within the critical distance and content is currently being output, it lowers the volume of that content. The critical distance is a preset short distance from the user's position and can be changed and set according to the user's environment.

The sound source output control device 100 may also output the external noise picked up by the microphone 10, as it is, through the sound source output device 20. Specifically, the sound source output control device 100 may combine the sound source content being output with the external noise and output the result through the sound source output device 20, and in doing so may output the external noise louder than the sound source content.
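The combined output described above can be sketched as a gain-weighted mix in which the content is attenuated so that the external sound dominates. The function name `mix_with_ducking`, the gain values, and the [-1, 1] sample range are illustrative assumptions, not values from the specification.

```python
def mix_with_ducking(content, external, content_gain=0.3, external_gain=1.0):
    """Mix two sample sequences (same sample rate) so the external sound is louder.

    Samples are assumed to be floats in [-1.0, 1.0]; the shorter sequence is
    treated as silence once it ends, and the result is clipped to that range.
    """
    if content_gain >= external_gain:
        raise ValueError("content_gain must be lower than external_gain")
    length = max(len(content), len(external))
    mixed = []
    for i in range(length):
        c = content[i] if i < len(content) else 0.0
        e = external[i] if i < len(external) else 0.0
        sample = content_gain * c + external_gain * e
        mixed.append(max(-1.0, min(1.0, sample)))  # hard clip to the valid range
    return mixed
```

Hard clipping is the simplest way to keep the sum in range; a real device would more likely use a limiter or smooth gain ramps to avoid audible artifacts.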

The sound source content output here is either sound source content stored in the terminal or sound source content received in real time from an external server 30 connected through a wired or wireless network.

In other words, the terminal in which the sound source output control device 100 is embedded is connected to the external server 30 through a network to transmit and receive data.

Here, the network includes, but is not limited to, a 3GPP (3rd Generation Partnership Project) network, an LTE (Long Term Evolution) network, a WiMAX (Worldwide Interoperability for Microwave Access) network, the Internet, a LAN (Local Area Network), a wireless LAN (Wireless Local Area Network), a WAN (Wide Area Network), a PAN (Personal Area Network), a Bluetooth network, a satellite broadcasting network, an analog broadcasting network, and a DMB (Digital Multimedia Broadcasting) network.

Accordingly, the sound source output control device 100 may be connected to an external device or the external server 30 over the network through a separate communication module to transmit and receive data, or it may be connected to the external device or the external server 30 through the terminal. This configuration can easily be designed and changed later by the user.

Meanwhile, the sound source output control device 100 performs control such as starting or stopping playback of sound source content and adjusting the volume. It may be embedded in a device connected through a network to the mounted microphone 10 and the sound source output device 20, such as a terminal, a smart terminal, or a smart watch, but is not limited to a specific terminal.

The description below assumes that the sound source output control device 100 is embedded in the terminal, but this is not a limitation: depending on the situation or implementation conditions, it may be embedded in the sound source output device 20 or in the connection line between the terminal and the sound source output device 20.

Hereinafter, the tetrahedral microphone configuration and the collected data set used to distinguish the critical distance of sounds picked up through the microphone 10 are described in detail with reference to FIGS. 2 and 3.

The following describes in detail configurations that use a tetrahedral microphone, or the distance characteristics of the sound signal, so that only sounds within the critical distance are classified.

FIG. 2 is an exemplary diagram for explaining a method of analyzing sounds within a critical distance using a tetrahedral microphone according to an embodiment of the present invention, and FIG. 3 is an exemplary diagram showing the frequency domain as a function of distance according to an embodiment of the present invention.

As shown in FIG. 2, the microphone 10 may be implemented as a tetrahedral microphone, with a microphone sensor arranged at the center of each of the four faces, so that sound can be picked up by four microphone sensors (shown as circles).

Here, the differences in arrival time at the microphone sensors vary with the distance from the point where the sound is generated.

For example, when the distance between the microphone 10 and the sound source is within the critical distance (near field), the time taken to reach each microphone sensor from the position of the sound source differs from sensor to sensor.

As shown in FIG. 2, within the critical distance the path lengths to the individual microphone sensors (mic₁, mic₂, mic₃) differ, so comparing the times at which each microphone sensor picks up the sound reveals a time delay.

Specifically, taking the distance x of microphone sensor mic₂ as the reference, mic₃, which picks up the sound at a shorter distance than mic₂, is closer to the sound source than x by d₂, and mic₁, which picks up the sound at a greater distance than mic₂, is farther from the sound source than x by d₁.

These differences in distance from the sound source produce differences in the time at which each microphone sensor picks up the sound.

Accordingly, in the near field within the critical distance, the time difference between microphone sensors (mic₁, mic₂) and the time difference between microphone sensors (mic₂, mic₃) differ markedly, whereas in the far field the two time differences differ only slightly.

For example, in the near field, the difference in detection time between microphone sensors (mic₁, mic₂) is about 10 seconds and the difference in detection time between microphone sensors (mic₂, mic₃) is about 20 seconds. In the near field, the time delay values between microphone sensors thus differ by at least the threshold.

In the far field, by contrast, the difference in detection time between microphone sensors (mic₁, mic₂) is, for example, about 10 seconds and the difference in detection time between microphone sensors (mic₂, mic₃) is about 11 seconds. In the far field, the time delay values between microphone sensors thus differ by less than the threshold.

As in Equation 1 below, the microphone sensors (mic₁, mic₂, mic₃) can be used to express (1) the time delay for a sound source located within the critical distance and (2) the time delay for a sound source located beyond the critical distance.

[Equation 1]

Here, τ₁₂ is the difference in arrival time between mic₁ and mic₂; d₁ is the distance from the sound source to mic₁ minus the distance from the sound source to mic₂; τ₂₃ is the difference in arrival time between mic₂ and mic₃; d₂ is the distance from the sound source to mic₂ minus the distance from the sound source to mic₃; and c is the speed of the sound signal.

In addition, d is the far-field distance between the sound source and each microphone sensor, and l is the distance between microphone sensors.
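As a sketch consistent with the definitions above (an assumed form, not necessarily the patent's own Equation 1), the near-field delays follow directly from the path-length differences. The far-field line additionally assumes collinear, equally spaced sensors and a plane wave arriving at a hypothetical angle θ to the sensor axis, neither of which is stated in the text:

```latex
\begin{align}
\text{(1) near field:}\quad & \tau_{12} = \frac{d_1}{c}, \qquad \tau_{23} = \frac{d_2}{c}, \qquad d_1 \neq d_2 \\
\text{(2) far field } (d \gg l):\quad & \tau_{12} \approx \tau_{23} \approx \frac{l \cos\theta}{c}
\end{align}
```

Under this reading, a large gap between τ₁₂ and τ₂₃ indicates a nearby source, and nearly equal delays indicate a distant one.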

In this way, based on the time differences with which the same sound arrives at the multiple microphone sensors of the tetrahedral microphone 10, the sound source output control device 100 can classify whether the position of external noise picked up in real time is within the critical distance or beyond it.

In addition, the sound source output control device 100 may collect various data sets covering various distances between the microphone 10 and the position where a voice or vehicle driving sound is generated, and use them to train the artificial neural network.
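One way to sketch this classification is to estimate the pairwise arrival delays by brute-force cross-correlation and compare them. This is an illustration only: the function names, the lag search range, and the decision rule (near field when the two delays differ by at least a threshold, following the 10 versus 20 and 10 versus 11 example) are assumptions, and delays are measured in samples rather than seconds.

```python
def estimate_delay(ref, sig, max_lag):
    """Return the lag in samples at which sig best matches ref.

    A positive result means the sound reaches sig later than ref.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(ref):
            j = i + lag
            if 0 <= j < len(sig):
                score += r * sig[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def is_near_field(mic1, mic2, mic3, max_lag, delay_gap_threshold):
    """Classify as near field when the two inter-sensor delays differ enough."""
    tau12 = estimate_delay(mic2, mic1, max_lag)  # how much later mic1 hears the sound than mic2
    tau23 = estimate_delay(mic3, mic2, max_lag)  # how much later mic2 hears the sound than mic3
    return abs(tau12 - tau23) >= delay_gap_threshold
```

The nested loop is quadratic in the frame length, which is fine for a short sketch; an FFT-based correlation would be the usual choice for real-time use.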

Meanwhile, the microphone 10 need not be a tetrahedral microphone: it may be implemented as another polyhedral microphone, or as a plurality of microphones spaced at a fixed distance from one another.

When such a polyhedral microphone is not used, only sounds within the critical distance, selected on the basis of the distance characteristics of the sound signal, may be used to train the artificial neural network.

In FIG. 3, (a) is a graph showing the dispersion characteristics of sound as a function of distance, and (b) shows a human voice at a distance of 1 m (1m PN) and a human voice at a distance of 6 m (6m PN).

When the source sound signal S_o(t') is generated and has propagated for a time t, the resulting sound signal is S(t), and its distance-dependent dispersion characteristic is given by Equation 2 below.

[Equation 2]

Here, g_d denotes the distance-dependent dispersion characteristic of the sound signal S(t). In the frequency domain it appears as G_d(w) and can be expressed as the ratio between the source sound signal and the sound signal propagated for time t.

In other words, S_o(w) is the source sound signal in the frequency domain, and S(w) is the sound signal S(t) in the frequency domain.
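One way to write the relationship described above, assuming that propagation acts as a linear time-invariant filter (an assumption; the patent's own Equation 2 may differ in form):

```latex
S(t) = \int g_d(t - t')\, S_o(t')\, dt', \qquad G_d(w) = \frac{S(w)}{S_o(w)}
```

Under this assumption, the convolution in the time domain becomes the product S(w) = G_d(w) S_o(w) in the frequency domain, which is why G_d(w) can be read off as a ratio of spectra.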

Once sound sources generated at various positions have been collected as a data set, the sound source output control device 100 can train the artificial neural network with these distance characteristics of the sound signal taken into account.

In other words, the sound source output control device 100 can select only signals within the critical distance, based on the difference between the dispersion characteristics of sounds generated within the critical distance and those of sounds generated beyond it, and train the artificial neural network on them.

Hereinafter, with reference to FIGS. 4 and 5, a sound source output control apparatus is described in detail that trains an artificial neural network to identify a human voice or a vehicle driving sound located within the critical distance, and that controls the sound source output according to the result determined by the trained network.

FIG. 4 is a block diagram showing a sound source output control apparatus according to an embodiment of the present invention.

As shown in FIG. 4, the sound source output control apparatus 100 includes a collection unit 110, a preprocessing unit 120, a learning unit 130, a recognition unit 140, and a control unit 150.

For ease of explanation these are named the collection unit 110, the preprocessing unit 120, the learning unit 130, the recognition unit 140, and the control unit 150, but they are computing devices operated by at least one processor. They may be implemented in a single computing device or distributed across separate computing devices. When distributed across separate computing devices, they may communicate with one another through a communication interface. Any device capable of executing a software program written to carry out the present invention suffices as the computing device.

The collection unit 110 collects external noise in real time through one or more linked microphone sensors and converts it into a digital signal.

The preprocessing unit 120 then calculates an average magnitude from the digital signals converted during a preset period.

Here, the average magnitude represents the level of ordinarily occurring noise, that is, a value that is difficult to identify as a specific sound such as a human voice or a vehicle sound.

The preprocessing unit 120 may therefore set, as the threshold, the average magnitude of the digital signals plus a margin value. The preprocessing unit 120 may reset the threshold at a specific cycle, at set time intervals, or when digital signals larger than the threshold are continuously detected but fail to be recognized as a human voice or a vehicle driving sound N times (N being a natural number).

Once the threshold has been set, the preprocessing unit 120 ignores a digital signal converted in real time if it is smaller than the threshold, and passes it on for analysis if it is larger. For example, the preprocessing unit 120 may pass the digital signal to the learning unit 130 when it was captured as part of a data set, and to the recognition unit 140 when it is live data.
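The threshold setting and gating described above can be sketched as follows. This is a minimal illustration, assuming that "average magnitude" means the mean absolute sample value and that the margin is a fixed additive constant; the source does not fix either choice, and the function names are hypothetical.

```python
def compute_threshold(calibration_samples, margin):
    """Mean absolute amplitude of the calibration window plus a margin."""
    if not calibration_samples:
        raise ValueError("calibration window is empty")
    mean_magnitude = sum(abs(s) for s in calibration_samples) / len(calibration_samples)
    return mean_magnitude + margin


def exceeds_threshold(frame, threshold):
    """True when the frame's mean absolute amplitude is above the threshold."""
    if not frame:
        return False
    return sum(abs(s) for s in frame) / len(frame) > threshold
```

Frames for which `exceeds_threshold` returns False would simply be dropped; the rest go on to the learning unit or the recognition unit.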

The preprocessing unit 120 may also perform preprocessing to match the input format of the linked sound analysis model, such as dividing the digital signal into a plurality of sections or converting it into a plurality of matrices.
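Dividing a signal into sections can be sketched as fixed-length, possibly overlapping frames. The frame length and hop size below are illustrative parameters, not values given in the source.

```python
def split_into_frames(signal, frame_len, hop):
    """Split a 1-D signal into frames of `frame_len` samples, advancing by `hop`.

    Trailing samples that do not fill a whole frame are dropped.
    The result is a list of rows, i.e. a simple matrix form.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append(signal[start:start + frame_len])
    return frames
```

For a ten-sample signal with `frame_len=4` and `hop=2`, this yields four frames that overlap by two samples each.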

In other words, the preprocessing unit 120 may extract a plurality of feature values from the digital signal using MFCC (Mel-Frequency Cepstral Coefficients) for input to the linked sound analysis model. The extracted feature values are input to the sound analysis model for recognizing a human voice. This feature extraction can be set and changed according to the type of sound analysis model.
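The source gives no MFCC parameters. As background, the sketch below shows only the mel-scale mapping that underlies MFCC, using the commonly used 2595·log10(1 + f/700) form, together with a hypothetical helper that spaces filter center frequencies evenly on the mel scale. The remaining MFCC steps (windowing, spectrum, filterbank energies, logarithm, DCT) are omitted.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to the mel scale (common 2595/700 form)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(low_hz, high_hz, count):
    """Return `count` frequencies in Hz, evenly spaced on the mel scale."""
    if count < 2:
        raise ValueError("count must be at least 2")
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (count - 1)
    return [mel_to_hz(low_mel + i * step) for i in range(count)]
```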

When different types of sound analysis models are used, the preprocessing unit 120 may convert the digital signal into the input format of each type of model.

The learning unit 130 trains one or more sound analysis models on a plurality of data sets. Here, a data set is training data for the sound analysis model, and can be obtained by tagging digital signals, collected through the same process as in the collection unit 110 and the preprocessing unit 120, with the human voice or vehicle driving sound that each signal represents.

The sound analysis model is an artificial intelligence model, here an artificial neural network, but is not limited to this. It may be implemented as linear regression, logistic regression, a decision tree, a support vector machine, or the like, or as a reinforcement learning model, a convolutional neural network (CNN), a deep recurrent neural network (deep RNN), or a deep autoencoder, and is not limited to any one of these. (Hereinafter, the terms sound analysis model and artificial intelligence model are used interchangeably.)

The learning unit 130 may separately train an artificial neural network that identifies a human voice from the collected digital signal and an artificial neural network that identifies a vehicle driving sound from the digital signal. In other words, it may train a first sound analysis model for identifying a human voice and a second sound analysis model for identifying a vehicle driving sound.

The learning unit 130 may be implemented as a separate computing device, and may store one or more trained artificial neural networks in a linked database or retrain them at regular intervals.

For example, when the learning unit 130 is implemented in a separate computing device, it may store the trained artificial neural network in the sound source output control apparatus 100 over a wired or wireless network, and update the internal weight values of the retrained network periodically or in real time.

The learning unit 130 may also train the first and second sound analysis models taking into account the time delay characteristics of the sound collected by a plurality of microphone sensors, which depend on the distance to the location where the sound was generated.
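One common way to measure the time delay between two microphone signals is to find the lag that maximizes their cross-correlation. The sketch below illustrates that idea only; the source does not say how the delay is measured, and the function name and the `max_lag` parameter are assumptions for this example.

```python
def estimate_delay(ref, other, max_lag):
    """Return the lag in samples at which `other` best matches `ref`.

    A positive result means the sound reached the `other` microphone later
    than the `ref` microphone. Lags are searched in [-max_lag, max_lag].
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(ref):
            j = i + lag
            if 0 <= j < len(other):
                score += r * other[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

Dividing the returned lag by the sampling rate gives the delay in seconds, which could then be supplied to the model alongside the audio features.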

In other words, the learning unit 130 trains the sound analysis model to classify only sounds generated within the critical distance and to determine whether such a sound is a human voice or a vehicle driving sound.

Accordingly, during model training the learning unit 130 has the model detect only voices or vehicle driving sounds within the desired critical distance. A sound classified as having occurred outside the critical distance can be trained to be classified as noise, even if it is a voice or a vehicle driving sound.
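The labeling rule in the paragraph above can be written as a small helper. The label strings and the inclusive comparison at exactly the critical distance are assumptions made for this sketch.

```python
NOISE = "noise"


def training_label(sound_type, distance_m, critical_distance_m):
    """Label a training sample, treating out-of-range sounds as noise.

    `sound_type` is the tagged category of the recording ("voice",
    "vehicle", or anything else); `distance_m` is the distance from the
    source to the listener.
    """
    if sound_type in ("voice", "vehicle") and distance_m <= critical_distance_m:
        return sound_type
    return NOISE
```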

As described above with reference to FIGS. 2 and 3, the learning unit 130 trains the model, using information on the distance to the sound source, to determine whether the sound is a human voice or a vehicle driving sound generated within the critical distance.

The recognition unit 140 applies the digital signal of the actual external noise received from the preprocessing unit 120 to the trained artificial neural network to determine whether the signal is a human voice or a vehicle driving sound.

When more than one trained artificial neural network is used, the recognition unit 140 may apply them sequentially according to a given criterion, for example first determining through one network whether the signal is a vehicle driving sound and, if it is not, then determining whether it is a human voice.

For example, when priority is given to whether the user is in danger, the vehicle driving sound may be checked first and the human voice afterwards. The criterion by which the artificial neural networks are selected and applied in sequence can easily be changed later.
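The sequential check can be sketched as a loop over models in a configurable priority order. The `models` mapping and its callables are hypothetical stand-ins for trained sound analysis models that return True when their sound type is detected.

```python
def recognize(features, models, order=("vehicle", "voice")):
    """Run the models in priority order and return the first label detected.

    `models` maps a label to a callable taking the feature values and
    returning a bool. Returns None when no model fires.
    """
    for label in order:
        if models[label](features):
            return label
    return None
```

Changing the `order` argument is one way the priority criterion could be adjusted without retraining.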

Through the trained artificial neural network (sound analysis model), the recognition unit 140 can identify a variety of sounds: a vehicle moving toward the user's position at low speed, a vehicle stopping suddenly or braking near the user, and sounds other than driving sounds, such as a horn. These vehicle sounds may be identified using separate, independent artificial neural networks or a single artificial neural network.

Alternatively, the recognition unit 140 may input the digital signal to a plurality of sound analysis models simultaneously and recognize what sound the signal represents based on the results obtained from each model.

For example, through the trained sound analysis model the recognition unit 140 may obtain a result indicating whether the noise is a human voice or a vehicle driving sound, or a result instructing that the volume be lowered when the noise is a human voice or a vehicle driving sound.

The control unit 150 may adjust the volume of the sound source content being provided, or pause it, according to the recognition result of the recognition unit 140.

For example, when the digital signal is a human voice or a vehicle driving sound, the control unit 150 may lower the volume of the sound source content being provided and output the content combined with the human voice or vehicle driving sound.

The human voice or vehicle driving sound may be output as is, and may be output at a different volume from the sound source content so that the two can be told apart.

For example, the control unit 150 may output the human voice or vehicle driving sound at a higher volume than the sound source content.

The control unit 150 may also output the external noise as is, or output only the selected human voice or vehicle driving sound.

When both a human voice and a vehicle driving sound are identified, the control unit 150 may pause the sound source content being provided and output the human voice and the vehicle driving sound.

Specifically, the control unit 150 generates the combination (S) of the sound source content and the human voice or vehicle driving sound using Equation 3 below.

[Equation 3]

S = a * S₁ + (1 - a) * S₂

Here, S₁ is the sound source content being played, and S₂ is the sound captured from outside. The coefficient a takes a value from 0 to 1: when S₂ is a human voice or a vehicle driving sound, a is less than 1; otherwise a = 1. The value of a may be preset according to the human voice or vehicle driving sound, and its setting can easily be changed later.
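Equation 3 can be applied sample by sample. The sketch below assumes both signals are equal-length lists of samples at the same sampling rate, which the source does not state explicitly.

```python
def mix(content, external, a):
    """Combine content S1 and external sound S2 as S = a*S1 + (1 - a)*S2."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must be between 0 and 1")
    if len(content) != len(external):
        raise ValueError("signals must have the same length")
    return [a * s1 + (1.0 - a) * s2 for s1, s2 in zip(content, external)]
```

With a = 1 the external sound is suppressed entirely and only the content is heard; smaller values of a make the external voice or vehicle sound more prominent.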

If the sound is neither a human voice nor a vehicle driving sound, the control unit 150 may treat it as plain noise. In that case it discards the sound while counting the plain-noise occurrence, and when the accumulated count reaches a preset limit, it may have the threshold reset.
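The counting behaviour can be sketched with a small class. The class name, the method name, and the choice to clear the count after signalling a reset are assumptions for illustration.

```python
class PlainNoiseCounter:
    """Counts unrecognized above-threshold sounds and signals a reset."""

    def __init__(self, limit):
        if limit < 1:
            raise ValueError("limit must be a natural number")
        self.limit = limit
        self.count = 0

    def record_plain_noise(self):
        """Record one plain-noise event; return True when a reset is due."""
        self.count += 1
        if self.count >= self.limit:
            self.count = 0  # start counting again after the reset
            return True
        return False
```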

In this way the control unit 150 can automatically output a combination of the sound source content and the human voice or vehicle driving sound, and when no human voice or vehicle driving sound is being recognized in real time, it can restore the volume of the sound source content to its level before adjustment.

FIG. 5 is an exemplary diagram showing a sound analysis model according to an embodiment of the present invention.

As shown in FIG. 5, each sound analysis model is trained repeatedly on a plurality of data sets to identify a human voice or a vehicle driving sound, respectively.

Here, the sound source output control apparatus 100 secures, as a data set, the voices of various people and voice data uttered toward the user at different distances within the critical distance from the user, and repeatedly trains sound analysis model A (the first sound analysis model) on it. The model can thereby analyze the feature values of voice data and, when the data is a human voice, control the volume to be lowered.

Specifically, the data is input to sound analysis model A in matrix form. A causal convolution, which does not look at future time steps, and a separable convolution, used for efficient computation, are performed together, and each result passes independently through a gated activation (G.A.) operation. The independently computed G.A. results are merged into one array by concatenation (Concat), and this process is repeated.

In the sections marked with +, the values arriving through the residual connection are added at each matrix index. In the final stage a log softmax is computed to decide whether the input data is a human voice or noise, and finally a result instructing that the volume be lowered can be obtained.
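The building blocks named above can be illustrated in isolation. The sketch assumes the usual definitions: a causal convolution in which each output depends only on the current and earlier inputs, a tanh-times-sigmoid gate of the kind used in WaveNet-style networks (the source does not define its gated activation, so this form is an assumption), and a numerically stable log softmax. It is not a reconstruction of the network in FIG. 5.

```python
import math


def causal_conv1d(x, kernel):
    """y[t] = sum_k kernel[k] * x[t - k], with x treated as 0 before t = 0."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            if t - k >= 0:
                acc += w * x[t - k]
        out.append(acc)
    return out


def gated_activation(filter_out, gate_out):
    """Element-wise tanh(filter) * sigmoid(gate) (assumed gate form)."""
    return [
        math.tanh(f) * (1.0 / (1.0 + math.exp(-g)))
        for f, g in zip(filter_out, gate_out)
    ]


def log_softmax(logits):
    """Log of the softmax probabilities, computed stably."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_sum for v in logits]
```

The class with the largest log softmax value would be taken as the decision, for example voice versus noise.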

Meanwhile, the sound source output control apparatus 100 secures, as a data set, the driving sounds of various vehicles and data collected from vehicles driving toward the user at different distances within the critical distance from the user, and repeatedly trains sound analysis model B (the second sound analysis model) on it. The model can thereby analyze the feature values of the sound data, determine whether it is a vehicle driving sound, and finally obtain a result instructing that the volume be lowered.

Specifically, a number of sound sample points are input to sound analysis model B, which learns through repeated training to determine whether they are a vehicle driving sound.

In this way the sound source output control apparatus 100 can make the determination using a sound analysis model optimized for the human voice and one optimized for the vehicle driving sound.

Sound analysis model A and sound analysis model B are only one embodiment, and the invention is not limited to using these particular neural networks.

FIG. 6 is a flowchart showing a method of controlling sound source output according to external sound according to an embodiment of the present invention.

As shown in FIG. 6, the sound source output control apparatus 100 collects training data including human voices or vehicle driving sounds (S110).

The sound source output control apparatus 100 may collect voices uttered within the critical distance or the driving sounds of vehicles moving toward the central region.

For example, within a region of a given diameter, it may collect voices uttered toward the central region or the driving sounds of vehicles moving toward the central region.

The sound source output control apparatus 100 can classify voices or vehicle driving sounds within the critical distance either by using a tetrahedral microphone and the differences in arrival delay of the sound picked up by the microphone sensor mounted on each face, or by using the dispersion characteristics of the sound signal according to the distance from the sound source.

Next, the sound source output control apparatus 100 trains one or more artificial neural networks on the collected data set (S120).

The sound source output control apparatus 100 trains repeatedly until the input data is classified as a human voice or a vehicle driving sound with an accuracy at or above a threshold.

Depending on the situation, a trained artificial neural network may also be generated by collecting, as a data set, the voice of a specific person in addition to general human voices, or various external sounds from non-car vehicles such as bicycles or electric scooters in addition to vehicle driving sounds.

In this way, the data set best suited to the user's situation can be collected, and the artificial neural network trained to recognize it.

Once the artificial neural network has been trained, steps S110 and S120 can be skipped and the process can start directly at step S130.

Next, the sound source output control apparatus 100 calculates the threshold from the average magnitude of the external noise over a certain period (S130).

The sound source output control apparatus 100 may calculate the threshold by adding an estimated error margin to the average noise computed from the collected external noise. This error margin can easily be changed later.

The sound source output control apparatus 100 then collects the external noise around the sound source output device in real time (S140). It collects the external noise in real time through a linked microphone and converts it into a digital signal.

The sound source output control apparatus 100 then checks whether the magnitude of the external noise exceeds the threshold (S150).

If the magnitude of the external noise is below the threshold, the sound source output control apparatus 100 treats it as unidentifiable noise, ignores it, and returns to step S140 or step S130.

If the magnitude of the external noise exceeds the threshold, the sound source output control apparatus 100 recognizes a human voice or a vehicle driving sound in the external noise through the trained artificial neural network (S160).

Through the trained artificial neural network, the sound source output control apparatus 100 can determine whether the external noise is the voice of a person speaking toward the user or the driving sound of a vehicle approaching the user.

If the external noise is neither a human voice nor a vehicle driving sound, the sound source output control apparatus 100 may ignore it and return to step S140 or step S130.

When a human voice or a vehicle driving sound is recognized, the sound source output control apparatus 100 outputs the recognized human voice or vehicle driving sound through the sound source output device (S170).

The sound source output control apparatus 100 outputs the human voice or vehicle driving sound to the sound source output device worn by the user. If sound source content is being output from the sound source output device at that time, the sound source output control apparatus 100 may lower the volume of the content while outputting the human voice or vehicle driving sound.
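Steps S150 to S170 for a single frame can be tied together as in the sketch below. The `classify` callable stands in for the trained model and is hypothetical, as are the default mixing coefficient and the use of mean absolute amplitude as the noise magnitude.

```python
def process_frame(frame, content_frame, threshold, classify, a=0.3):
    """Return the samples to play for one frame of external sound.

    `classify` is a hypothetical callable returning "voice", "vehicle",
    or None. `a` is the content weight used when a sound is recognized.
    """
    level = sum(abs(s) for s in frame) / len(frame)
    if level <= threshold:
        return list(content_frame)      # S150: below threshold, ignore
    if classify(frame) is None:
        return list(content_frame)      # S160: plain noise, ignore
    # S170: lower the content and mix in the external sound
    return [a * c + (1.0 - a) * e for c, e in zip(content_frame, frame)]
```

A caller would invoke this once per captured frame and fall back to recomputing the threshold (S130) when too many frames in a row are classified as plain noise.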

The sound source output control apparatus 100 terminates automatically when the user stops using the sound source output device or removes it from the body.

FIG. 7 is a hardware configuration diagram of a computing device according to an embodiment of the present invention.

As shown in FIG. 7, the hardware of the computing device 200 may include at least one processor 210, a memory 220, a storage 230, and a communication interface 240, which may be connected through a bus. Hardware such as input devices and output devices may also be included. The computing device 200 may be loaded with various software, including an operating system capable of running programs.

The processor 210 controls the operation of the computing device 200 and may be any of various types of processor that process the instructions contained in a program, for example a CPU (Central Processing Unit), MPU (Micro Processor Unit), MCU (Micro Controller Unit), or GPU (Graphic Processing Unit). The memory 220 loads the program so that the instructions written to carry out the operations of the present invention are processed by the processor 210. The memory 220 may be, for example, ROM (read only memory) or RAM (random access memory). The storage 230 stores the data, programs, and the like required to carry out the operations of the present invention. The communication interface 240 may be a wired/wireless communication module.

According to the embodiment, by recognizing and presenting the voice of a person close to a user who is listening to sound source content, or the driving sound of a vehicle approaching at close range, the user can readily notice changes in the surroundings, and accidents that might otherwise occur can be prevented.

Also, according to the embodiment, the user can communicate without having to control the sound source content in use in order to hold a conversation or pay attention to a situation.

In addition, because MFCC is used to extract sound feature values amounting to a few dozen coefficients from the external noise and these feature values are input to the artificial neural network, a human voice or a vehicle driving sound can be identified accurately with a relatively small amount of neural network computation.

The embodiments of the present invention described above are not implemented only through an apparatus and a method; they may also be implemented through a program that realizes functions corresponding to the configuration of the embodiments, or through a recording medium on which that program is recorded.

Although embodiments of the present invention have been described in detail above, the scope of rights of the present invention is not limited to them. Various modifications and improvements made by those skilled in the art using the basic concept of the present invention defined in the following claims also fall within the scope of rights of the present invention.

Claims

A method of operating a sound source output control apparatus operated by at least one processor, the method comprising:
collecting external noise for a certain period through one or more linked microphones and calculating a threshold from the average magnitude of the collected external noise;
if external noise collected in real time has a value greater than the threshold, preprocessing the external noise to match the input of a trained sound analysis model;
inputting the preprocessed external noise into the trained sound analysis model to determine whether a sound generated within a critical distance is a human voice or a vehicle driving sound; and
if the external noise is determined to be the human voice or the vehicle driving sound, outputting the external noise to a linked sound source output device,
wherein, when sound source content is being output from the sound source output device, the sound source content is output in combination with the human voice or the vehicle driving sound.

The method of claim 1,
further comprising: collecting voices of people speaking at various locations and driving sounds of moving vehicles;
constructing a data set by selecting, from among the human voices and the vehicle driving sounds, only sounds generated within the critical distance, based on the distance characteristic of the sound signal; and
training, using the data set, a first sound analysis model for identifying the voice of a person speaking within the critical distance and a second sound analysis model for identifying a vehicle driving sound generated within the critical distance.

The method of claim 2,
wherein the microphone is a polyhedral microphone having a microphone sensor attached to the center of each face, so that sound is received by each microphone sensor.

The method of claim 3,
wherein the training step comprises,
when external noise is received through a plurality of microphones, classifying only sounds within the critical distance based on the differences in arrival time at each microphone from the location where the sound was generated, and training to determine whether each classified sound within the critical distance is a human voice or a vehicle driving sound.
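As an illustration only, the arrival-time difference between two microphone sensors might be estimated by cross-correlation, as sketched below. This is a generic technique substituted for illustration, not the method confirmed by this document; the function names, the lag search range, and the speed of sound (343 m/s) are assumptions.

```python
def estimate_delay(reference, delayed, max_lag):
    """Return the lag in samples at which `delayed` best matches `reference`."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(reference):
            j = i + lag
            if 0 <= j < len(delayed):
                score += r * delayed[j]   # cross-correlation at this lag
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def path_difference_m(lag_samples, sample_rate, speed_of_sound=343.0):
    """Convert a sample lag into a difference in path length, in meters."""
    return lag_samples / sample_rate * speed_of_sound

# Usage: the same pulse reaches the second sensor three samples later
sensor_a = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0]
sensor_b = [0, 0, 0, 0, 0, 1, 2, 1, 0, 0]
lag = estimate_delay(sensor_a, sensor_b, max_lag=5)
```

How the per-sensor differences are turned into a within-distance decision is not described in this claim, so the sketch stops at the delay estimate.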

The method of claim 2,
wherein the training step comprises,
as shown in the following equation, calculating the sound dispersion characteristic G_d(w) based on the distance-dependent dispersion characteristic g_d of the sound signal S(t) that results when the source sound signal S_o(t') has propagated for a time t, and training to select only sounds generated within the critical distance based on the calculated sound dispersion characteristic.

Here, S_o(t') is the value set as the signal value of the source sound signal, S_o(w) is the source sound signal in the frequency domain, and S(w) is the sound signal S(t) in the frequency domain.
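The equation referred to in this claim does not appear in the text as reproduced here. As a hedged reading only, assuming that propagation is modeled as a linear convolution of the source signal with g_d (an assumption that the claim text does not confirm), the symbols defined above would be related as:

```latex
S(t) = \int g_d(t - t')\, S_o(t')\, dt',
\qquad
G_d(w) = \frac{S(w)}{S_o(w)}
```

Under that assumption the second relation follows from the convolution theorem, and it agrees with the description elsewhere in the claims of the dispersion characteristic as a ratio of signals in the frequency domain.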

The method of claim 2,
wherein calculating the threshold comprises
converting the collected external noise into digital signals, calculating the average magnitude from the converted digital signals, and calculating the threshold by adding an error margin to the calculated average magnitude.
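As an illustration only, the threshold calculation might be sketched as follows. Interpreting "average magnitude" as the mean absolute sample value and expressing the error margin as a fraction of that average are both assumptions; the function names and the `margin_ratio` parameter are hypothetical.

```python
def noise_threshold(samples, margin_ratio=0.1):
    """Mean absolute amplitude of the calibration samples plus an error margin."""
    if not samples:
        raise ValueError("need at least one calibration sample")
    average = sum(abs(s) for s in samples) / len(samples)
    return average + margin_ratio * average

def exceeds_threshold(frame, threshold):
    """True when the frame's mean absolute amplitude is above the threshold."""
    level = sum(abs(s) for s in frame) / len(frame)
    return level > threshold

# Usage: calibrate on digitized ambient noise, then gate incoming frames
ambient = [100, -100, 50, -50]
threshold = noise_threshold(ambient, margin_ratio=0.2)
should_analyze = exceeds_threshold([400, -380, 390, -410], threshold)
```

Only frames that pass this gate would go on to preprocessing and classification, so quiet background noise never reaches the model.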

The method of claim 6,
wherein outputting to the sound source output device comprises,
if an external sound volume value has been set based on the recognized human voice or vehicle driving sound, lowering the volume of the sound source content based on the external sound volume value.
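As an illustration only, combining the content with the recognized external sound while lowering the content volume might be sketched as follows. The rule used here, content gain equal to one minus the external sound volume value on a 0 to 1 scale, is a hypothetical choice for the example and is not stated in this document; the function name and the 16-bit sample limit are also assumptions.

```python
def mix_with_ducking(content, external, external_volume, limit=32767):
    """Mix two sample sequences, lowering the content as external_volume rises."""
    if not 0.0 <= external_volume <= 1.0:
        raise ValueError("external_volume must be between 0 and 1")
    content_gain = 1.0 - external_volume   # assumed ducking rule
    mixed = []
    for c, e in zip(content, external):
        value = content_gain * c + external_volume * e
        # Clamp to the assumed 16-bit sample range
        mixed.append(max(-limit, min(limit, round(value))))
    return mixed

# Usage: music samples combined with a detected voice at half volume
output = mix_with_ducking([1000, 1000], [200, -200], external_volume=0.5)
```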

A program stored in a computer-readable storage medium and executed by a computing device, the program comprising instructions for executing:
collecting external noise for a certain period of time through one or more interlocked microphones, and calculating a threshold from the average magnitude of the collected external noise;
if the external noise collected in real time has a value less than the threshold, discarding it, and if it has a value greater than the threshold, preprocessing the external noise to match the input of a trained sound analysis model;
inputting the preprocessed external noise into the trained sound analysis model to determine whether a sound generated within a critical distance is a human voice or a driving sound of a vehicle; and
if the external noise is determined to be the human voice or the driving sound of the vehicle, outputting the external noise to an interlocked sound source output device.

The program of claim 8,
further comprising instructions for executing: collecting voices of people speaking at various locations and driving sounds of moving vehicles;
constructing a data set by selecting, from among the human voices and the vehicle driving sounds, only sounds generated within the critical distance, based on the distance characteristic of the sound signal; and
training, using the data set, a first sound analysis model for identifying the voice of a person speaking within the critical distance and a second sound analysis model for identifying a vehicle driving sound generated within the critical distance.

The program of claim 9,
wherein the external noise is collected in real time using a polyhedral microphone having a microphone sensor attached to the center of each face, so that sound is received by each microphone sensor.

The program of claim 10,
wherein the training step comprises
classifying only sounds within the critical distance based on the differences in arrival time at each microphone sensor from the location where the sound was generated, and training to determine whether each classified sound within the critical distance is a human voice or a vehicle driving sound.

The program of claim 9,
wherein the training step comprises
training to select only sounds generated within the critical distance according to a dispersion characteristic in the frequency domain, expressed as the ratio between the source sound signal of the external noise and the sound signal that has propagated for a time t, based on the distance-dependent dispersion characteristic of sound.

The program of claim 9,
wherein outputting to the sound source output device comprises,
when sound source content is being output from the sound source output device, outputting the sound source content in combination with the human voice or the driving sound of the vehicle.

The program of claim 13,
wherein outputting to the sound source output device comprises,
when an external sound volume value has been set based on the recognized human voice or vehicle driving sound, determining the volume of the combined sound by lowering the volume of the sound source content based on the external sound volume value.