KR101090894B1

KR101090894B1 - Sound localization system and method in reverberant environments

Info

Publication number: KR101090894B1
Application number: KR1020100101924A
Authority: KR
Inventors: 최종석; 김문상; 이병기
Original assignee: 한국과학기술연구원
Priority date: 2010-10-19
Filing date: 2010-10-19
Publication date: 2011-12-08

Abstract

PURPOSE: A system and method for detecting the direction of a sound source in an echo environment are provided that adapt to the echo environment using a camera and thereby estimate the direction of the sound source reliably. CONSTITUTION: A voice acquisition unit (10) converts a received voice signal into a digital signal. A sound source direction detection unit (12) processes the digital signal from the voice acquisition unit to detect the sound source direction of the received voice signal. An echo feature extraction unit (14) uses the precedence effect to obtain, from the voice signals detected by the sound source direction detection unit, direction information of the preceding sound, which is determined according to a power parameter and a time parameter. The echo feature extraction unit extracts echo feature information by calculating the correlation between the direction information of the preceding sound and the sound source direction information. An echo verification and removal unit (16) uses the extracted echo feature information to determine whether the sound source direction information is echo direction information, and removes the information determined to be echo direction information from the sound source direction information.

Description

Sound Localization System and Method in Reverberant Environments

The present invention relates to a system and method for detecting the direction of a sound source in an echo environment.

Sound source direction detection is a technique for finding the direction of a sound source using a plurality of microphones. Because of the geometric relationship between the sound source and the microphones, the signals received at the microphones differ, and these differences are analyzed to estimate the direction of the sound source.

However, when the direction of a sound source is estimated in an enclosed environment, the influence of echoes must be considered. The walls of an enclosed space and obstacles around the sound source generate reflected waves. These are called echoes, and their intensity depends on the condition and material of the reflecting surface. In ordinary living spaces echoes are weak and their influence is negligible, but in places such as a large auditorium or a long corridor echoes are strong, so the direction estimation result contains numerous false sound sources.

Humans face the same problem, but resolve it with a distinctive mechanism called the precedence effect. Sound that propagates directly from the actual source travels the shortest path and therefore reaches the ear earlier than any reflected wave. Echoes thus arrive slightly later than the directly propagated sound, with a very short time lag, and the precedence effect takes advantage of this.

That is, the direction information carried by the first-arriving sound is treated as reliable, whereas the direction information carried by sound arriving afterward is unreliable and is therefore suppressed. The suppression is known to occur when the later signal arrives with a time difference of about 40 ms or less and its level is about 10 dB or less.

Many current algorithms for sound source direction detection in echo environments use the precedence effect. Most of them, however, simply find the onset section where speech begins and perform direction detection only in that section. Because direction information is extracted only from an extremely short section, it is often difficult to obtain sufficiently reliable direction information.

The present invention has been made to solve the above-mentioned problems of the prior art, and its object is to provide a system and method for efficiently estimating the direction of a sound source even in an echo environment.

To this end, a sound source direction detection system according to an embodiment of the present invention may include: a voice acquisition unit that receives a voice signal and converts the received voice signal into a digital signal; a sound source direction detection unit that processes the digital signal from the voice acquisition unit to detect sound source direction information of the received voice signal; an echo feature extraction unit that uses the precedence effect to obtain, from the sound source direction detection unit, direction information of the preceding sound in the voice signal, determined according to a time parameter and a power parameter, and that extracts echo feature information by calculating the correlation between the direction information of the preceding sound and the sound source direction information; and an echo verification and removal unit that uses the echo feature information extracted by the echo feature extraction unit to determine whether the sound source direction information is echo information, and removes the information determined to be echo direction information from the sound source direction information.

In this case, the echo feature extraction unit may obtain the direction information of the preceding sound with a filter corresponding to the function defined by Equation 1 below.

[Equation 1]

n: frame number
θ: sound source azimuth
: time parameter
δ: power parameter
μ_δ: filter gate
Δp: power increment between the (n-1)th frame and the nth frame
s(n): sound source direction detection result in the nth frame
R: sound source direction information at the nth frame and sound source azimuth θ

The echo feature information may be an echo feature vector defined as in Equation 2 below.

[Equation 2]

n: frame number
θ: sound source azimuth

In addition, the sound source direction detection system according to an embodiment of the present invention may further include a verifier learning unit for training the echo verification and removal unit in real time using position information of the sound source.

In addition, the sound source direction detection system according to an embodiment of the present invention may further include a video camera and a face detection unit that detects position information of a face from the image information obtained from the video camera, and the verifier learning unit may use the position information of the face as the position information of the sound source.

A sound source direction detection method according to an embodiment of the present invention includes: receiving a voice signal; detecting sound source direction information of the received voice signal; obtaining, using the precedence effect, direction information of the preceding sound in the received voice signal, determined according to a time parameter and a power parameter; extracting echo feature information by calculating the correlation between the direction information of the preceding sound and the sound source direction information; determining, using the echo feature information, whether the sound source direction information is echo direction information; and removing the information determined to be echo direction information from the sound source direction information.

In this case, the direction information of the preceding sound may be obtained with a filter corresponding to the function defined by Equation 1 below.

[Equation 1]

n: frame number
θ: sound source azimuth
: time parameter
δ: power parameter
μ_δ: filter gate

In addition, the echo feature information may be an echo feature vector defined as in Equation 2 below.

[Equation 2]

n: frame number
θ: sound source azimuth

In addition, the sound source direction detection method according to an embodiment of the present invention may further include detecting position information of the sound source, and the step of determining, using the echo feature information, whether the sound source direction information is echo direction information may include learning using the detected position information of the sound source.

In this case, detecting the position information of the sound source may include acquiring image information from a video camera and detecting position information of a face from the image information.

The sound source direction detection system and method for echo environments according to the present invention can adapt to the echo environment in real time using a camera, and make it possible to estimate the sound source direction reliably even in highly reverberant spaces such as a large auditorium or a long corridor, and at positions near a wall where the influence of echoes is strong.

FIG. 1 is a schematic configuration diagram of a sound source direction detection system according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating a sound source direction detection method according to an embodiment of the present invention.
FIG. 3 is a view showing an example of echo feature extraction results of the sound source direction detection system and method according to an embodiment of the present invention.

Hereinafter, the present invention is described in detail with reference to the accompanying drawings. The accompanying drawings and the following description, however, show only one possible embodiment of the sound source direction detection system and method according to the present invention, and the technical idea of the present invention is not limited by them.

FIG. 1 is a schematic configuration diagram of a sound source direction detection system according to an embodiment of the present invention.

Referring to FIG. 1, the sound source direction detection system (1) according to the present invention includes a voice acquisition unit (10), a sound source direction detection unit (12), an echo feature extraction unit (14), an echo verification and removal unit (16), an image processing unit (20) consisting of a video camera (22) and a face detection unit (24), and a verifier learning unit (30).

The voice acquisition unit (10) receives a voice signal and converts the received voice signal into a digital signal. The voice acquisition unit (10) usually consists of an array of multiple microphones.

The sound source direction detection unit (12) processes the digital signal from the voice acquisition unit (10) to detect sound source direction information of the voice signal. The sound source direction detection unit (12) may detect the sound source direction information using any of the methods widely used in the field of sound source tracking. These include tracking the source position by measuring signal strength and accounting for path loss over distance, tracking the source position using the angle of arrival (AOA) at two or more receivers, tracking the source position using the spatial spectrum, and tracking the source position using the propagation time of the sound (for example, the TOA and TDOA methods).
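As an illustration of the time-difference (TDOA) family of methods mentioned above, the following is a minimal sketch for a single pair of microphones. It is not taken from the patent: the far-field model, the brute-force cross-correlation, the speed of sound constant, and the function names `best_lag` and `tdoa_azimuth` are all assumptions made for this example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def best_lag(x, y, max_lag):
    """Return the lag in samples at which y best matches a delayed copy of x.

    A positive lag means the sound reached microphone y later than microphone x.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(len(x)):
            j = i + lag
            if 0 <= j < len(y):
                score += x[i] * y[j]
        if score > best_score:
            best, best_score = lag, score
    return best


def tdoa_azimuth(x, y, fs, mic_distance):
    """Estimate the azimuth in degrees from broadside for a two-microphone pair.

    Assumes a far-field source, so that delay = mic_distance * sin(azimuth) / c.
    """
    # Physically possible delays are bounded by the microphone spacing.
    max_lag = int(math.ceil(mic_distance / SPEED_OF_SOUND * fs))
    lag = best_lag(x, y, max_lag)
    ratio = lag / fs * SPEED_OF_SOUND / mic_distance
    ratio = max(-1.0, min(1.0, ratio))  # guard against rounding past +/-1
    return math.degrees(math.asin(ratio))
```

In a reverberant room the cross-correlation typically has additional peaks caused by reflections, which is why a detector of this kind reports echo directions alongside the true source direction.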

However, the sound source direction information detected by the sound source direction detection unit (12) using these methods contains not only the direction information of the actual sound source but also the direction information of echoes. The direction information of echoes interferes with accurate sound source direction detection and therefore needs to be removed. It can be removed with the following configuration and method.

The echo feature extraction unit (14) uses the precedence effect to obtain, from the sound source direction detection unit (12), direction information of the preceding sound in the voice signal, determined according to a time parameter and a power parameter, and extracts echo feature information by calculating the correlation between the obtained direction information of the preceding sound and the sound source direction information.

The echo feature information is extracted in consideration of the precedence effect. The precedence effect, also known as the Haas effect, explains how humans accurately perceive the direction of a sound source in an echo environment. An echo is a sound wave that has been reflected off a wall or other obstacle on its way to the listener, so its propagation path is longer than that of the directly propagated sound, which produces an arrival delay. Therefore, if the position information of late-arriving sound is suppressed, accurate direction information of the actual sound source can be obtained. To implement this precedence effect, a filter was designed as in Equation 1.

n: frame number
θ: sound source azimuth
: time parameter
δ: power parameter
μ_δ: filter gate

The filter has two parameters, a time parameter and a power parameter δ, and serves as a kind of short-term memory that stores the direction detection result of the first-arriving sound. The filter gate μ_δ uses the change in power to judge whether a sound is a first-arriving sound, and accordingly either stores the R(n,θ) value in the filter or discards it. The power parameter δ is the threshold at which the filter gate switches on and off.

Because the time parameter takes a value between 0 and 1, the values stored in the filter decrease over time, which prevents the precedence effect from persisting too long. In other words, depending on the time parameter and δ, a certain portion of the arriving voice signal is stored as the preceding sound.
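The behavior described above (a gate that stores or discards R(n,θ) depending on the power increment, and stored values that decay over time) can be sketched as a gated leaky memory. Equation 1 itself is not reproduced in this text, so the recursion below is only an assumed form consistent with the description, not the patent's actual formula. The names `gate`, `update_filter`, `memory`, and `decay` (standing in for the time parameter), and the strict "greater than δ" comparison, are assumptions for illustration.

```python
def gate(delta_p, delta):
    """Filter gate: open (1.0) when the power increment exceeds delta, else closed (0.0).

    The strict comparison is an assumption; the text only says delta is the
    threshold at which the gate switches on and off.
    """
    return 1.0 if delta_p > delta else 0.0


def update_filter(memory, r_frame, delta_p, decay, delta):
    """One frame of an assumed precedence filter.

    memory  : stored values from the previous frame, one per azimuth bin
    r_frame : R(n, theta) for the current frame, same length as memory
    delta_p : power increment between the previous and current frame
    decay   : time parameter, between 0 and 1, so stored values fade over time
    delta   : power parameter (gate threshold)
    """
    g = gate(delta_p, delta)
    return [decay * m + g * r for m, r in zip(memory, r_frame)]


# Onset frame: large power increment, so the direction result is stored.
memory = update_filter([0.0, 0.0, 0.0], [0.0, 1.0, 0.0],
                       delta_p=12.0, decay=0.5, delta=6.0)
# Later frame: small power increment, so the new result is discarded
# and the stored preceding-sound direction simply decays.
memory = update_filter(memory, [1.0, 0.0, 0.0],
                       delta_p=0.5, decay=0.5, delta=6.0)
```

With a `decay` closer to 1 the stored direction persists for more frames, and with a smaller `delta` more frames are admitted as preceding sound, which is the trade-off the two parameters control.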

The filter need not be a single filter. A filter bank with variously chosen values of the time parameter and δ can be used so that the design copes with a variety of echo environments. A method for extracting echo features based on the filter bank is described below with reference to Equation 2.

n: frame number
θ: sound source azimuth

Here, the echo feature vector {ζ} has a dimension equal to the number of filters in the filter bank, and the value of each element is given by the inner product of the value of the corresponding filter and R(n,θ), which represents the current sound source direction detection result. That is, the echo feature vector is extracted by calculating how well the direction information of the preceding sound stored by each filter matches the current direction detection result. This amounts to calculating the correlation between the direction information of the preceding sound and the current sound source direction information. The larger this correlation value (the inner product), the more similar the preceding sound information and the current sound source direction information are, and the smaller it is, the more dissimilar they are.
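The inner-product computation described above can be sketched as follows. Equation 2 is not reproduced in this text, so this assumes the inner product is taken over azimuth bins, yielding one element per filter in the bank. The function name `echo_feature_vector` and the example values are hypothetical.

```python
def echo_feature_vector(filter_bank, r_frame):
    """Return one inner product per filter in the bank.

    filter_bank : list of filter states, each a list with one value per azimuth bin
    r_frame     : R(n, theta) for the current frame, one value per azimuth bin
    Each element measures how well a stored preceding-sound pattern matches
    the current direction detection result.
    """
    return [sum(p * r for p, r in zip(memory, r_frame)) for memory in filter_bank]


# Two filters (different parameter choices) that both remember a source in bin 1.
bank = [[0.0, 0.5, 0.0], [0.0, 0.9, 0.1]]

direct = echo_feature_vector(bank, [0.0, 1.0, 0.0])   # same direction as the memory
echo = echo_feature_vector(bank, [1.0, 0.0, 0.0])     # a different direction
```

In this toy case `direct` is large in every element and `echo` is zero in every element, which is the kind of separation the verifier then exploits.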

The echo verification and removal unit (16) uses the echo feature information extracted by the echo feature extraction unit (14) to determine whether the sound source direction information is echo direction information, and removes the information determined to be echo direction information from the sound source direction information.

The verifier used by the echo verification and removal unit (16) may be a well-known learning model such as an artificial neural network model or a linear regression model. The verifier may be trained in advance using sample data, or may be trained in real time using the image processing unit (20) as shown in FIG. 1.
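As one possible reading of the linear-model option, the sketch below scores an echo feature vector with a linear model and updates it one sample at a time with a least-mean-squares (LMS) step. The patent does not specify the training rule, the target encoding, or the decision threshold; the 1.0/0.0 targets, the 0.5 threshold, the learning rate, and the names `verifier_score`, `is_echo`, and `lms_update` are assumptions for illustration.

```python
def verifier_score(weights, bias, features):
    """Linear score of an echo feature vector."""
    return bias + sum(w * f for w, f in zip(weights, features))


def is_echo(weights, bias, features, threshold=0.5):
    """Treat scores below the (assumed) threshold as echo."""
    return verifier_score(weights, bias, features) < threshold


def lms_update(weights, bias, features, target, rate=0.1):
    """One LMS step toward target (1.0 = real source, 0.0 = echo).

    Returns the updated (weights, bias) without modifying the inputs.
    """
    error = target - verifier_score(weights, bias, features)
    new_weights = [w + rate * error * f for w, f in zip(weights, features)]
    return new_weights, bias + rate * error
```

Because each update uses a single labeled sample, the same routine can serve both for offline training on sample data and for incremental training as new labels arrive.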

The image processing unit (20) consists of a video camera (22) and a face detection unit (24). It acquires image information in real time with the video camera (22), and uses the face detection unit (24) to detect a person's face in the image information and estimate its position. That is, after acquiring the image information, the image processing unit (20) detects the face region of a person in it and provides the position information of the face, as the position information of the sound source, to the verifier learning unit (30) described below.

The verifier learning unit (30) trains the echo verification and removal unit (16) using the position information of the sound source. That is, the verifier learning unit (30) receives the sound source direction information from the sound source direction detection unit (12) and the face position information from the image processing unit (20), distinguishes direction detection results that correspond to the actual sound source (the speaker) from those that correspond to echoes, generates training data accordingly, and trains the verifier with it.
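The labeling step described above can be sketched as a comparison between each detected direction and the direction of the detected face. The source does not say how the two are compared, so the angular tolerance, the assumption that the face position has already been converted to an azimuth in degrees, and the name `label_detections` are all hypothetical.

```python
def label_detections(detected_azimuths, face_azimuth, tolerance_deg=10.0):
    """Label each detected direction: 1.0 if it agrees with the face direction, else 0.0.

    detected_azimuths : azimuths in degrees from the sound source direction detector
    face_azimuth      : azimuth in degrees of the detected face (assumed precomputed)
    tolerance_deg     : assumed matching tolerance, not specified in the source
    """
    labels = []
    for azimuth in detected_azimuths:
        # Wrap the difference into [-180, 180) so 359 and 2 degrees count as close.
        diff = (azimuth - face_azimuth + 180.0) % 360.0 - 180.0
        labels.append(1.0 if abs(diff) <= tolerance_deg else 0.0)
    return labels
```

The resulting labels, paired with the echo feature vectors of the same detections, form the training data passed to the verifier.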

In this way, the echo verification and removal unit (16) can perform echo verification and removal while learning in real time from the image information supplied by the image processing unit (20).

Hereinafter, a sound source direction detection method according to an embodiment of the present invention is described with reference to the flowchart of FIG. 2.

The sound source direction detection system (1) according to an embodiment of the present invention receives a voice signal using the voice acquisition unit (10), which consists of a microphone array (100). The received voice signal is converted into a digital signal and analyzed to detect the sound source direction (102). Once the sound source direction has been detected, echo feature information is extracted by calculating the correlation between the direction information of the preceding sound and the sound source direction information (104).

The verifier learning unit (30) then determines whether position information of the sound source has been detected by the image processing unit (20) (106). If it has, the verifier is trained using the sound source direction information detected by the sound source direction detection unit (12) (108), and echo verification and removal (110) is performed. If no position information of the sound source has been detected by the image processing unit (20), echo verification and removal (110) is performed on the basis of previously learned data, without training the verifier.
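The branch in the flowchart (steps 106 to 110) can be sketched as a per-frame routine. This is only a control-flow illustration: the function name `process_frame`, the `(azimuth, feature_vector)` tuple layout, and the `train` and `is_echo` callables standing in for the verifier learning unit and the verifier are assumptions, not part of the patent.

```python
def process_frame(detections, face_azimuth, train, is_echo):
    """Run steps 106-110 for one frame.

    detections   : list of (azimuth, feature_vector) pairs for this frame
    face_azimuth : azimuth from the image processing unit, or None if no face was found
    train        : callable(detections, face_azimuth) that updates the verifier
    is_echo      : callable(feature_vector) -> bool, the current verifier decision
    """
    if face_azimuth is not None:           # step 106: was a source position detected?
        train(detections, face_azimuth)    # step 108: train the verifier
    # Step 110: keep only the directions that are not judged to be echoes.
    return [azimuth for azimuth, features in detections if not is_echo(features)]
```

When no face is visible the routine still filters echoes, relying on whatever the verifier has learned so far.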

FIG. 3 is a view showing an example of echo feature extraction results of the sound source direction detection system and method according to an embodiment of the present invention.

The top graph of FIG. 3 shows the power of the received voice signal in dB. The second graph shows the corresponding power increment per frame. Spikes in the power increment appear where the signal begins or ends in the power graph. In the present invention, the direction information of the preceding sound was obtained using the power parameter δ, which represents this power increment, and the time parameter.
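The two quantities plotted in the first two graphs can be sketched as follows: frame power in dB and its frame-to-frame increment Δp. The source does not state how power is computed, so the mean-square definition, the small floor that avoids taking the logarithm of zero, and the names `frame_power_db` and `power_increments` are assumptions for illustration.

```python
import math


def frame_power_db(frame, floor=1e-12):
    """Mean-square power of one frame of samples, in dB (floored to avoid log of 0)."""
    mean_square = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(max(mean_square, floor))


def power_increments(frames):
    """Power increment between each pair of consecutive frames, in dB."""
    powers = [frame_power_db(f) for f in frames]
    return [current - previous for previous, current in zip(powers, powers[1:])]


# A quiet frame followed by two loud frames: one large increment at the onset,
# then no change while the signal stays at the same level.
increments = power_increments([[0.1] * 4, [1.0] * 4, [1.0] * 4])
```

An increment of this kind is what would be compared against the power parameter δ to decide whether a frame counts as a preceding sound.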

The third through sixth graphs show the filter values and the values of the echo feature vector {ζ} for various values of the time parameter and the power parameter δ. It can be seen that filter values appear over a certain interval in sections where the power increment is large and the sound is treated as a preceding sound. The colors in the graphs represent the magnitude of the filter values. As the graphs show, the range over which preceding sound information is stored, and its values, change as the time parameter and the power parameter δ change.

By using a filter bank with such varied values of the time parameter and the power parameter δ and training in real time, the sound source direction detection system (1) according to the present invention can detect the sound source direction more accurately in echo environments and dynamic environments.

1: sound source direction detection system
10: voice acquisition unit
12: sound source direction detection unit
14: echo feature extraction unit
16: echo verification and removal unit
20: image processing unit
22: video camera
24: face detection unit
30: verifier learning unit

Claims

A sound source direction detection system comprising:
a voice acquisition unit for receiving a voice signal and converting the received voice signal into a digital signal;
a sound source direction detection unit for processing the digital signal from the voice acquisition unit to detect sound source direction information of the received voice signal;
an echo feature extraction unit for obtaining, from the sound source direction detection unit and using the precedence effect, direction information of the preceding sound in the voice signal, determined according to a time parameter and a power parameter, and for extracting echo feature information by calculating the correlation between the direction information of the preceding sound and the sound source direction information; and
an echo verification and removal unit for determining, using the echo feature information extracted by the echo feature extraction unit, whether the sound source direction information is echo direction information, and for removing the information determined to be echo direction information from the sound source direction information.

The system of claim 1,
further comprising a verifier learning unit for training the echo verification and removal unit using position information of the sound source.

The system of claim 2,
further comprising a video camera and a face detection unit for detecting position information of a face from image information obtained from the video camera,
wherein the verifier learning unit uses the position information of the face as the position information of the sound source.

The system of claim 1,
wherein the echo feature extraction unit obtains the direction information of the preceding sound with a filter corresponding to the function defined by Equation 1 below.
[Equation 1]

n: frame number
θ: sound source azimuth
: time parameter
δ: power parameter
μ_δ: filter gate
Δp: power increment between the (n-1)th frame and the nth frame
s(n): sound source direction detection result in the nth frame
R: sound source direction information at the nth frame and sound source azimuth θ

The system of claim 4,
wherein the echo feature information is an echo feature vector defined by Equation 2 below.
[Equation 2]

n: frame number
θ: sound source azimuth

A sound source direction detection method comprising:
receiving a voice signal;
detecting sound source direction information of the received voice signal;
obtaining, using the precedence effect, direction information of the preceding sound in the received voice signal, determined according to a time parameter and a power parameter;
extracting echo feature information by calculating the correlation between the direction information of the preceding sound and the sound source direction information;
determining, using the echo feature information, whether the sound source direction information is echo direction information; and
removing the information determined to be echo direction information from the sound source direction information.

The method of claim 6,
further comprising detecting position information of the sound source,
wherein determining, using the echo feature information, whether the sound source direction information is echo direction information
comprises learning using the detected position information of the sound source.

The method of claim 7,
wherein detecting the position information of the sound source comprises:
acquiring image information from a video camera; and
detecting position information of a face from the image information.

The method of claim 6,
wherein the direction information of the preceding sound is obtained with a filter corresponding to the function defined by Equation 1 below.
[Equation 1]

n: frame number
θ: sound source azimuth

The method of claim 9,
wherein the echo feature information is an echo feature vector defined by Equation 2 below.
[Equation 2]

n: frame number
θ: sound source azimuth