KR101356039B1

KR101356039B1 - Blind source separation method using harmonic frequency dependency and de-mixing system therefor

Info

Publication number: KR101356039B1
Application number: KR1020120048808A
Authority: KR
Inventors: 이수영; 최충환; 장원일
Original assignee: Korea Advanced Institute of Science and Technology (한국과학기술원)
Priority date: 2012-05-08
Filing date: 2012-05-08
Publication date: 2014-01-29
Also published as: WO2013168848A1; KR20130125227A

Abstract

The blind signal separation method according to the present invention separates signals in the frequency domain and comprises: receiving a mixture of signals from two or more sound sources; and demixing the received signal assuming a dependency only between frequency bins included in the same harmonic frequency group among a plurality of harmonic frequency groups.

Description

Blind Signal Separation Method Using Dependency Between Harmonic Frequencies and Demixing System Therefor {BLIND SOURCE SEPARATION METHOD USING HARMONIC FREQUENCY DEPENDENCY AND DE-MIXING SYSTEM THEREFOR}

The present invention relates to a blind signal separation method using dependencies between harmonic frequencies and to a demixing system therefor, and more particularly to a blind signal separation algorithm that separates speech and/or music signals through independent vector analysis based on the dependencies between their harmonic frequencies.

Independent component analysis (ICA) is a blind source separation (BSS) algorithm that exploits the statistical independence between output signals. Frequency-domain ICA (FDICA) has been used for convolutive BSS because a convolutive mixture in the time domain can be modeled as an instantaneous mixture in each frequency bin of the frequency domain, which simplifies the separation problem. FDICA successfully separates the signal components in each frequency channel. However, the separated frequency components suffer from an arbitrary (random) permutation across frequency bins.
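The frequency-domain modeling step rests on the convolution theorem: filtering a source with a room response in the time domain becomes a multiplication by one complex gain per frequency bin. The minimal NumPy sketch below checks this for circular convolution over a single frame; the toy response and the helper name `circular_convolve` are illustrative assumptions. With an STFT of a real (linear) convolution the relation holds only approximately, and it is most accurate when the room response is short relative to the analysis window.

```python
import numpy as np

def circular_convolve(a, s):
    """Circular convolution x[t] = sum_m a[m] * s[(t - m) mod n] for equal-length arrays."""
    # np.roll(s, m)[t] == s[(t - m) % n], so each term is one delayed, scaled copy of s
    return sum(a[m] * np.roll(s, m) for m in range(len(s)))

rng = np.random.default_rng(0)
n = 64
s = rng.standard_normal(n)            # one frame of a source signal
a = np.zeros(n)
a[[0, 3, 10]] = [1.0, 0.5, 0.25]      # toy room response: direct path plus two echoes

x = circular_convolve(a, s)           # convolutive mixing in the time domain

X = np.fft.fft(x)                     # the same mixing seen per frequency bin
X_model = np.fft.fft(a) * np.fft.fft(s)
print(np.allclose(X, X_model))        # True: one complex gain per bin
```

In the multi-source, multi-microphone case the same argument gives one N × N mixing matrix per bin, which is the bin-wise instantaneous model used in the embodiment.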

Independent vector analysis (IVA), a multivariate extension of independent component analysis, resolves the permutation ambiguity by exploiting dependencies between frequency components. In work based on the original IVA model (KIM, T., ATTIAS, H.T., LEE, S.-Y., and LEE, T.-W.: 'Blind source separation exploiting higher-order frequency dependencies', IEEE Trans. Audio Speech Lang. Process., 2007, 15, (1), pp. 70-79), each source signal is represented as a vector of frequency components, and IVA separates the signals by modeling each source a priori with a joint probability density function (PDF) that is radially symmetric over the full band. This PDF assumes dependency among the frequency components. More specifically, the original IVA model relies on two assumptions: first, that the individual signals before mixing are statistically independent of one another; and second, that the different frequency components of each signal are dependent on one another.

Subsequently, an IVA method with an improved frequency dependency model was proposed, referred to as the sub-band local clique IVA model (LEE, I., JANG, G.-J., and LEE, T.-W.: 'Independent vector analysis using densities represented by chain-like overlapped cliques in graphical models for separation of convolutedly mixed signals', Electronics Letters, 2009, 45, (13), pp. 710-711). This method keeps the first assumption of the original IVA model but modifies the second: among the different frequency components of each signal, nearby components are assumed to be dependent, whereas distant components are assumed to have no dependency.

Recently, there has been a growing need for techniques that can effectively separate signals from the sound sources we most commonly encounter, such as speech or music.

Korean Laid-Open Publication No. 10-2008-0019879 (2008.03.05)

The present invention has been devised to solve the problems of the prior art, and its object is to provide a technique that can effectively separate sound source signals having a harmonic frequency structure, such as speech and/or music signals, through independent vector analysis.

The technical objects of the present invention are not limited to those mentioned above; other objects not mentioned will be clearly understood by those skilled in the art from the description of the present invention.

A blind signal separation method according to the present invention separates signals in the frequency domain and comprises: receiving a mixture of signals from two or more sound sources; and demixing the received signal assuming a dependency only between frequency bins included in the same harmonic frequency group among a plurality of harmonic frequency groups.

The demixing of the received sound source signals according to the present invention may be performed through independent vector analysis.

A demixing system for blind signal separation according to the present invention separates signals in the frequency domain and comprises: a signal receiving unit that receives a mixture of signals from two or more sound sources; and a demixing filter that demixes the received signal assuming a dependency only between frequency bins included in the same harmonic frequency group among a plurality of harmonic frequency groups.

According to the present invention, an algorithm and method can be provided for effectively separating sound source signals having a harmonic frequency structure, such as speech and/or music signals, through independent vector analysis. In addition, the permutation problem across frequency bins in blind signal separation can be avoided.

FIG. 1 illustrates an environment of a demixing system in which a blind signal separation algorithm according to an embodiment of the present invention may be executed.
FIG. 2 shows (a) the full-band frequency clique of the original IVA model, (b) the sub-band local frequency cliques of the sub-band local IVA model, and (c) the harmonic frequency cliques of the harmonic frequency clique IVA model according to an embodiment of the present invention.
FIG. 3 shows a simulation environment for testing the performance of the blind signal separation algorithm according to an embodiment of the present invention.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. The shapes and sizes of elements in the drawings may be exaggerated for clarity, and the same elements are denoted by the same reference numerals wherever possible, even when they appear in different drawings. Detailed descriptions of well-known functions or configurations are omitted where they could unnecessarily obscure the subject matter of the present invention.

The signals that humans mainly deal with are speech and music. The spectra of speech and music signals show a harmonic relationship between their frequency components; that is, speech and music signals have a strong harmonic structure.

Embodiments of the present invention present a technique for separating sound source signals that have harmonic frequency components, including but not limited to speech and music signals, by exploiting the dependencies between those harmonic frequency components. That is, an improved frequency dependency model for independent vector analysis, based on the dependencies between harmonic frequency components, is proposed. Compared with conventional frequency dependency models, the model according to the embodiments is very effective for separating sound signals with a strong harmonic structure, such as speech and music signals.

FIG. 1 illustrates an environment of a demixing system in which a blind signal separation algorithm according to an embodiment of the present invention may be executed. As shown in FIG. 1, consider the case in which sound source signals from two or more sources 10, 12 are mixed and received by one or more signal receiving units 20, 22. FIG. 1 illustrates an indoor environment. Therefore, the signals from the sources 10, 12 not only reach the signal receiving units 20, 22 through the direct paths D11, D12, D21, D22, but may also be reflected in the room and arrive through the reverberant paths R11, R12, R21, R22. The sound source signals thus received may be input to the demixing system 30, and the de-mixing performed by the demixing system 30 separates the mixed received signals. Hereinafter, the demixing system 30 may be understood to include the signal receiving units 20, 22.

Here, the condition in which no information about the source signals or the mixing environment is available is called the "blind" condition. That is, embodiments of the present invention provide an algorithm for separating received signals under the blind condition.

Hereinafter, independent vector analysis based on the dependencies within a harmonic frequency clique according to an embodiment of the present invention is described.

First, the mixed signal received by the demixing system 30 is represented in the frequency domain by a short-time Fourier transform (STFT). Convolutive blind signal separation in the frequency domain begins by approximating the mixture with a bin-wise instantaneous mixing model. For each frequency bin, the model can be formulated as follows:

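$$\mathbf{y}_k(t) = \mathbf{W}_k\,\mathbf{x}_k(t) = \mathbf{W}_k \mathbf{A}_k\,\mathbf{s}_k(t), \qquad k = 1, \ldots, K \tag{1}$$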

Here, the subscript k denotes the frequency bin index. y_k denotes the separated signal obtained after demixing by the demixing system 30, x_k denotes the signal received at the signal receiving units 20, 22 and input to the demixing system 30, and s_k denotes the signals from the sound sources 10, 12.

Although FIG. 1 illustrates two sound sources and two microphones, N source signals and N microphones are assumed in general. The N × N matrices A_k and W_k denote the instantaneous mixing and de-mixing matrices, respectively. That is, A_k represents the transfer function of the paths from the sound sources, and W_k represents the transfer function of the demixing filter. Therefore, multiplying the received signal x_k by the demixing matrix W_k should recover the source signal s_k. K denotes the number of frequency bins, and t denotes the time index of the frame.
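The bin-wise model can be sketched in NumPy as follows. Everything here is a toy assumption: random complex arrays stand in for real STFT data, and the ideal demixing matrices W_k = A_k^{-1} are computed from A_k, which a blind method never has access to. The last lines illustrate why bin-wise separation alone is not enough: reordering the rows of W_k in a single bin still separates that bin, but the sources come out in a different order than in the other bins.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, T = 2, 5, 100                      # sources = microphones, frequency bins, frames

# Toy complex STFT coefficients s_k(t) and mixing matrices A_k, stored as [k, i, t] / [k, i, j]
S = rng.standard_normal((K, N, T)) + 1j * rng.standard_normal((K, N, T))
A = rng.standard_normal((K, N, N)) + 1j * rng.standard_normal((K, N, N))

X = np.einsum("kij,kjt->kit", A, S)      # x_k(t) = A_k s_k(t) in every bin
W = np.linalg.inv(A)                      # ideal (non-blind) demixing, batched over k
Y = np.einsum("kij,kjt->kit", W, X)      # y_k(t) = W_k x_k(t)
print(np.allclose(Y, S))                 # True

# Permutation ambiguity: reorder the outputs of bin 2 only.
P = np.array([[0, 1], [1, 0]])
W_perm = W.copy()
W_perm[2] = P @ W[2]
Y_perm = np.einsum("kij,kjt->kit", W_perm, X)
print(np.allclose(Y_perm[2], S[2, ::-1]))  # True: bin 2 separated, but sources swapped
```

A bin-wise method cannot tell W and W_perm apart from the data of each bin alone; a cross-bin dependency model, as in IVA, is what ties the bins to a common source order.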

In an embodiment of the present invention, the separated signal y may be represented using the multivariate probability density function proposed in Equation (2) below.

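$$p(\mathbf{y}_i) \propto \exp\left(-\sum_{h=1}^{H} \sqrt{\sum_{k \in C_h} \left| \frac{y_{ik}}{\sigma_{hk}} \right|^2}\right) \tag{2}$$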

Here, y_ik denotes the i-th element of y_k = [y_1k, ..., y_Nk]^T. H is the total number of harmonic cliques, and σ_hk is the standard deviation of the absolute value of the separated signal in the k-th frequency bin of harmonic clique h; it may be set to 1, for example. C_h denotes the set of frequency bins belonging to harmonic clique h. The fundamental frequency of clique h is denoted F_h and is defined as follows.

$$F_h = F_1 \cdot 2^{(h-1)/12} \tag{3}$$

In an embodiment of the present invention, F_1 may be set to, for example, 55 Hz, and the total number of harmonic cliques may be 49. The fundamental frequencies of these harmonic cliques correspond to the musical note frequencies from A1 (55 Hz) to A5 (880 Hz), a range that can cover the fundamental frequencies of human speech. For 1 ≤ h ≤ H-1, the following equation holds.

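$$C_h = \left\{ k \;:\; \left| f_k - m F_h \right| < \delta\, m F_h \ \text{for some}\ m \in \{1, \ldots, M\} \right\}, \qquad 1 \le h \le H-1 \tag{4}$$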

Here, f_k denotes the frequency of the k-th bin, and C_h denotes the h-th local frequency clique. Clique C_h contains the frequency bins around the first eight multiples of F_h (i.e., M = 8). The band around the m-th multiple of F_h (i.e., mF_h) has a bandwidth of 2δmF_h, so two consecutive harmonic cliques can overlap; in this example, consecutive harmonic cliques overlap by 50%. The clique C_H = {1, ..., K} contains all frequency bins. This prevents permutation of the frequency bins below 55 Hz and speeds up the learning of the demixing filter. FIG. 2(c) shows the harmonic frequency cliques of the harmonic frequency clique IVA model according to an embodiment of the present invention: the vertical axis shows 49 harmonic frequency cliques, and each clique contains the bins at 8 harmonic frequencies. The number of harmonic frequency cliques, the number of harmonic frequency bins per clique, and the degree of overlap between consecutive cliques may vary depending on the embodiment.
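The clique definition can be turned into index sets with a short NumPy sketch. The text does not fix every detail, so the following are assumptions: the 8 kHz, 2048-point STFT of the simulation section; 0-based bins with f_k = k·f_s/N_FFT; semitone spacing F_h = F_1·2^((h-1)/12), which is consistent with note frequencies from A1 upward; and δ = 1 - 2^(-1/12), chosen so that the m-th band of clique h+1 begins at the centre of the m-th band of clique h (a 50% overlap). Following the text, h = 1, ..., H-1 are harmonic cliques and C_H is the full band. The function name is hypothetical.

```python
import numpy as np

def harmonic_cliques(fs=8000.0, n_fft=2048, F1=55.0, H=49, M=8, delta=None):
    """Return (cliques, F): bin index arrays C_1..C_H and the fundamentals F_1..F_{H-1}."""
    if delta is None:
        delta = 1.0 - 2.0 ** (-1.0 / 12.0)           # assumed: 50% overlap between neighbours
    K = n_fft // 2 + 1                                # number of frequency bins (1025)
    f = np.arange(K) * fs / n_fft                     # f_k, centre frequency of bin k
    F = F1 * 2.0 ** (np.arange(H - 1) / 12.0)         # assumed semitone spacing of F_h
    cliques = []
    for Fh in F:                                      # harmonic cliques, 1 <= h <= H-1
        mask = np.zeros(K, dtype=bool)
        for m in range(1, M + 1):                     # first M multiples of F_h
            mask |= np.abs(f - m * Fh) < delta * m * Fh   # band of width 2*delta*m*F_h
        cliques.append(np.flatnonzero(mask))
    cliques.append(np.arange(K))                      # C_H: all frequency bins
    return cliques, F

cliques, F = harmonic_cliques()
print(len(cliques), F[0], F[12])                      # 49 55.0 110.0
```

Because neighbouring bands overlap, each harmonic clique shares bins with the next one (for the first two cliques this happens around their higher harmonics). At an 8 kHz sampling rate, multiples of higher fundamentals that fall above 4 kHz simply contribute no bins.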

To demix the signal received by the demixing system 30, the transfer function W used for demixing in the demixing filter must be calculated. For example, the parameters of the transfer function W may be obtained in a filter parameter calculator as follows.

To perform independent vector analysis while maximizing agreement with the dependency model between frequency bins, the following log-likelihood function may be used as the cost function.

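$$\mathcal{L} = \sum_{k=1}^{K} \log\left|\det \mathbf{W}_k\right| + \sum_{i=1}^{N} E\left[\log p(\mathbf{y}_i)\right] \tag{5}$$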
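A NumPy sketch of the log-likelihood cost, using the harmonic clique density with all σ_hk = 1 (the example value given in the text), dropping additive constants, and replacing the expectation with an average over frames. The array layouts ([k, i, t] for signals, [k, i, j] for matrices) and function names are our own assumptions.

```python
import numpy as np

def log_density(Y, cliques):
    """Unnormalised log p(y_i(t)) with sigma_hk = 1; Y has shape (K, N, T)."""
    power = np.abs(Y) ** 2
    # -sum_h sqrt(sum_{k in C_h} |y_ik(t)|^2); result has shape (N, T)
    return -sum(np.sqrt(power[idx].sum(axis=0)) for idx in cliques)

def log_likelihood(W, X, cliques):
    """Cost: sum_k log|det W_k| + sum_i E[log p(y_i)], expectation as a frame average."""
    Y = np.einsum("kij,kjt->kit", W, X)       # y_k(t) = W_k x_k(t)
    _, logabsdet = np.linalg.slogdet(W)       # log|det W_k| for every bin
    return logabsdet.sum() + log_density(Y, cliques).sum(axis=0).mean()

# Toy check: 3 bins, 2 sources, 4 frames of silence, W_k = 2I in every bin.
W = np.tile(2.0 * np.eye(2), (3, 1, 1))
X = np.zeros((3, 2, 4), dtype=complex)
print(np.isclose(log_likelihood(W, X, [np.arange(3)]), 3 * np.log(4.0)))  # True
```

The log-determinant term keeps W_k from collapsing to zero, while the density term rewards outputs whose energy is concentrated within cliques, that is, outputs that behave like a single harmonic source.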

To obtain optimally separated signals, the following natural gradient learning rule may be applied.

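$$\Delta \mathbf{W}_k = \left(\mathbf{I} - E\left[\Phi(\mathbf{y}_k)\,\mathbf{y}_k^{\mathsf{H}}\right]\right)\mathbf{W}_k \tag{6}$$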

Here, I denotes the N × N identity matrix, and Φ(y_k) denotes an N × 1 column vector whose i-th element is defined as follows.

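$$\Phi_i(\mathbf{y}_k) = \sum_{h \in S_k} \frac{y_{ik} / \sigma_{hk}^2}{\sqrt{\sum_{k' \in C_h} \left| y_{ik'} / \sigma_{hk'} \right|^2}} \tag{7}$$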

Here, S_k is the set of indices of the cliques that contain the k-th frequency bin.
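The score vector can be computed with a loop over cliques: each clique C_h contributes one term to every bin it contains, so accumulating those contributions per bin is the same as summing over h ∈ S_k. The sketch assumes σ_hk = 1 and adds a small hypothetical `eps` to avoid division by zero in silent frames; the function name is also our own.

```python
import numpy as np

def clique_score(Y, cliques, eps=1e-12):
    """Score vectors Phi(y_k) for all bins and frames, with all sigma_hk = 1.

    Y: separated signals, shape (K, N, T). Entry (k, i, t) of the result is
    sum over cliques h containing k of y_ik(t) / sqrt(sum_{k' in C_h} |y_ik'(t)|^2).
    """
    phi = np.zeros_like(Y)
    power = np.abs(Y) ** 2
    for idx in cliques:                                 # visit every clique C_h ...
        norm = np.sqrt(power[idx].sum(axis=0)) + eps    # ... its per-source norm, shape (N, T)
        phi[idx] += Y[idx] / norm                       # ... one term for each bin k in C_h
    return phi

Y = np.array([3.0, 4.0]).reshape(2, 1, 1)   # K = 2 bins, N = 1 source, T = 1 frame
print(clique_score(Y, [np.array([0, 1]), np.array([1])]).ravel())  # [0.6 1.8]
```

In the example, bin 1 belongs to two cliques, so its score collects one term from each (0.8 + 1.0), while bin 0 belongs only to the first.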

In an embodiment of the present invention, a filter parameter calculator (not shown) included in the demixing system 30 may obtain the transfer function W for demixing using the cost function of Equation (5). The demixing filter then demixes the signal received by the demixing system 30 using the calculated transfer function W.

Here, the filter parameter calculator may receive the output of the demixing filter and, based on it, repeatedly compute the filter parameters according to the learning rule of Equation (6) and supply them to the demixing filter, so that the demixing filter operates adaptively. That is, the transfer function W may be obtained adaptively by iterating the learning rule of Equation (6). It is then determined whether W has converged; if not, the process returns to the previous step, recalculates W, and performs demixing again.
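The adaptive procedure can be sketched as a plain natural-gradient loop: the demixing filter produces y_k, the parameter calculator forms the score vectors and the update, and the loop stops once W no longer changes. The step size `mu`, the iteration cap and the stopping tolerance are hypothetical settings, not values from the text; E[Φ(y_k) y_k^H] is estimated by averaging over frames, and σ_hk = 1. The score helper repeats the clique score computation so that the block runs on its own.

```python
import numpy as np

def clique_score(Y, cliques, eps=1e-12):
    """Score vectors Phi(y_k) with sigma_hk = 1; Y has shape (K, N, T)."""
    phi = np.zeros_like(Y)
    power = np.abs(Y) ** 2
    for idx in cliques:
        phi[idx] += Y[idx] / (np.sqrt(power[idx].sum(axis=0)) + eps)
    return phi

def separate(X, cliques, mu=0.1, max_iter=300, tol=1e-6):
    """Iterate the natural-gradient rule until W stops changing.

    X: mixtures, shape (K, N, T). Returns demixing matrices W (K, N, N) and outputs Y.
    """
    K, N, T = X.shape
    I = np.eye(N)
    W = np.tile(np.eye(N, dtype=complex), (K, 1, 1))      # start from W_k = I in every bin
    for _ in range(max_iter):
        Y = np.einsum("kij,kjt->kit", W, X)               # demixing filter output y_k = W_k x_k
        Phi = clique_score(Y, cliques)                    # parameter calculator: score vectors
        C = np.einsum("kit,kjt->kij", Phi, Y.conj()) / T  # E[Phi(y_k) y_k^H] per bin
        dW = np.einsum("kij,kjl->kil", I - C, W)          # (I - E[Phi y^H]) W_k
        W = W + mu * dW                                   # updated parameters fed back
        if np.max(np.abs(dW)) < tol:                      # convergence check
            break
    return W, np.einsum("kij,kjt->kit", W, X)
```

In use, X would be the STFT of the microphone signals arranged as (bins, microphones, frames) and `cliques` the harmonic cliques plus the full-band clique. The update vanishes exactly when E[Φ(y_k) y_k^H] = I in every bin, which is the stationary point of the log-likelihood cost.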

The separated signal y may be obtained using the transfer function W obtained in this adaptive manner, and it may then be converted to the time domain as needed.

Simulation Results:

Various 2 × 2 BSS experiments were performed to evaluate the performance of the algorithm proposed according to an embodiment of the present invention. The experiments used 8 s real speech signals from the TIMIT database, monophonic violin music signals at an 8 kHz sampling rate, and room impulse responses generated by the image method (ALLEN, J.B., and BERKLEY, D.A.: 'Image method for efficiently simulating small-room acoustics', J. Acoust. Soc. Am., 1979, 65, (4), pp. 943-950).

FIG. 3 shows the simulation environment used to test the performance of the blind signal separation algorithm according to an embodiment of the present invention. Two microphones (1, 2) and two sources, placed at two of the candidate positions A to K, were arranged in a 7 m × 5 m × 2.75 m rectangular room. The reverberation time was set to 100 ms, and the reflection coefficient of the walls, floor and ceiling was set to 0.57. The experiments used a 2048-point FFT (Fast Fourier Transform), a 2048-tap Hanning window, and a shift size of 512 samples.
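The analysis settings (2048-point FFT, 2048-tap Hanning window, 512-sample shift) can be sketched with NumPy alone. Padding and the exact window variant are not stated in the text, so this sketch assumes no padding and NumPy's symmetric `np.hanning` window; the random input merely stands in for an 8 s signal at 8 kHz. It yields K = 1025 frequency bins spaced 8000/2048 ≈ 3.9 Hz apart.

```python
import numpy as np

def stft(x, n_fft=2048, hop=512):
    """Hann-windowed STFT; returns shape (n_fft // 2 + 1, n_frames), one row per bin k."""
    window = np.hanning(n_fft)                         # 2048-tap Hanning window
    n_frames = 1 + (len(x) - n_fft) // hop             # frames that fit fully (no padding)
    frames = np.stack([x[t * hop : t * hop + n_fft] * window for t in range(n_frames)])
    return np.fft.rfft(frames, n=n_fft, axis=1).T      # 2048-point FFT, keep K = 1025 bins

fs = 8000
x = np.random.default_rng(0).standard_normal(8 * fs)   # stand-in for an 8 s signal
X = stft(x)
print(X.shape)   # (1025, 122)
```

Stacking the STFTs of both microphones along a new axis gives the (bins, microphones, frames) layout used by the per-bin demixing model.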

To compare the performance of the model proposed according to an embodiment of the present invention with that of other models, the same experiments were carried out with the original IVA (KIM, T., ATTIAS, H.T., LEE, S.-Y., and LEE, T.-W.: 'Blind source separation exploiting higher-order frequency dependencies', IEEE Trans. Audio Speech Lang. Process., 2007, 15, (1), pp. 70-79) and the sub-band local clique IVA (LEE, I., JANG, G.-J., and LEE, T.-W.: 'Independent vector analysis using densities represented by chain-like overlapped cliques in graphical models for separation of convolutedly mixed signals', Electronics Letters, 2009, 45, (13), pp. 710-711) as baselines. The sub-band local cliques consist of seven 256-bin cliques, each shifted by 128 bins. FIG. 2(a) and FIG. 2(b) show the full-band frequency clique of the original IVA model and the sub-band local frequency cliques of the sub-band local IVA model, respectively.
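For comparison, the sub-band local cliques of the baseline are simple contiguous, overlapping ranges. The sketch assumes 0-based bin indices starting at the DC bin; with that choice the seven cliques cover bins 0 to 1023 (6 × 128 + 256 = 1024 bins), so the Nyquist bin 1024 of a 2048-point FFT falls outside them. How the baseline handled that last bin is not stated, and the function name is hypothetical.

```python
import numpy as np

def subband_cliques(n_cliques=7, width=256, shift=128):
    """Chain-like overlapping cliques C_c = {c*shift, ..., c*shift + width - 1}."""
    return [np.arange(c * shift, c * shift + width) for c in range(n_cliques)]

cliques = subband_cliques()
# neighbouring cliques share half of their bins (128 of 256)
print(len(cliques), np.intersect1d(cliques[0], cliques[1]).size)   # 7 128
```

The contrast with the harmonic cliques is that these ranges link only neighbouring bins, whereas a harmonic clique links bins around F_h, 2F_h, ..., 8F_h that can lie far apart in frequency.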

The separation performance of a blind signal separation algorithm may be measured in terms of the signal-to-interference ratio (SIR) defined by Equation (8) below.

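$$\mathrm{SIR} = 10 \log_{10} \frac{\sum_{i} \sum_{k} \left| h_{q(i)i}(k) \right|^2}{\sum_{i} \sum_{k} \sum_{j \neq i} \left| h_{q(i)j}(k) \right|^2} \tag{8}$$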

Here, q(i) denotes the index of the separated output in which the i-th source appears, and h_ij(k) = Σ_l w_ilk a_ljk denotes the global impulse response, i.e. the (i, j)th element of W_k A_k, where a_ijk and w_ijk denote the (i, j)th elements of A_k and W_k, respectively.
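The SIR measure can be computed from the per-bin matrices as in the sketch below. It sums energies over frequency bins rather than over time-domain impulse-response taps (by Parseval's theorem the two agree up to a constant when the full spectrum is used), and it picks q(i) as the output that receives the most energy from source i. Both choices, and the function name, are assumptions made for the illustration.

```python
import numpy as np

def sir_db(W, A):
    """SIR in dB from demixing W and mixing A, both shaped (K, N, N)."""
    G = np.einsum("kil,klj->kij", W, A)     # global response h_ij(k) = sum_l w_ilk a_ljk
    P = np.sum(np.abs(G) ** 2, axis=0)      # P[i, j]: energy from source j into output i
    N = P.shape[0]
    q = np.argmax(P, axis=0)                # assumed rule: q(i) = output dominated by source i
    signal = sum(P[q[i], i] for i in range(N))
    interference = sum(P[q[i], j] for i in range(N) for j in range(N) if j != i)
    return 10.0 * np.log10(signal / interference)

K = 4
A = np.tile(np.eye(2), (K, 1, 1))
W = np.tile(np.array([[1.0, 0.1], [0.1, 1.0]]), (K, 1, 1))  # 10% leakage each way
print(round(sir_db(W, A), 6))   # 20.0
```

With 10% amplitude leakage the interference energy is 1% of the signal energy, hence 20 dB; swapping the output order leaves the value unchanged because q(i) follows the sources.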

Tables 1 and 2 below show the results of separating two speech signals and of separating a speech signal from a music signal, respectively. That is, the experiments in Table 1 used two speech signals, and those in Table 2 used one speech signal and one music signal.

Table 1. Separation of two speech signals (SIR in dB)

| Experiment number | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| Source locations | A, I | B, G | E, G | I, K | C, D | E, F | I, J |
| Input SIR | 0.7 | 0.0 | 0.0 | -0.4 | -0.0 | 0.0 | -0.4 |
| Original IVA | 16.1 | 14.2 | 14.5 | 8.7 | 10.2 | 14.5 | 14.1 |
| Sub-band local clique IVA | 19.9 | 17.3 | 17.3 | 8.3 | 11.9 | 18.3 | 15.0 |
| Harmonic clique IVA | 21.3 | 18.6 | 18.1 | 10.1 | 13.4 | 19.2 | 15.8 |

Table 2. Separation of one speech signal and one music signal (SIR in dB)

| Experiment number | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| Source locations | A, I | B, G | E, G | I, K | C, D | E, F | I, J |
| Input SIR | 0.2 | 0.4 | 0.3 | -0.5 | -0.1 | 0.1 | -0.4 |
| Original IVA | 10.2 | 8.9 | 9.7 | 6.8 | 7.0 | 9.2 | 10.0 |
| Sub-band local clique IVA | 14.2 | 13.2 | 10.8 | 9.2 | 8.9 | 12.6 | 12.4 |
| Harmonic clique IVA | 15.4 | 14.7 | 14.4 | 9.6 | 10.2 | 14.1 | 13.0 |

As Tables 1 and 2 show, the blind signal separation algorithm according to the embodiment of the present invention, denoted Harmonic clique IVA, achieves a higher SIR than both the original IVA and the sub-band local clique IVA in every experiment.
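The comparison can be checked arithmetically from the two tables. The sketch below transcribes the output SIR rows and reports the mean per method and the best method in each experiment; it adds no data beyond the tables.

```python
import numpy as np

METHODS = ["Original IVA", "Sub-band local clique IVA", "Harmonic clique IVA"]
# Output SIR (dB) for experiments 1-7, transcribed from Tables 1 and 2 (rows follow METHODS).
TABLE1 = np.array([[16.1, 14.2, 14.5, 8.7, 10.2, 14.5, 14.1],
                   [19.9, 17.3, 17.3, 8.3, 11.9, 18.3, 15.0],
                   [21.3, 18.6, 18.1, 10.1, 13.4, 19.2, 15.8]])
TABLE2 = np.array([[10.2, 8.9, 9.7, 6.8, 7.0, 9.2, 10.0],
                   [14.2, 13.2, 10.8, 9.2, 8.9, 12.6, 12.4],
                   [15.4, 14.7, 14.4, 9.6, 10.2, 14.1, 13.0]])

def summarize(table):
    """Mean SIR per method and the index of the best method in each experiment."""
    return table.mean(axis=1), table.argmax(axis=0)

for name, table in [("Table 1", TABLE1), ("Table 2", TABLE2)]:
    means, best = summarize(table)
    print(name, np.round(means, 1), "best in every experiment:", {METHODS[b] for b in best})
```

Averaged over the seven experiments, harmonic clique IVA reaches about 16.6 dB versus 15.4 dB (sub-band local clique) and 13.2 dB (original) in Table 1, and about 13.1 dB versus 11.6 dB and 8.8 dB in Table 2.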

As described above, embodiments of the present invention provide a BSS algorithm that takes into account the strong harmonic structure of sound source signals such as speech and music signals. The harmonic cliques assign, a priori, dependencies between the frequency bins at multiples of the fundamental frequencies of the sources. The simulation results show that the blind signal separation algorithm proposed according to the embodiments separates speech and music signals more effectively than conventional BSS algorithms.

Although embodiments of the present invention have been described with reference to the accompanying drawings, those skilled in the art will understand that the present invention may be embodied in other specific forms without changing its technical spirit or essential features. Therefore, the embodiments described above should be understood as illustrative and not restrictive in all respects. The scope of the present invention is indicated by the claims below rather than by the detailed description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

10, 12: Source
20, 22: Signal receiving unit
30: De-mixing system

Claims

1. A blind signal separation method for separating signals in the frequency domain, comprising:
receiving a mixture of signals from two or more sound sources; and
demixing the received signal assuming a dependency only between frequency bins included in the same harmonic frequency group among a plurality of harmonic frequency groups.

2. The method of claim 1, wherein the demixing is performed through independent vector analysis.

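3. The method of claim 1, wherein the demixing comprises:
calculating a transfer function W using the cost function
$$\mathcal{L} = \sum_{k=1}^{K} \log\left|\det \mathbf{W}_k\right| + \sum_{i=1}^{N} E\left[\log p(\mathbf{y}_i)\right];$$
repeatedly calculating the transfer function according to the learning rule
$$\Delta \mathbf{W}_k = \left(\mathbf{I} - E\left[\Phi(\mathbf{y}_k)\,\mathbf{y}_k^{\mathsf{H}}\right]\right)\mathbf{W}_k;$$
and
demixing the received signal using the transfer function,
wherein L is a log-likelihood function.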

Claim 4 was abandoned upon payment of the registration fee.

4. The method according to any one of claims 1 to 3,
wherein the signals from the sound sources comprise at least one of a speech signal and a music signal.

5. The method according to any one of claims 1 to 3,
wherein consecutive harmonic frequency groups among the plurality of harmonic frequency groups overlap each other.

6. The method according to any one of claims 1 to 3,
wherein the plurality of harmonic frequency groups includes 49 harmonic frequency groups, and each of the plurality of harmonic frequency groups includes frequency bins corresponding to the first eight multiples of the corresponding fundamental frequency F_h.

7. A demixing system for blind signal separation that separates signals in the frequency domain, comprising:
a signal receiving unit that receives a mixture of signals from two or more sound sources; and
a demixing filter that demixes the received signal assuming a dependency only between frequency bins included in the same harmonic frequency group among a plurality of harmonic frequency groups.

Claim 8 was abandoned upon payment of the registration fee.

8. The demixing system of claim 7,
wherein the blind signal separation is performed through independent vector analysis.

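Claim 9 was abandoned upon payment of the registration fee.

9. The demixing system of claim 7, further comprising a filter parameter calculator that calculates a transfer function W using the cost function
$$\mathcal{L} = \sum_{k=1}^{K} \log\left|\det \mathbf{W}_k\right| + \sum_{i=1}^{N} E\left[\log p(\mathbf{y}_i)\right]$$
and repeatedly calculates the transfer function according to the learning rule
$$\Delta \mathbf{W}_k = \left(\mathbf{I} - E\left[\Phi(\mathbf{y}_k)\,\mathbf{y}_k^{\mathsf{H}}\right]\right)\mathbf{W}_k,$$
wherein L is a log-likelihood function.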

Claim 10 was abandoned upon payment of the registration fee.

10. The demixing system according to any one of claims 7 to 9,
wherein the signals from the sound sources comprise at least one of a speech signal and a music signal.

Claim 11 was abandoned upon payment of the registration fee.

11. The demixing system according to any one of claims 7 to 9,
wherein consecutive harmonic frequency groups among the plurality of harmonic frequency groups overlap each other.

Claim 12 was abandoned upon payment of the registration fee.

12. The demixing system according to any one of claims 7 to 9,
wherein the plurality of harmonic frequency groups includes 49 harmonic frequency groups, and each of the plurality of harmonic frequency groups includes frequency bins corresponding to the first eight multiples of the corresponding fundamental frequency F_h.