KR20240009758A

KR20240009758A - A method of online beamforming and steering vector estimation based on target masks and ICA for robust speech recognition

Info

Publication number: KR20240009758A
Application number: KR1020220087067A
Authority: KR
Inventors: 박형민; 신의협
Original assignee: 서강대학교산학협력단
Priority date: 2022-07-14
Filing date: 2022-07-14
Publication date: 2024-01-23
Also published as: WO2024014797A1

Abstract

A beamforming and steering vector estimation system according to an embodiment of the present invention may include an input providing unit, a demixing providing unit, and a result providing unit. The input providing unit may provide microphone input signals based on a spatial transfer function corresponding to a target signal at a target point and a noise signal. The demixing providing unit may provide a demixing matrix determined from the microphone input signals by independent component analysis (ICA) under spatial constraints. The result providing unit may extract a result signal from the microphone input signals based on the demixing matrix.
In the beamforming and steering vector estimation system according to the present invention, the demixing matrix is calculated using a cost function that includes a plurality of constraints, so that the target signal generated at the target point can be extracted without distortion and fixed to a desired channel.

Description

Real-time beamforming and steering vector estimation method based on target masks and ICA for robust speech recognition {A method of online beamforming and steering vector estimation based on target masks and ICA for robust speech recognition}

The present invention relates to a beamforming and steering vector estimation system for a target sound source in a speech recognition system. More specifically, it relates to a method that can improve beamforming and steering vector estimation performance by simultaneously considering models of the target sound source and the noise, based on a target mask and independent component analysis.

Signals captured by microphones may contain not only the target speech needed for speech recognition but also noise that interferes with recognition. Various studies are under way to improve speech recognition performance by removing noise from the microphone input signals and extracting only the desired target speech.

Korean Registered Patent No. 10-1133308 (registered March 28, 2012)

The technical problem to be solved by the present invention is to provide a beamforming and steering vector estimation system that can extract a target signal generated at a target point without distortion by calculating a demixing matrix using a cost function that additionally includes a plurality of constraints.

To solve this problem, a beamforming and steering vector estimation system according to an embodiment of the present invention may include an input providing unit, a demixing providing unit, and a result providing unit. The input providing unit may provide microphone input signals based on a spatial transfer function corresponding to a target signal at a target point and a noise signal. The demixing providing unit may provide a demixing matrix determined from the microphone input signals by independent component analysis (ICA). The result providing unit may extract a result signal from the microphone input signals based on the demixing matrix.

In one embodiment, the spatial transfer function may include a steering vector, which is the transfer function from the target point to the input providing unit, and a noise transfer function, which is the transfer function along which the noise signal reaches the input providing unit.

In one embodiment, the product of the first component of the demixing matrix and the steering vector included in the spatial transfer function may be 1, and the product of each remaining component of the demixing matrix, other than the first component, and the steering vector may be 0.

In one embodiment, the demixing matrix may be determined based on a cost function (CF) according to the independent component analysis.

In one embodiment, the first component of a result matrix generated from the microphone input signals and the demixing matrix may correspond to the target signal.

In one embodiment, the cost function is expressed as [Equation 1].

[Equation 1]

The terms of [Equation 1] are, in order: the overall cost function; k and m, natural numbers denoting the frequency index and the channel index, respectively; the cost function of independent component analysis; two parameters that set how strongly the distortionless (distortion-prevention) condition and the null condition are enforced, respectively; the first component of the demixing matrix; and the steering vector.

In one embodiment, the cost function is expressed as [Equation 2].

[Equation 2]

The terms of [Equation 2] are, in order: the overall cost function; k and m, natural numbers denoting the frequency index and the channel index, respectively; two Lagrange multipliers that guarantee the distortionless condition and the null condition, respectively; the first component of the demixing matrix; and the steering vector.

In one embodiment, the cost function is expressed as [Equation 3].

[Equation 3]

The terms of [Equation 3] are, in order: the overall cost function; k and m, natural numbers denoting the frequency index and the channel index, respectively; the Lagrange multiplier that guarantees the distortionless condition; the first component of the demixing matrix; the steering vector; and the parameter that sets how strongly the null condition is enforced.
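As a reading aid, the three variants can be contrasted in one place. The block below is only a sketch of the shape such cost functions commonly take, written from the verbal descriptions of [Equation 1], [Equation 2], and [Equation 3]; every symbol in it (J, w, h, alpha, beta, lambda) is an assumed name, whether the cost is defined per frequency bin or summed over bins is also an assumption, and the exact equations of the patent may differ. A real-valued cost would additionally carry the complex conjugates of the Lagrange terms.

```latex
% Assumed notation: J_k^{ICA} is the ICA cost at frequency k, w_{mk} the m-th
% demixing vector, h_k the steering vector, alpha/beta penalty weights,
% lambda_{mk} Lagrange multipliers. None of these symbols is taken from the source.
\begin{align}
\text{penalty form:}\quad
  J_k &= J_k^{\mathrm{ICA}}
       + \alpha\,\bigl|\mathbf{w}_{1k}^{H}\mathbf{h}_k - 1\bigr|^{2}
       + \beta \sum_{m \ge 2} \bigl|\mathbf{w}_{mk}^{H}\mathbf{h}_k\bigr|^{2} \\
\text{Lagrangian form:}\quad
  J_k &= J_k^{\mathrm{ICA}}
       + \lambda_{1k}\bigl(\mathbf{w}_{1k}^{H}\mathbf{h}_k - 1\bigr)
       + \sum_{m \ge 2} \lambda_{mk}\,\mathbf{w}_{mk}^{H}\mathbf{h}_k \\
\text{hybrid form:}\quad
  J_k &= J_k^{\mathrm{ICA}}
       + \lambda_{1k}\bigl(\mathbf{w}_{1k}^{H}\mathbf{h}_k - 1\bigr)
       + \beta \sum_{m \ge 2} \bigl|\mathbf{w}_{mk}^{H}\mathbf{h}_k\bigr|^{2}
\end{align}
```

Under this reading, the penalty form enforces both conditions only approximately, the Lagrangian form enforces both exactly, and the hybrid form enforces the distortionless condition exactly while leaving the null condition soft.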

In one embodiment, the result signal for the target signal may follow a Laplacian distribution.

In one embodiment, the steering vector may be determined from the difference between the spatial covariance matrix of the microphone input signals (Input Spatial Covariance Matrix, ISCM) and the spatial covariance matrix of the noise signal (Noise Spatial Covariance Matrix, NSCM).

In one embodiment, the spatial covariance matrix of the noise signal may be determined according to the ratio between the value corresponding to the target signal and the value corresponding to the noise signal among the result signals.

In one embodiment, the beamforming and steering vector estimation system may run once per frame, each frame corresponding to a fixed time interval, and update the demixing matrix at each run.
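A frame-by-frame update is often built on statistics that are refreshed recursively. The sketch below shows one common way to do this, exponentially weighted averaging of a spatial covariance statistic, so that recent frames count more than old ones. It is an assumption for illustration only: the function name `update_scm`, the forgetting factor `forget`, and the per-frame `weight` are hypothetical and are not taken from the patent, and the step that turns the statistic into a demixing matrix is not shown.

```python
# Hypothetical sketch: recursive (online) update of a weighted spatial
# covariance statistic for one frequency bin. Names and the forgetting
# factor are illustrative assumptions, not the patent's update rule.

def update_scm(V_prev, x, weight, forget=0.95):
    """Return forget * V_prev + (1 - forget) * weight * x x^H."""
    M = len(x)
    return [
        [forget * V_prev[i][j] + (1 - forget) * weight * x[i] * x[j].conjugate()
         for j in range(M)]
        for i in range(M)
    ]

# Two microphones, two successive frames.
V = [[0j, 0j], [0j, 0j]]                 # initial statistic
frames = [[1 + 0j, 1j], [2 + 0j, 0j]]    # microphone inputs per frame
for x in frames:
    V = update_scm(V, x, weight=1.0, forget=0.5)
```

With `forget` close to 1 the statistic changes slowly; smaller values track changes faster at the cost of noisier estimates.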

To solve this problem, a beamforming and steering vector estimation system according to an embodiment of the present invention may include an input providing unit, a demixing providing unit, a result providing unit, and a voice providing unit. The input providing unit may provide microphone input signals based on a spatial transfer function corresponding to a target signal at a target point and a noise signal. The demixing providing unit may provide a demixing matrix determined from the microphone input signals by independent component analysis (ICA). The result providing unit may extract a result signal from the microphone input signals based on the demixing matrix. The voice providing unit may output the result signal as speech.

To solve this problem, in a method of operating a beamforming and steering vector estimation system according to an embodiment of the present invention, the input providing unit may provide microphone input signals based on spatial transfer functions corresponding to each of a target signal at a target point and a noise signal. The demixing providing unit may provide a demixing matrix determined from the microphone input signals by independent component analysis (ICA). The result providing unit may extract a result signal from the microphone input signals based on the demixing matrix.

To solve this problem, in a method of operating a beamforming and steering vector estimation system according to an embodiment of the present invention, the input providing unit may provide microphone input signals based on spatial transfer functions corresponding to each of a target signal at a target point and a noise signal. The demixing providing unit may provide a demixing matrix determined from the microphone input signals by independent component analysis (ICA). The result providing unit may extract a result signal from the microphone input signals based on the demixing matrix. The voice providing unit may output the result signal as speech.

In addition to the technical problem mentioned above, other features and advantages of the present invention are described below, or will be clearly understood from that description by those of ordinary skill in the art to which the present invention belongs.

The present invention as described above has the following effects.

In the beamforming and steering vector estimation system according to the present invention, the demixing matrix is calculated using a cost function that includes a plurality of constraints, so that the target signal generated at the target point can be extracted without distortion. In addition, the steering vector can be estimated effectively by using values corresponding to the noise signal as well as the target signal.

Further features and advantages of the present invention may also become apparent through its embodiments.

FIG. 1 is a diagram illustrating a beamforming and steering vector estimation system according to embodiments of the present invention.
FIG. 2 is a diagram for explaining the microphone input signals applied to the beamforming and steering vector estimation system of FIG. 1.
FIG. 3 is a diagram for explaining the spatial transfer function used in the beamforming and steering vector estimation system of FIG. 1.
FIGS. 4 and 5 are diagrams for explaining the online operation of the beamforming and steering vector estimation system of FIG. 1.
FIG. 6 is a diagram for explaining an embodiment of the beamforming and steering vector estimation system of FIG. 1.
FIG. 7 is a diagram illustrating a method of operating a beamforming and steering vector estimation system according to embodiments of the present invention.
FIG. 8 is a diagram for explaining an embodiment of the method of operating the beamforming and steering vector estimation system of FIG. 7.

In this specification, when reference numerals are assigned to the components in each drawing, the same components are given the same numerals wherever possible, even when they appear in different drawings.

The terms used in this specification should be understood as follows.

A singular expression should be understood to include the plural unless the context clearly indicates otherwise, and the scope of rights should not be limited by these terms.

Terms such as "include" or "have" should be understood as not excluding in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Hereinafter, preferred embodiments of the present invention, devised to solve the above problems, are described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram illustrating a beamforming and steering vector estimation system according to embodiments of the present invention, FIG. 2 is a diagram for explaining the microphone input signals applied to the system of FIG. 1, and FIG. 3 is a diagram for explaining the spatial transfer function used in the system of FIG. 1.

Referring to FIGS. 1 to 3, a beamforming and steering vector estimation system (10) according to an embodiment of the present invention may include an input providing unit (100), a demixing providing unit (200), and a result providing unit (300). The input providing unit (100) may provide microphone input signals (XS) based on a spatial transfer function (A) corresponding to a target signal (TS) at a target point (TP) and a noise signal (NS). For example, the input providing unit (100) may be a plurality of microphones, which may include a first microphone (101) to a third microphone (103). The spatial transfer function (A) may include a steering vector (H), which is the transfer function along which the target signal (TS) travels from the target point (TP) to the input providing unit (100), and a noise transfer function (D), which is the transfer function along which the noise signal (NS) reaches the input providing unit (100). In this case, the microphone input signals (XS) can be expressed as [Equation 1-1] below.

[Equation 1-1]

The terms of [Equation 1-1] are, in order: the microphone input signals (XS), the spatial transfer function (A), the target signal (TS), the noise signal (NS), the steering vector (H), the noise transfer function (D), the frequency index, and the frame index.
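The mixing model described for [Equation 1-1] can be sketched numerically as follows. The sketch assumes the usual reading in which, for one frequency bin and one frame, each microphone input is the target signal scaled by the steering vector plus the noise signals passed through the noise transfer function. The function name `mix` and all numeric values are hypothetical and chosen only for illustration.

```python
# Hypothetical sketch of the mixing model for one frequency bin and one frame:
# microphone inputs = steering vector * target + noise transfer function @ noise.
# All names and values are illustrative, not taken from the patent.

def mix(h, D, s, n):
    """Return the microphone inputs x, with x[i] = h[i]*s + sum_j D[i][j]*n[j]."""
    x = []
    for i in range(len(h)):
        noise_part = sum(D[i][j] * n[j] for j in range(len(n)))
        x.append(h[i] * s + noise_part)
    return x

# Three microphones, one target source, two noise sources.
h = [1 + 0j, 0.8 - 0.3j, 0.5 + 0.4j]      # steering vector (H)
D = [[0.2 + 0.1j, 0.1j],
     [0.3 + 0j, -0.2 + 0.2j],
     [0.1 - 0.1j, 0.4 + 0j]]              # noise transfer function (D)
s = 2.0 + 1.0j                            # target signal (TS)
n = [0.5 - 0.5j, -1.0 + 0j]               # noise signals (NS)

x = mix(h, D, s, n)                       # microphone input signals (XS)
```

Stacking `h` and the columns of `D` side by side gives a matrix that plays the role of the spatial transfer function (A) in this sketch.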

The demixing providing unit (200) may provide a demixing matrix (W) determined from the microphone input signals (XS) by independent component analysis (ICA).

In one embodiment, the demixing matrix (W) may be determined based on a cost function (CF) that reflects both independent component analysis and spatial constraints. Based on [Equation 1-1] and [Equation 4] described later, the spatial constraints can be expressed as a distortionless condition (the product of the first component (W1) of the demixing matrix and the steering vector (H) equals 1) and a null condition (the product of each remaining component (Wm) and the steering vector (H) equals 0). For example, the cost function can be expressed as [Equation 1], [Equation 2], or [Equation 3] below.

[Equation 1]

The terms of [Equation 1] are, in order: the overall cost function; k and m, natural numbers denoting the frequency index and the channel index, respectively; the cost function of independent component analysis; two parameters that set how strongly the distortionless condition and the null condition are enforced, respectively; the first and m-th components of the demixing matrix (W); and the steering vector (H). The conventional ICA cost function on which this overall cost function is built can be expressed as [Equation 1-2] below.

[Equation 1-2]

Here, k and m are natural numbers denoting the frequency index and the channel index. The remaining terms are the first and m-th components of the demixing matrix (W), the demixing matrix (W) itself, and the weighted spatial covariance matrices (WSCM) determined by the models of the target signal and the noise signal, respectively. The first component of the demixing matrix (W) that optimizes [Equation 1] can be estimated iteratively, until convergence, using [Equation 1-3] below.

[Equation 1-3]

Here, k is a natural number denoting the frequency index. The remaining terms are the weighted spatial covariance matrix (WSCM) of the target signal, the parameter that sets how strongly the distortionless condition is enforced, the steering vector (H), and the first component of the demixing matrix (W). The m-th component of the demixing matrix (W) that optimizes [Equation 1] can be estimated iteratively, until convergence, using [Equation 1-4] below.

[Equation 1-4]

Here, k is a natural number denoting the frequency index. The remaining terms are the weighted spatial covariance matrix (WSCM) of the noise signal, the parameter that sets how strongly the null condition is enforced, the steering vector (H), the demixing matrix (W), and the m-th component of the demixing matrix (W).

[Equation 2]

The terms of [Equation 2] are, in order: the overall cost function; k and m, natural numbers denoting the frequency index and the channel index; the cost function of independent component analysis; two Lagrange multipliers that guarantee the distortionless condition and the null condition, respectively; the first and m-th components of the demixing matrix (W); and the steering vector (H). The first component of the demixing matrix (W) that optimizes [Equation 2] can be estimated iteratively, until convergence, using [Equation 2-1] below.

[Equation 2-1]

Here, k is a natural number denoting the frequency index. The remaining terms are the weighted spatial covariance matrix (WSCM) of the target signal, the steering vector (H), and the first component of the demixing matrix (W). The m-th component of the demixing matrix (W) that optimizes [Equation 2] can be estimated iteratively, until convergence, using [Equation 2-2] below.

[Equation 2-2]

Here, k is a natural number denoting the frequency index. The remaining terms are the weighted spatial covariance matrix (WSCM) of the noise signal, the steering vector (H), the demixing matrix (W), and the m-th component of the demixing matrix (W).

[Equation 3]

The terms of [Equation 3] are, in order: the overall cost function; k and m, natural numbers denoting the frequency index and the channel index; the cost function of independent component analysis; the Lagrange multiplier that guarantees the distortionless condition; the first and m-th components of the demixing matrix (W); the steering vector (H); and the parameter that sets how strongly the null condition is enforced. The first component of the demixing matrix (W) that optimizes [Equation 3] can be estimated iteratively, until convergence, using [Equation 3-1] below.

[Equation 3-1]

Here, k is a natural number denoting the frequency index. The remaining terms are the weighted spatial covariance matrix (WSCM) of the target signal, the steering vector (H), and the first component of the demixing matrix (W). The m-th component of the demixing matrix (W) that optimizes [Equation 3] can be estimated iteratively, until convergence, using [Equation 3-2] below.

[Equation 3-2]

As expressed in [Equation 1], [Equation 2], and [Equation 3], applying the constraint terms appended to the cost function in each equation fixes the first channel (component) of the matrix containing the result signal (RS), obtained with the demixing matrix (W), to the result for the target signal (TS), and also reduces distortion of the target signal (TS). In addition, the squared term in [Equation 1] enforces the distortionless condition only to the limited degree set by its parameter, which can leave some speech distortion. To reduce this distortion, [Equation 3] may be constructed as a hybrid combination of [Equation 1] and [Equation 2]. In one embodiment, the first component of the result matrix generated from the microphone input signals (XS) and the demixing matrix (W) may correspond to the target signal (TS). Here, the ICA cost function in [Equation 1], [Equation 2], and [Equation 3] may be the same as that of [Equation 1-2].

In one embodiment, the product of the first component (W1) of the demixing matrix (W) and the steering vector (H) included in the spatial transfer function (A) may be 1, and the product of each remaining component (Wm) of the demixing matrix (W), other than the first component (W1), and the steering vector (H) may be 0. This can be expressed as [Equation 4] below.

[Equation 4]

The terms of [Equation 4] are, in order: the result signal (RS) corresponding to the target signal (TS), the result signal (RS) corresponding to the noise signal (NS), and the microphone input signals (XS).
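The effect of the two conditions can be checked on a toy case. The sketch below assumes that "product" means the Hermitian inner product of a demixing vector with the steering vector, and uses a hand-picked two-microphone example. The helper names `inner` and `demix` and all values are hypothetical. In this example the noise path happens to be orthogonal to the first demixing vector, so the first output equals the target exactly; in general the distortionless condition only guarantees that the target passes with unit gain, and some noise may remain.

```python
# Hypothetical sketch: checking the distortionless and null conditions and
# applying a demixing matrix for one frequency bin. Values are illustrative.

def inner(w, v):
    """Hermitian inner product w^H v."""
    return sum(wi.conjugate() * vi for wi, vi in zip(w, v))

def demix(W, x):
    """Apply each demixing vector (row of W) to the microphone inputs x."""
    return [inner(w, x) for w in W]

h = [1 + 0j, 1j]                  # steering vector (H)
d = [1 + 0j, -1j]                 # transfer path of a single noise source
W = [[0.5 + 0j, 0.5j],            # first component: W1^H h = 1
     [1 + 0j, -1j]]               # second component: Wm^H h = 0

s, n = 3 - 2j, 0.7 + 0.1j         # target and noise samples
x = [h[i] * s + d[i] * n for i in range(2)]   # microphone inputs (XS)
y = demix(W, x)                   # y[0]: target channel, y[1]: noise channel
```

Because the second demixing vector nulls the steering vector, the second output carries no target component at all.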

The result providing unit (300) may extract the result signal (RS) from the microphone input signals (XS) based on the demixing matrix (W). For example, when the cost function is optimized through independent component analysis (ICA), the output may be provided separately as the result signal (RS) for the target signal (TS) and the result signal (RS) for the noise signal (NS).

In one embodiment, the result signal (RS) for the target signal (TS) may follow a Laplacian distribution with a time-varying variance. For example, the result signal (RS) corresponding to a target signal (TS) distributed according to the Laplacian function can be modeled as in [Equation 5] below.

[Equation 5]

The terms of [Equation 5] are, in order: the probability density function of the output signal (RS) for the target signal (TS), the time-varying variance of the target signal (TS), and the result signal (RS). Using the signal modeled by [Equation 5], the weighted spatial covariance matrix for the target in [Equation 1-2] is calculated as in [Equation 5-1] below.

[Equation 5-1]

Here, k is a natural number denoting the frequency index. The remaining terms are the total number of frames, the weighted spatial covariance matrix (WSCM) for the target signal (TS), the weight function, a predetermined mask, and the microphone input signals (XS). For a target signal (TS) modeled to follow this Laplacian distribution, the weight function can be computed so that it reflects both the result signal (RS) and the time-varying variance estimated from the predetermined mask. The noise signal (NS) can be modeled as in [Equation 5-2] below.

[Equation 5-2]

The terms of [Equation 5-2] are the probability density function of the output signal for the noise signal (NS) and the output signal for the noise signal (NS). Accordingly, the weighted spatial covariance matrix for the noise in [Equation 1-2] is calculated as in [Equation 5-3] below.

[Equation 5-3]

Here, k is a natural number denoting the frequency index. The remaining terms are the total number of frames, the weighted spatial covariance matrix (WSCM) for the noise signal (NS), the weight function for the noise signal (NS), and the microphone input signals (XS).
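The weighted spatial covariance matrices described for [Equation 5-1] and [Equation 5-3] share one structure: a per-frame weight multiplies the outer product of the microphone inputs, and the result is averaged over all frames. The sketch below shows only that structure. The function name `weighted_scm` is hypothetical, and the weights are passed in by the caller because the exact weight functions are not reproduced here.

```python
# Hypothetical sketch: weighted spatial covariance matrix for one frequency
# bin, averaged over T frames. The weights are supplied by the caller; how
# they are derived from the source model is outside this sketch.

def weighted_scm(frames, weights):
    """Return (1/T) * sum over frames of weight * x x^H as a nested list."""
    M = len(frames[0])
    T = len(frames)
    V = [[0j] * M for _ in range(M)]
    for x, phi in zip(frames, weights):
        for i in range(M):
            for j in range(M):
                V[i][j] += phi * x[i] * x[j].conjugate() / T
    return V

frames = [[1 + 0j, 1j], [2 + 0j, 0j]]   # microphone inputs (XS) for two frames
weights = [1.0, 0.5]                    # illustrative per-frame weights
V = weighted_scm(frames, weights)
```

With real, non-negative weights the result is Hermitian, which is what the iterative updates of the demixing vectors typically rely on.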

In one embodiment, the steering vector (H) may be determined from the difference between the spatial covariance matrix of the microphone input signals (XS) (Input Spatial Covariance Matrix, ISCM) and the spatial covariance matrix of the noise signal (NS) (Noise Spatial Covariance Matrix, NSCM). In another embodiment, the NSCM may be determined according to the ratio between the value corresponding to the target signal (TS) and the value corresponding to the noise signal (NS) among the result signals (RS). For example, the method of estimating the steering vector (H) can be expressed as [Equation 6] below.

[Equation 6]

Here, the symbols denote, in order, the spatial covariance matrix of the microphones' input signals (ISCM), the spatial covariance matrix of the target signal (TSCM), the spatial covariance matrix of the noise signal (NSCM), and a ratio representing the contribution of the noise component to the input. For direction vector estimation, the quantity in question may be replaced by the input signals (XS) of the microphones multiplied by the square root of a fixed external mask, as shown in [Equation 6-1] below.

[Equation 6-1]

The direction vector can then be estimated by extracting the principal eigenvector of this spatial covariance matrix for the target signal (TS).
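
The estimation just described (subtract a scaled NSCM from the ISCM, then take the principal eigenvector) can be sketched as follows. This is an illustration under assumptions: power iteration is used here only as one simple way to obtain a principal eigenvector, the result is normalized to unit length, and the names `principal_eigenvector` and `estimate_direction_vector` are hypothetical.

```python
def mat_vec(m, v):
    """Multiply a square matrix (nested list) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def principal_eigenvector(m, iters=200):
    """Approximate the dominant eigenvector of a Hermitian matrix by power iteration.

    The all-ones start vector is an arbitrary choice; it must not be
    orthogonal to the dominant eigenvector for the iteration to work.
    """
    v = [1 + 0j] * len(m)
    for _ in range(iters):
        w = mat_vec(m, v)
        norm = sum(abs(c) ** 2 for c in w) ** 0.5
        if norm == 0.0:
            raise ValueError("matrix maps the current iterate to zero")
        v = [c / norm for c in w]
    return v


def estimate_direction_vector(iscm, nscm, noise_ratio):
    """Form TSCM = ISCM - noise_ratio * NSCM and return its dominant eigenvector."""
    n = len(iscm)
    tscm = [[iscm[i][j] - noise_ratio * nscm[i][j] for j in range(n)]
            for i in range(n)]
    return principal_eigenvector(tscm)
```

The returned vector is determined only up to a complex scale factor, so a practical system would typically fix a reference, for example by dividing by the entry of a chosen microphone.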

FIGS. 4 and 5 are diagrams for explaining the online operation of the beamforming and direction vector estimation system of FIG. 1, and FIG. 6 is a diagram for explaining an embodiment of the beamforming and direction vector estimation system of FIG. 1.

Referring to FIGS. 1 to 6, the beamforming and direction vector estimation system (10) according to the present invention may also operate online. In this case, the system (10) may update the demixing matrix (W) at every frame, each frame corresponding to a fixed time interval. For example, the times may include a first time (T1) through a fourth time (T4), and the time intervals may include a first frame interval (FI1) through a third frame interval (FI3). The first frame interval (FI1) runs from the first time (T1) to the second time (T2), the second frame interval (FI2) from the second time (T2) to the third time (T3), and the third frame interval (FI3) from the third time (T3) to the fourth time (T4). The system (10) updates the demixing matrix (W) during the first frame interval (FI1) and updates it again during the second frame interval (FI2). As shown in FIG. 5, applying a progressively higher weight (WT) to more recent frames while updating the demixing matrix (W) can improve the performance of the system (10). In this case, the result signal (RS) can be extracted from the input signals (XS) of the microphones based on the demixing matrix for the t-th frame interval, as shown in [Equation 7] below.

[Equation 7]

Here, the symbols denote, in order, the result signal (RS) corresponding to the target signal (TS) in the t-th frame interval, obtained with the demixing matrix (W) estimated during the (t-1)-th frame interval; the first channel (component) of the demixing matrix estimated during the (t-1)-th frame interval; and the input signals (XS) of the microphones.
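
The online extraction described above applies the filter estimated at the previous frame to the current input and only then updates the filter. The sketch below shows that ordering for a single frequency bin; `update_filter` is a hypothetical callback standing in for whichever update rule is used, and the output is assumed to be the Hermitian inner product of the first demixing component with the input vector.

```python
def hermitian_dot(w, x):
    """Return w^H x for two complex vectors given as lists."""
    return sum(a.conjugate() * b for a, b in zip(w, x))


def extract_online(frames, w_init, update_filter):
    """Produce one output sample per frame using the previous frame's filter.

    frames        : list of microphone vectors, one per frame
    w_init        : initial first component of the demixing matrix
    update_filter : hypothetical callback (w_prev, x) -> w_new
    """
    w = list(w_init)
    outputs = []
    for x in frames:
        outputs.append(hermitian_dot(w, x))  # filter from frame t-1 applied at frame t
        w = update_filter(w, x)              # filter for the next frame
    return outputs
```

Separating extraction from the update keeps the output latency at one frame, since no future data is needed to produce the current sample.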

In one embodiment, to estimate the demixing matrix (W) in the t-th frame interval, the cost function of independent component analysis (ICA) given in [Equation 1-2] may be replaced by the cost function of [Equation 8] below, which is defined at the t-th frame.

[Equation 8]

Here, k and m are natural numbers denoting the frequency and channel indices, respectively, and the remaining symbols denote, in order, the first and m-th components of the demixing matrix (W) estimated at the t-th frame; the demixing matrix (W) estimated at the t-th frame; the weighted spatial covariance matrices (WSCM) at the t-th frame determined by the modeling of the target signal (TS) and the noise signal (NS), respectively; the forgetting factor; and the input signals (XS) of the microphones. These weighted spatial covariance matrices can be updated online recursively, as shown in [Equation 8-1] below.

[Equation 8-1]

Here, k is a natural number denoting the frequency index, and the remaining symbols denote, in order, the weighted spatial covariance matrices (WSCM) estimated at the t-th and (t-1)-th frames, the forgetting factor, the weight function, and the input signals (XS) of the microphones. Denoting the inverse of the WSCM at the t-th frame by its own symbol, the inverse can be updated directly and recursively, as shown in [Equation 8-2] below.

[Equation 8-2]

Here, the symbols denote, in order, the inverses of the weighted spatial covariance matrices (WSCM) estimated at the t-th and (t-1)-th frames, the forgetting factor, the weight function, and the input signals (XS) of the microphones. The result signals (RS) for the target signal (TS) and the noise signal (NS) may follow Laplacian distributions as in [Equation 5] and [Equation 5-2], respectively. To update online the result signal (RS) corresponding to the target signal (TS) distributed according to the Laplacian function of [Equation 5], the existing [Equation 5-1] and [Equation 5-3] may be replaced by the online update of [Equation 8-3] below.

[Equation 8-3]

Here, the symbols denote, in order, the weight functions for the target signal (TS) and the noise signal (NS), respectively; the time-varying variance of the target signal (TS); the smoothing factor; the predetermined mask; a representative value of the input signals (XS) of the microphones; the result signal (RS) of the frame corresponding to the t-th target signal (TS), obtained with the demixing matrix (W) estimated at the (t-1)-th frame; and the result signal (RS) of the frame corresponding to the t-th noise signal (NS), obtained with the demixing matrix (W) estimated at the (t-1)-th frame.
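
The recursive covariance update and the direct update of its inverse described for [Equation 8-1] and [Equation 8-2] can be illustrated with a rank-one recursion and the matrix inversion lemma (Sherman-Morrison). The sketch assumes the recursion has the form V_t = gamma * V_{t-1} + phi * x x^H, with gamma the forgetting factor and phi the weight; the exact normalization in the specification may differ, and the function names are hypothetical.

```python
def update_wscm(v_prev, x, phi, gamma):
    """Assumed recursion: V_t = gamma * V_{t-1} + phi * x x^H."""
    n = len(x)
    return [[gamma * v_prev[i][j] + phi * x[i] * x[j].conjugate()
             for j in range(n)] for i in range(n)]


def update_inverse(u_prev, x, phi, gamma):
    """Update U_t = V_t^{-1} from U_{t-1} without inverting a matrix.

    Uses the Sherman-Morrison identity and assumes U_{t-1} is Hermitian,
    so that x^H U equals (U x)^H.
    """
    n = len(x)
    ux = [sum(u_prev[i][j] * x[j] for j in range(n)) for i in range(n)]
    quad = sum(x[i].conjugate() * ux[i] for i in range(n)).real  # x^H U x
    denom = gamma + phi * quad
    return [[(u_prev[i][j] - phi * ux[i] * ux[j].conjugate() / denom) / gamma
             for j in range(n)] for i in range(n)]
```

Updating the inverse directly costs on the order of n squared operations per frame and bin instead of the n cubed of a fresh inversion, which matters when the update runs at every frame.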

In one embodiment, to perform the online update, the existing [Equation 1], [Equation 2], and [Equation 3] may be expressed as [Equation 9], [Equation 10], and [Equation 11] below, respectively.

[Equation 9]

Here, k and m are natural numbers denoting the frequency and channel indices, respectively, and the remaining symbols denote, in order, the cost function at the t-th frame; the cost function of independent component analysis at the t-th frame; the parameters that adjust how strongly the distortionless constraint and the null constraint are enforced, respectively; the first and m-th components of the demixing matrix (W) at the t-th frame; and the direction vector (H) estimated at the t-th frame. The first component of the demixing matrix (W) that optimizes [Equation 9] can be updated by [Equation 9-1] below.

[Equation 9-1]

Here, k is a natural number denoting the frequency index, and the remaining symbols denote, in order, the weighted spatial covariance matrix (WSCM) for the target signal at the t-th frame and its inverse; the parameter that adjusts how strongly the distortionless constraint is enforced; the direction vector (H) at the t-th frame; and the first component of the demixing matrix (W) at the t-th frame. The m-th component of the demixing matrix (W) that optimizes [Equation 9] can be updated by [Equation 9-2] below.

[Equation 9-2]

Here, k is a natural number denoting the frequency index, and the remaining symbols denote, in order, the weighted spatial covariance matrix (WSCM) for the noise signal at the t-th frame and its inverse; the parameter that adjusts how strongly the null constraint is enforced; the direction vector (H) at the t-th frame; and the m-th component of the demixing matrix (W) at the t-th frame.

[Equation 10]

Here, k and m are natural numbers denoting the frequency and channel indices, respectively, and the remaining symbols denote, in order, the cost function at the t-th frame; the cost function of independent component analysis at the t-th frame; the Lagrange multipliers that enforce the distortionless constraint and the null constraint, respectively, at the t-th frame; the first and m-th components of the demixing matrix (W) at the t-th frame; and the direction vector (H) estimated at the t-th frame. The first component of the demixing matrix (W) that optimizes [Equation 10] can be updated by [Equation 10-1] below.

[Equation 10-1]

Here, k is a natural number denoting the frequency index, and the remaining symbols denote, in order, the inverse of the weighted spatial covariance matrix (WSCM) for the target signal at the t-th frame; the direction vector (H) at the t-th frame; and the first component of the demixing matrix (W) at the t-th frame. The m-th component of the demixing matrix (W) that optimizes [Equation 10] can be updated by [Equation 10-2] below.

[Equation 10-2]

Here, k is a natural number denoting the frequency index, and the remaining symbols denote, in order, the weighted spatial covariance matrix (WSCM) for the noise signal at the t-th frame and its inverse; the direction vector (H) at the t-th frame; and the m-th component of the demixing matrix (W) at the t-th frame.

[Equation 11]

Here, k and m are natural numbers denoting the frequency and channel indices, respectively, and the remaining symbols denote, in order, the cost function at the t-th frame; the cost function of independent component analysis at the t-th frame; the Lagrange multiplier that enforces the distortionless constraint at the t-th frame; the parameter that adjusts how strongly the null constraint is enforced; the first and m-th components of the demixing matrix (W) at the t-th frame; and the direction vector (H) estimated at the t-th frame. The first component of the demixing matrix (W) that optimizes [Equation 11] can be updated by [Equation 11-1] below.

[Equation 11-1]

Here, k is a natural number denoting the frequency index, and the remaining symbols denote, in order, the inverse of the weighted spatial covariance matrix (WSCM) for the target signal at the t-th frame; the direction vector (H) at the t-th frame; and the first component of the demixing matrix (W) at the t-th frame. The m-th component of the demixing matrix (W) that optimizes [Equation 11] can be updated by [Equation 11-2] below.

[Equation 11-2]

Here, k is a natural number denoting the frequency index, and the remaining symbols denote, in order, the weighted spatial covariance matrix (WSCM) for the noise signal at the t-th frame and its inverse; the parameter that adjusts how strongly the null constraint is enforced; the direction vector (H) at the t-th frame; and the m-th component of the demixing matrix (W) at the t-th frame.

As expressed in [Equation 9], [Equation 10], and [Equation 11], applying the constraint terms that follow the ICA term in each equation to the cost function fixes the first channel (component) of the result matrix, which contains the result signal (RS) produced by the demixing matrix (W) updated online at every frame, to the output for the target signal (TS), and also reduces distortion of the target signal (TS). In addition, [Equation 11] may be a hybrid combination of [Equation 9] and [Equation 10], formed to reduce the speech distortion that can arise from the squared term in [Equation 1], whose distortionless constraint is enforced only to the limited degree set by its parameter. In one embodiment, the first component of the result matrix generated from the input signals (XS) of the microphones and the demixing matrix (W) updated online at every frame may correspond to the target signal (TS). Here, the ICA cost function in [Equation 9], [Equation 10], and [Equation 11] may be identical to that of [Equation 8].
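
To make the two spatial constraints concrete, the sketch below evaluates a squared penalty that is zero when the first demixing component passes the direction vector with unit gain and every other component nulls it. The quadratic form and the names `constraint_penalty`, `alpha_distortionless`, and `alpha_null` are assumptions for illustration; the specification's exact constraint terms are not reproduced here.

```python
def hermitian_dot(w, h):
    """Return w^H h for two complex vectors given as lists."""
    return sum(a.conjugate() * b for a, b in zip(w, h))


def constraint_penalty(demix_rows, h, alpha_distortionless, alpha_null):
    """Penalty for violating the distortionless and null constraints.

    demix_rows[0] should satisfy w_1^H h = 1 (distortionless);
    demix_rows[1:] should satisfy w_m^H h = 0 (null).
    """
    target_term = abs(hermitian_dot(demix_rows[0], h) - 1) ** 2
    null_term = sum(abs(hermitian_dot(w, h)) ** 2 for w in demix_rows[1:])
    return alpha_distortionless * target_term + alpha_null * null_term
```

With finite penalty weights the constraints hold only approximately, which is the motivation the text gives for the Lagrange-multiplier and hybrid variants.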

In one embodiment, the direction vector (H) may be determined from the difference between the input spatial covariance matrix (ISCM) of the microphones' input signals (XS) and the noise spatial covariance matrix (NSCM) of the noise signal (NS), both computed online at every frame. In another embodiment, the NSCM may be determined at every frame according to the ratio between the value of the result signal (RS) corresponding to the target signal (TS) and the value corresponding to the noise signal (NS). For example, the direction vector (H) may be estimated as shown in [Equation 12] below.

[Equation 12]

Here, k and m are natural numbers denoting the frequency and channel indices, respectively, and the remaining symbols denote, in order, the m-th component of the demixing matrix (W) estimated at the t-th frame; the inverse of the demixing matrix (W) at the t-th frame; a ratio representing the contribution of the noise component to the input; the forgetting factor; the input signals (XS) of the microphones; the spatial covariance matrix of the input signals at the t-th frame (ISCM); the spatial covariance matrix of the noise signal (NS) at the t-th frame (NSCM); a scaling factor taking a real value between 0 and 1; and the spatial covariance matrix of the target signal (TS) at the t-th frame (TSCM). For direction vector estimation, the quantity in question may be replaced by the input signals (XS) of the microphones multiplied by the square root of a fixed external mask, as shown in [Equation 12-1] below.

[Equation 12-1]

By extracting the principal eigenvector of this spatial covariance matrix for the target signal (TS), the direction vector can be estimated online at every frame.
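
The mask-based variant described above can be sketched as a recursive covariance built from the input scaled by the square root of the mask, followed by a short eigenvector refinement at each frame. The recursion R_t = gamma * R_{t-1} + z z^H with z = sqrt(mask) * x is an assumed form, and warm-starting power iteration from the previous estimate is a design choice of this sketch, not something stated in the specification.

```python
def update_tscm(r_prev, x, mask, gamma):
    """Assumed recursion on the masked input z = sqrt(mask) * x."""
    s = mask ** 0.5
    z = [s * c for c in x]
    n = len(z)
    return [[gamma * r_prev[i][j] + z[i] * z[j].conjugate()
             for j in range(n)] for i in range(n)]


def refine_direction_vector(r, h_prev, iters=2):
    """Run a few power-iteration steps starting from the previous estimate."""
    h = list(h_prev)
    for _ in range(iters):
        w = [sum(a * b for a, b in zip(row, h)) for row in r]
        norm = sum(abs(c) ** 2 for c in w) ** 0.5
        if norm == 0.0:
            return h  # no usable statistics yet; keep the previous estimate
        h = [c / norm for c in w]
    return h
```

Because the covariance changes slowly between frames, one or two iterations per frame are usually enough to track the dominant direction once the estimate has converged.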

In one embodiment, the predetermined mask may be estimated at every frame based on diffuseness, as shown in [Equation 13] below.

[Equation 13]

Here, the first two symbols denote parameters that adjust the bias and the slope, respectively, and the last denotes the median of the diffuseness values obtained for arbitrary pairs of microphone input signals (XS). In another embodiment, the predetermined mask may be a value output by a pre-trained neural network.
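
A mask controlled by a bias and a slope applied to a median diffuseness value can be illustrated with a logistic function. The logistic form, its orientation (higher diffuseness giving a lower target mask), and the name `diffuseness_mask` are assumptions of this sketch; the specification's [Equation 13] is not reproduced here.

```python
import math
import statistics


def diffuseness_mask(pair_diffuseness, bias, slope):
    """Map per-pair diffuseness values (assumed in [0, 1]) to a mask in (0, 1).

    The median over microphone pairs is taken first; the mask equals 0.5
    when the median equals `bias`, and `slope` sets how sharp the transition is.
    """
    med = statistics.median(pair_diffuseness)
    return 1.0 / (1.0 + math.exp(slope * (med - bias)))
```

Taking the median rather than the mean makes the mask less sensitive to a single microphone pair with an outlying diffuseness estimate.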

In one embodiment, the single target point (TP) and target signal (TS) handled by the demixing providing unit (200) and the result providing unit (300) as in [Equation 4] may be extended to a plurality of target points (TP) and target signals (TS), as in [Equation 14] below. For example, when there are N target signals in total:

[Equation 14]

Here, the symbols denote, in order, the N result signals (RS) corresponding to the N target signals (TS), the result signal (RS) corresponding to the noise signal (NS), and the input signals (XS) of the microphones. The cost functions expressed by [Equation 1], [Equation 2], and [Equation 3] can likewise be extended to the case of a plurality of target points (TP) and target signals (TS). For example, when there are N target signals in total, the cost function corresponding to [Equation 3] can be extended to [Equation 15] below.

[Equation 15]

Here, k is a natural number denoting the frequency index, and the remaining symbols denote, in order, the cost function; the cost function of independent component analysis; the Lagrange multiplier that enforces the distortionless constraint for the n-th target signal (TS); the m-th component of the demixing matrix (W); the direction vector (H) corresponding to the n-th target signal (TS); the parameter that adjusts how strongly the null constraint is enforced; the demixing matrix (W); and the weighted spatial covariance matrices (WSCM) determined by the modeling of the m-th target signal and the noise signal, respectively.

In one embodiment, the plurality of direction vectors (H) corresponding to the plurality of target signals (TS) may be determined from the difference between the input spatial covariance matrix (ISCM) of the microphones' input signals (XS) and the noise spatial covariance matrix (NSCM) of the noise signal (NS). In another embodiment, the NSCM may be determined according to the ratio between the value of the result signal (RS) corresponding to the target signal (TS) and the value corresponding to the noise signal (NS). For example, when there are N target signals in total, the estimation method of [Equation 6] can be extended to estimate the plurality of direction vectors (H), as shown in [Equation 16] below.

[Equation 16]

Here, the symbols denote, in order, the spatial covariance matrix of the microphones' input signals (ISCM); the spatial covariance matrix of the m-th target signal (TSCM); the spatial covariance matrix of the noise signal (NSCM); the spatial covariance matrix of all signals other than the m-th target signal (TS); and a ratio representing the contribution to the input of all components other than the m-th target signal (TS). The N direction vectors can be estimated by extracting the principal eigenvector of each of these spatial covariance matrices for the target signals (TS).

In one embodiment, the demixing matrix (W) used by the demixing providing unit (200) and the result providing unit (300) for one or more target points (TP) and target signals (TS) may be extended from a product with the microphones' input signals (XS) in a single frame to a convolution with the microphones' input signals (XS) over multiple frames, as shown in [Equation 17] below.

[Equation 17]

Here, the symbols denote, in order, the plurality of result signals (RS) corresponding to the plurality of target signals (TS), the result signal (RS) corresponding to the noise signal (NS), the convolutional demixing matrix (W), and the input signals (XS) of the microphones; D and L are natural numbers denoting the frame delay and the length of the convolution filter, respectively. With this extension to a convolution filter, the L frames that lie D frames in the past are used together, which allows the target signal to be separated effectively. The cost function expressed by [Equation 15] can also be extended to the case of the convolutional demixing matrix (W). For example, the cost function corresponding to [Equation 15] can be extended to [Equation 18] below.

[Equation 18]

Here, k is a natural number denoting the frequency index, and the remaining symbols denote, in order, the cost function; the cost function of independent component analysis; the Lagrange multiplier that enforces the distortionless constraint for the n-th target signal (TS); the m-th component of the demixing matrix (W); the direction vector (H) corresponding to the n-th target signal (TS); the parameter that adjusts how strongly the null constraint is enforced; the convolutional demixing matrix (W); and the weighted spatial covariance matrices (WSCM) determined by the modeling of the m-th target signal and the noise signal, respectively.
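
The convolutional extension described for [Equation 17] replaces the single-frame product with a sum over delayed frames. The sketch below computes one output channel for one frequency bin, assuming the output is the current-frame filter applied to the current input plus L taps applied to the frames starting D frames in the past, with frames before the start of the signal treated as zero. This indexing and the names `convolutional_output`, `w_current`, and `taps` are assumptions; the specification's exact arrangement may differ.

```python
def hermitian_dot(w, x):
    """Return w^H x for two complex vectors given as lists."""
    return sum(a.conjugate() * b for a, b in zip(w, x))


def convolutional_output(frames, w_current, taps, delay):
    """Compute y_t = w^H x_t + sum over l of taps[l]^H x_{t - delay - l}.

    frames    : list of microphone vectors, one per frame
    w_current : filter applied to the current frame
    taps      : list of L filters applied to delayed frames
    delay     : frame delay D (number of frames skipped before the first tap)
    """
    outputs = []
    for t, x in enumerate(frames):
        y = hermitian_dot(w_current, x)
        for l, g in enumerate(taps):
            past = t - delay - l
            if past >= 0:  # frames before the start contribute nothing
                y += hermitian_dot(g, frames[past])
        outputs.append(y)
    return outputs
```

A delay of at least one frame keeps the taps from acting on the current frame, so they can only remove components that are predictable from the past, such as late reverberation.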

In one embodiment, the beamforming and direction vector estimation system (10) according to an embodiment of the present invention may include an input providing unit (100), a demixing providing unit (200), a result providing unit (300), and a voice providing unit (400). The input providing unit (100) may provide the input signals (XS) of the microphones based on the spatial transfer function (A) corresponding to the target signal (TS) and the noise signal (NS) at the target point (TP). The demixing providing unit (200) may provide the demixing matrix (W) determined according to the input signals (XS) of the microphones and independent component analysis (ICA). The result providing unit (300) may extract the result signal (RS) from the input signals (XS) of the microphones based on the demixing matrix (W). The voice providing unit (400) may provide the result signal (RS) as speech (SO).

FIG. 7 is a diagram illustrating an operating method of the beamforming and direction vector estimation system according to embodiments of the present invention, and FIG. 8 is a diagram for explaining an embodiment of the operating method of the beamforming and direction vector estimation system of FIG. 7.

Referring to FIGS. 1 to 8, in the operating method of the beamforming and direction vector estimation system (10) according to an embodiment of the present invention, the input providing unit (100) may provide the input signals (XS) of the microphones based on the spatial transfer function (A) corresponding to each of the target signal (TS) and the noise signal (NS) at the target point (TP) (S100). The demixing providing unit (200) may provide the demixing matrix (W) determined according to the input signals (XS) of the microphones and independent component analysis (ICA) (S200). The result providing unit (300) may extract the result signal (RS) from the input signals (XS) of the microphones based on the demixing matrix (W) (S300). The voice providing unit (400) may provide the result signal (RS) as speech (SO) (S400). In the beamforming and direction vector estimation system (10) according to the present invention, the demixing matrix (W) is calculated using a cost function that includes a plurality of spatial constraints, so that the target signal (TS) originating from the target point (TP) can be extracted without distortion.

10: Beamforming and direction vector estimation system 100: Input providing unit
200: Demixing providing unit 300: Result providing unit

Claims

A beamforming and direction vector estimation system comprising:
an input providing unit that provides input signals of microphones based on a spatial transfer function corresponding to a target signal and a noise signal at a target point;
a demixing providing unit that provides a demixing matrix determined according to the input signals of the microphones and independent component analysis (ICA); and
a result providing unit that extracts a result signal from the input signals of the microphones based on the demixing matrix.

According to paragraph 1,
The spatial transfer function includes a direction vector corresponding to a transfer function from the target point to the input provider and a noise transfer function corresponding to a transfer function until the noise signal is transmitted to the input provider. Beamforming and direction vector estimation system.

According to paragraph 2,
The product of the first component included in the demixing matrix and the direction vector included in the spatial transfer function is 1, and the product of the direction vector and the remaining components excluding the first component included in the demixing matrix is 0. Beamforming and direction vector estimation system.

According to paragraph 3,
Beamforming and direction vector estimation system, wherein the demixing matrix is determined based on the independent component analysis and a cost function (CF) according to space constraints.

The beamforming and direction vector estimation system of claim 4,
wherein the first component of a result matrix generated based on the input signals of the microphones and the demixing matrix corresponds to the target signal.

The beamforming and direction vector estimation system of claim 5,
wherein the cost function is expressed as [Equation 1],
[Equation 1]

where the quantities in [Equation 1] are: the cost function; k and m, natural numbers representing the frequency index and the channel index, respectively; the cost function of independent component analysis; two parameters that control the strength of the distortion-prevention constraint and of the null constraint, respectively; the first and m-th components of the demixing matrix; and the direction vector.

The beamforming and direction vector estimation system of claim 5,
wherein the cost function is expressed as [Equation 2],
[Equation 2]

where the quantities in [Equation 2] are: the cost function; k and m, natural numbers representing the frequency index and the channel index, respectively; the cost function of independent component analysis; the Lagrangian multipliers that enforce the distortion-prevention condition and the null condition, respectively; the first and m-th components of the demixing matrix; and the direction vector.

The beamforming and direction vector estimation system of claim 5,
wherein the cost function is expressed as [Equation 3],
[Equation 3]

where the quantities in [Equation 3] are: the cost function; k and m, natural numbers representing the frequency index and the channel index, respectively; the cost function of independent component analysis; the Lagrangian multiplier that enforces the distortion-prevention condition; the first and m-th components of the demixing matrix; the direction vector (H); and a parameter that controls the strength of the null constraint.
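For orientation only, the three cost-function variants described in words above (penalty terms for both constraints, Lagrangian multipliers for both, and a Lagrangian multiplier for distortion prevention combined with a penalty for the null condition) might take forms such as the following. Every symbol here is an assumption chosen for illustration: $J_k$ for the cost at frequency $k$, $J_k^{\mathrm{ICA}}$ for the ICA cost, $\mathbf{w}_{mk}$ for the m-th component of the demixing matrix, $\mathbf{h}_k$ for the direction vector, $\lambda_1, \lambda_2$ for penalty weights, and $\alpha_{mk}$ for multipliers. The exact expressions, scalings, and index ranges of [Equation 1] to [Equation 3] are not recoverable from this text.

```latex
% Sketch under assumed notation; not the equations of the publication.
\begin{align}
  % penalty terms for both constraints (cf. the description of Equation 1)
  J_k &= J_k^{\mathrm{ICA}}
        + \lambda_1 \bigl| \mathbf{w}_{1k}^{H}\mathbf{h}_k - 1 \bigr|^{2}
        + \lambda_2 \sum_{m=2}^{M} \bigl| \mathbf{w}_{mk}^{H}\mathbf{h}_k \bigr|^{2} \\
  % Lagrangian multipliers for both constraints (cf. Equation 2)
  J_k &= J_k^{\mathrm{ICA}}
        + \operatorname{Re}\bigl\{ \alpha_{1k} \bigl( \mathbf{w}_{1k}^{H}\mathbf{h}_k - 1 \bigr) \bigr\}
        + \sum_{m=2}^{M} \operatorname{Re}\bigl\{ \alpha_{mk}\, \mathbf{w}_{mk}^{H}\mathbf{h}_k \bigr\} \\
  % multiplier for distortion prevention, penalty for nulls (cf. Equation 3)
  J_k &= J_k^{\mathrm{ICA}}
        + \operatorname{Re}\bigl\{ \alpha_{1k} \bigl( \mathbf{w}_{1k}^{H}\mathbf{h}_k - 1 \bigr) \bigr\}
        + \lambda_2 \sum_{m=2}^{M} \bigl| \mathbf{w}_{mk}^{H}\mathbf{h}_k \bigr|^{2}
\end{align}
```

In a form like the first, larger penalty weights push the solution closer to the exact conditions $\mathbf{w}_{1k}^{H}\mathbf{h}_k = 1$ and $\mathbf{w}_{mk}^{H}\mathbf{h}_k = 0$, whereas a multiplier-based term is intended to enforce its condition exactly.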

The beamforming and direction vector estimation system of any one of claims 6, 7 and 8,
wherein the cost function of the independent component analysis is expressed as [Equation 1-2],
[Equation 1-2]

where k and m are natural numbers representing the frequency index and the channel index, respectively, and the remaining quantities in [Equation 1-2] are: the first and m-th components of the demixing matrix (W); the demixing matrix (W); and Weighted Spatial Covariance Matrices (WSCMs) determined according to modeling of the target signal and the noise signal, respectively.
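A cost built from the demixing components, the demixing matrix itself, and one weighted spatial covariance matrix per output is commonly written, in auxiliary-function ICA, as a sum of quadratic forms minus a log-determinant term. The Python sketch below evaluates that commonly used form for a single frequency bin. It is an assumption that [Equation 1-2] has this shape; the function name `ica_cost`, the factor 2, and the row convention are illustrative choices.

```python
import numpy as np

def ica_cost(W, V):
    """Evaluate sum_m w_m^H V_m w_m - 2 log|det W| for one frequency bin.

    W: (M, M) complex demixing matrix; row m is assumed to hold w_m^H.
    V: (M, M, M) complex array; V[m] is the Hermitian weighted spatial
       covariance matrix associated with output m.
    """
    M = W.shape[0]
    # Row m is w_m^H, so w_m^H V_m w_m = W[m] @ V[m] @ conj(W[m]).
    quad = sum(np.real(W[m] @ V[m] @ W[m].conj()) for m in range(M))
    # slogdet returns (sign, log|det W|); only the magnitude is needed.
    _, logabsdet = np.linalg.slogdet(W)
    return quad - 2.0 * logabsdet

# With identity demixing and identity covariances the cost equals M.
M = 3
cost_identity = ica_cost(np.eye(M, dtype=complex),
                         np.tile(np.eye(M, dtype=complex), (M, 1, 1)))
```

The log-determinant term keeps the demixing matrix away from singular solutions, while each quadratic term measures the weighted output power of one channel.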

The beamforming and direction vector estimation system of claim 4,
wherein the first to N-th components of the result matrix generated based on the input signals of the microphones and the demixing matrix are extended to correspond to N target signals.

The beamforming and direction vector estimation system of claim 10,
wherein the cost function is expressed as [Equation 15],
[Equation 15]

where the quantities in [Equation 15] are: the cost function; k, a natural number representing the frequency index; the cost function of independent component analysis; a Lagrangian multiplier that enforces the distortion-prevention condition for the n-th target signal (TS); the m-th component of the demixing matrix (W); the direction vector (H) corresponding to the n-th target signal (TS); a parameter that controls the strength of the null constraint; the demixing matrix (W); and Weighted Spatial Covariance Matrices (WSCMs) determined according to modeling of the target signal and the noise signal, respectively.

The beamforming and direction vector estimation system of claim 9 or claim 11,
wherein the result signal for the target signal follows a Laplacian distribution with a time-varying variance.

The beamforming and direction vector estimation system of claim 12,
wherein the Weighted Spatial Covariance Matrix (WSCM) and the weight function for the target signal (TS), derived from the Laplacian distribution with a time-varying variance, are expressed as [Equation 5-1],
[Equation 5-1]

where k is a natural number representing the frequency index, and the remaining quantities in [Equation 5-1] are: the total number of frames; the Weighted Spatial Covariance Matrix (WSCM) for the target signal (TS); the weight function; a predetermined mask; the input signals (XS) of the microphones; and a representative value of the input signals (XS) of the microphones.
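A weighted spatial covariance matrix is, in general, a frame average of outer products of the microphone vector, with each frame scaled by a weight. The following Python sketch shows that generic computation for one frequency bin. How the weight is obtained from the mask and the representative value of the input is not reproduced in this text, so the weights are simply passed in; the function name `weighted_scm` and the array layout are assumptions.

```python
import numpy as np

def weighted_scm(X, weights):
    """Frame-averaged weighted spatial covariance for one frequency bin.

    X:       (M, T) complex array, M microphones and T frames.
    weights: (T,) real non-negative array, one weight per frame
             (assumed to be supplied by a separate weight function).
    Returns the (M, M) Hermitian matrix (1/T) * sum_t weights[t] x_t x_t^H.
    """
    T = X.shape[1]
    return (X * weights) @ X.conj().T / T

# Small worked input: two microphones, two frames.
X_demo = np.array([[1.0, 1.0j],
                   [2.0, 0.0]], dtype=complex)
V_demo = weighted_scm(X_demo, np.array([1.0, 0.5]))
```

Frames with a larger weight contribute more to the covariance estimate, which is how a mask can emphasise target-dominated frames.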

The beamforming and direction vector estimation system of any one of claims 6, 7, 8 and 11,
wherein the direction vector is determined according to the difference between the Input Spatial Covariance Matrix (ISCM) for the input signals of the microphones and the Noise Spatial Covariance Matrix (NSCM) for the noise signal.
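One common way to turn a covariance difference into a direction vector is to take the principal eigenvector of the difference and normalise it to a reference microphone. The Python sketch below illustrates that approach; the claim only states that the direction vector depends on the difference, so the eigenvector step, the reference-microphone normalisation, and the name `estimate_direction_vector` are assumptions of this example.

```python
import numpy as np

def estimate_direction_vector(R_x, R_n, ref=0):
    """Estimate a direction vector from the ISCM minus the NSCM.

    R_x: (M, M) input spatial covariance matrix (ISCM).
    R_n: (M, M) noise spatial covariance matrix (NSCM).
    ref: index of the reference microphone used for normalisation.
    """
    diff = R_x - R_n
    # Symmetrise to guard against small numerical asymmetries.
    diff = 0.5 * (diff + diff.conj().T)
    # eigh returns eigenvalues in ascending order; take the largest.
    _, eigvecs = np.linalg.eigh(diff)
    h = eigvecs[:, -1]
    # Remove the arbitrary scale and phase of the eigenvector.
    return h / h[ref]

# Synthetic check: rank-1 target covariance plus white noise.
h_true = np.array([1.0, 0.5j, -0.3], dtype=complex)
R_n_demo = 0.1 * np.eye(3, dtype=complex)
R_x_demo = 2.0 * np.outer(h_true, h_true.conj()) + R_n_demo
h_est = estimate_direction_vector(R_x_demo, R_n_demo)
```

In this synthetic case the difference is exactly rank one, so the estimate coincides with `h_true`; with estimated covariances the result would be approximate.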

The beamforming and direction vector estimation system of claim 14,
wherein the spatial covariance matrix for the noise signal is determined according to the ratio between a value corresponding to the target signal and a value corresponding to the noise signal among the result signals.

The beamforming and direction vector estimation system of claim 4 or claim 5,
wherein the system operates on each frame corresponding to a fixed time interval, so that the demixing matrix is updated online.
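Frame-by-frame operation is often realised by updating statistics recursively as each frame arrives. The Python sketch below shows one such recursive update of a weighted spatial covariance matrix with a forgetting factor. This is a generic illustration: the recursion, the forgetting factor value, and the name `update_wscm` are assumptions and are not stated in the claims.

```python
import numpy as np

def update_wscm(V_prev, x, weight, forgetting=0.96):
    """Recursive per-frame update of a weighted spatial covariance matrix.

    V_prev:     (M, M) estimate from the previous frame.
    x:          (M,) complex microphone vector for the current frame.
    weight:     non-negative scalar weight for the current frame.
    forgetting: value in (0, 1); closer to 1 means slower adaptation
                (hypothetical default chosen for this example).
    """
    return forgetting * V_prev + (1.0 - forgetting) * weight * np.outer(x, x.conj())

# One update step starting from a zero matrix.
V_next = update_wscm(np.zeros((2, 2), dtype=complex),
                     np.array([1.0, 1.0j]), weight=2.0, forgetting=0.5)
```

After each such update, the demixing matrix for that frequency bin would be recomputed from the refreshed statistics, which is what allows the system to track changes over time.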

The beamforming and direction vector estimation system of claim 13,
wherein the predetermined mask can be estimated every frame based on diffuseness.

A beamforming and direction vector estimation system comprising:
an input providing unit that provides input signals of microphones based on spatial transfer functions corresponding to each of a target signal at a target point and a noise signal;
a demixing providing unit that provides a demixing matrix determined according to the input signals of the microphones, independent component analysis (ICA), and spatial constraints;
a result providing unit that extracts a result signal from the input signals of the microphones based on the demixing matrix; and
a voice providing unit that provides the result signal as a voice.

A method of operating a beamforming and direction vector estimation system, the method comprising:
providing, by an input providing unit, input signals of microphones based on spatial transfer functions corresponding to each of a target signal at a target point and a noise signal;
providing, by a demixing providing unit, a demixing matrix determined according to the input signals of the microphones and independent component analysis (ICA) subject to spatial constraints; and
extracting, by a result providing unit, a result signal from the input signals of the microphones based on the demixing matrix.

A method of operating a beamforming and direction vector estimation system, the method comprising:
providing, by an input providing unit, input signals of microphones based on spatial transfer functions corresponding to each of a target signal at a target point and a noise signal;
providing, by a demixing providing unit, a demixing matrix determined according to the input signals of the microphones and independent component analysis (ICA) subject to spatial constraints;
extracting, by a result providing unit, a result signal from the input signals of the microphones based on the demixing matrix; and
providing, by a voice providing unit, the result signal as a voice.