KR102003520B1

KR102003520B1 - Signal processing apparatus and method thereof

Info

Publication number: KR102003520B1
Application number: KR1020120105358A
Authority: KR
Inventors: 이승열; 신호선; 최민석; 강홍구; 김상윤; 언기완
Original assignee: 삼성전자주식회사; 연세대학교 산학협력단
Priority date: 2012-09-21
Filing date: 2012-09-21
Publication date: 2019-07-24
Also published as: KR20140038800A

Abstract

A signal processing apparatus and a method thereof are disclosed. The signal processing apparatus according to the present invention includes: an input unit that receives a first speech signal through a bone conduction microphone; a speech signal processing unit that uses frequency components of the first speech signal to generate a second speech signal having frequencies within a predetermined threshold frequency band, and combines the first and second speech signals to generate an extended speech signal; a storage unit in which a global filter is stored; a filter adjusting unit that adjusts the global filter in consideration of the frequency components of the first speech signal; and a noise removing unit that removes noise from the extended speech signal to which the adjusted global filter value has been applied. Accordingly, high sound quality can be provided to the user with a bone conduction microphone even in a high-noise environment.

Description

[0001] The present invention relates to a signal processing apparatus and a method thereof, and more particularly, to a signal processing apparatus and method for improving the sound quality of a speech signal input through a bone conduction microphone.

In general, in a high-noise environment, the noise mixed into a user's uttered speech makes accurate detection of speech segments impossible and makes it difficult to extract the inherent characteristics of the input speech.

To address this problem, a bone conduction microphone is used. A bone conduction microphone outputs a speech signal by measuring the vibration of the bones and skull caused by the user's vocalization. Unlike an ordinary air conduction microphone, in which noise mixes into the uttered speech while it travels through the air, a bone conduction microphone picks up the signal from vibration, so little noise enters and it is useful in high-noise environments. However, because a bone conduction microphone outputs a speech signal by measuring the vibration of the bones and skull, the high-frequency components of the input speech signal are heavily attenuated compared with an ordinary air microphone. Consequently, when the speech signal output by a bone conduction microphone is played through a speaker, the user's speech sounds unclear and muffled.

To solve this problem, conventional signal processing apparatuses have improved the sound quality of the speech signal input from the bone conduction microphone by using either a two-channel scheme with one bone conduction microphone and one ordinary air microphone, or a three-channel scheme with one bone conduction microphone and two ordinary air microphones.

Specifically, a conventional signal processing apparatus first calculates, for an individual user, the per-segment magnitude ratio between the speech signals of an utterance input through the bone conduction microphone and through the air microphone, and then obtains a filter value from the average of those magnitude ratios. Thereafter, when a speech signal of the user's utterance is input through the bone conduction microphone, the apparatus multiplies the magnitude of the input speech signal by the previously obtained filter value to generate an output speech signal.

However, because the filter value is generated for the conditions of a specific user, when a speech signal of another user's utterance is input through the bone conduction microphone, the signal processing apparatus must repeat the filter value generation process described above for that user's conditions.

The present invention has been devised in view of the above need, and an object of the present invention is to provide the user with high sound quality using a bone conduction microphone even in a high-noise environment.

To achieve the above object, a signal processing apparatus according to an embodiment of the present invention includes: an input unit that receives a first speech signal through a bone conduction microphone; a speech signal processing unit that uses frequency components of the first speech signal to generate a second speech signal having frequencies within a predetermined threshold frequency band, and combines the first and second speech signals to generate an extended speech signal; a storage unit in which a global filter is stored; a filter adjusting unit that adjusts the global filter in consideration of the frequency components of the first speech signal; and a noise removing unit that removes noise from the extended speech signal to which the adjusted global filter value has been applied.

The speech signal processing unit may generate the extended speech signal by adding the first speech signal and the second speech signal and then performing normalization.

In addition, the speech signal processing unit may generate the second speech signal and the extended speech signal using the following equations:

[Equation for the second speech signal], [Equation for the extended speech signal]

where the terms of the equations denote, respectively, the signal value obtained by passing the first speech signal through a low-pass filter, the second speech signal value, and the extended speech signal value; k is a frequency value, l is a time-axis frame value, and i is a frequency value between the low-band frequency value (low_b) and the high-band frequency value (high_b) of the input first speech signal.

The global filter may be an equalization filter generated by dividing the bone conduction microphone measurements and the ordinary air microphone measurements of test speech uttered by a plurality of test subjects into a plurality of segments, calculating for each segment the average of the results of comparing the two measurements, and collecting the calculated averages.

In addition, the adjusted global filter may be expressed by the following equation:

[Equation]

where F_BC(k) is the global filter value, k is a frequency value, and α_agg is a variable.

The filter adjusting unit may adjust the global filter by setting the variable to 1 when the magnitude of the first speech signal input to the input unit exceeds a predetermined threshold, and to a value between 0 and 1 when the magnitude is below the threshold.

Meanwhile, according to an embodiment of the present invention, a signal processing method for improving the sound quality of a speech signal input through a bone conduction microphone in a signal processing apparatus includes: receiving a first speech signal through the bone conduction microphone; using frequency components of the first speech signal to generate a second speech signal having frequencies within a predetermined threshold frequency band, and combining the first and second speech signals to generate an extended speech signal; adjusting a pre-stored global filter in consideration of the frequency components of the first speech signal; and removing noise from the extended speech signal to which the adjusted global filter value has been applied.

The generating of the extended speech signal may include adding the first speech signal and the second speech signal and then performing normalization to generate the extended speech signal.

In addition, the generating of the extended speech signal may include generating the second speech signal and the extended speech signal using the following equations:

[Equation for the second speech signal], [Equation for the extended speech signal]

where the terms of the equations include the signal value obtained by passing the first speech signal through a low-pass filter and the second speech signal value.

In addition, the adjusted global filter may be expressed by the following equation:

[Equation]

where F_BC(k) is the global filter value, k is a frequency value, and α_agg is a variable.

The adjusting of the pre-stored global filter may include setting the variable to 1 when the magnitude of the input first speech signal exceeds a predetermined threshold, and to a value between 0 and 1 when the magnitude is below the threshold, to adjust the global filter.

As described above, according to various embodiments of the present invention, the signal processing apparatus maximizes the advantages of the bone conduction microphone and, further, compensates for the high-frequency components of the speech signal output through the bone conduction microphone, improving the sound quality to the level of a speech signal output from an ordinary air microphone.

FIG. 1 is a block diagram of a signal processing apparatus according to an embodiment of the present invention.
FIG. 2 is a flowchart of a method for improving the sound quality of a speech signal input through a bone conduction microphone in a signal processing apparatus according to an embodiment of the present invention.

Hereinafter, the present invention is described in more detail with reference to the accompanying drawings.

FIG. 1 is a block diagram of a signal processing apparatus according to an embodiment of the present invention.

In general, a bone conduction microphone converts a user's uttered speech into an audio signal by measuring the vibration of the bones and skull caused by the utterance. Such a microphone has the advantage of blocking much of the noise that can arise while the user's speech travels through the air. However, unlike an ordinary microphone such as an air microphone, it has difficulty capturing the high-frequency components of the input speech signal, so the user's speech in the output audio signal sounds somewhat muffled. Accordingly, the signal processing apparatus according to the present invention, through the components of FIG. 1 described below, maximizes the advantages of the bone conduction microphone and, further, compensates for the high-frequency components of the speech signal output through the bone conduction microphone, improving the sound quality to the level of a speech signal output from an air microphone.

As shown in FIG. 1, the signal processing apparatus includes an input unit 110, a speech signal processing unit 120, a storage unit 130, a filter adjusting unit 140, and a noise removing unit 150.

The input unit 110 receives a first speech signal through a bone conduction microphone. The speech signal processing unit 120 uses frequency components of the first speech signal to generate a second speech signal having frequencies within a predetermined threshold frequency band, and combines the first and second speech signals to generate an extended speech signal. The storage unit 130 stores a global filter, and the filter adjusting unit 140 adjusts the global filter in consideration of the frequency components of the first speech signal. The noise removing unit 150 removes noise from the extended speech signal to which the global filter value adjusted by the filter adjusting unit 140 has been applied.

Specifically, when the first speech signal generated by the bone conduction microphone is input through the input unit 110, the speech signal processing unit 120 uses the frequency components of the first speech signal to generate a second speech signal having frequencies within a predetermined threshold frequency band. Here, the second speech signal is a speech signal in which the high band of the first speech signal is extended, and the speech signal processing unit 120 may generate it using the frequency components of the low band of the first speech signal.

Depending on the embodiment, the speech signal processing unit 120 may generate the second speech signal, which is an extended speech signal for the high band of the first speech signal, through artificial bandwidth extension (ABE) using transposition of the low band of the first speech signal. Here, bandwidth extension is a technique that restores the original speech components by extending the speech signal from the low band (the low-frequency region) into the high band (the high-frequency region).

However, the present invention is not limited thereto, and the speech signal processing unit 120 may use another technique for extending the high-band speech signal of the first speech signal, such as spectral folding. Meanwhile, the algorithm for generating the second speech signal, which is an extended speech signal for the high band of the first speech signal, through bandwidth extension as described above may be defined as in Equation 1 below.

[Equation 1]

Here, Equation 1 yields the second speech signal value; Thr is the frequency threshold, k is the frequency value of a frequency bin, l is the time-axis frame value, and i is a frequency value between the low-band frequency value (low_b) and the high-band frequency value (high_b) of the input first speech signal. The low-band frequency value (low_b) and the high-band frequency value (high_b) may delimit a predetermined interval used to generate the second speech signal.

Accordingly, when the frequency value (k) of a frequency bin is less than the predetermined frequency threshold (Thr), the speech signal processing unit 120 does not generate the second speech signal for the high band of the first speech signal.

Meanwhile, when the frequency value (k) of a frequency bin is greater than or equal to the predetermined frequency threshold (Thr) and less than the end frequency threshold (End), the speech signal processing unit 120 may use the speech values in the predetermined interval from low_b to high_b, taken from the low-band frequency components of the first speech signal, to generate the second speech signal for the interval from Thr up to, but not including, End.

For example, a first speech signal with a sampling rate of 8 kHz may be input through the bone conduction microphone. In this case, the maximum frequency is 4 kHz, and if the predetermined frequency threshold (Thr) is 2 kHz, the speech signal processing unit 120 may generate, as the second speech signal, a speech signal covering 2 kHz to 4 kHz based on the speech signal covering 0 to 2 kHz. As another example, when a first speech signal with a maximum frequency of 4 kHz is input and the predetermined frequency threshold (Thr) is 3 kHz, the speech signal processing unit 120 may generate, as the second speech signal, a speech signal covering 3 kHz to 4 kHz based on the speech signal covering 0 to 1 kHz.
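The two numerical examples above can be sketched for a single spectral frame as follows. This is only an illustration under stated assumptions: Equation 1 is not reproduced in the text, so the mapping from a target bin k to a source bin i = low_b + (k - Thr) is an assumption, and the function name `transpose_low_band` and the 500 Hz bin width are hypothetical.

```python
def transpose_low_band(frame, thr, end, low_b=0):
    """Build the second (high-band) spectrum for one time frame.

    frame : spectral values of the first speech signal, one per frequency bin
    thr   : first bin to synthesize (the threshold Thr)
    end   : one past the last bin to synthesize (the end threshold End)
    low_b : first source bin in the low band
    ASSUMPTION: bin k is filled from source bin i = low_b + (k - thr).
    """
    width = end - thr
    if not (0 <= thr <= end <= len(frame)) or low_b < 0 or low_b + width > len(frame):
        raise ValueError("band limits fall outside the frame")
    second = [0.0] * len(frame)  # bins below Thr stay empty
    for k in range(thr, end):
        second[k] = frame[low_b + (k - thr)]
    return second


# 8 bins of 500 Hz each cover 0 to 4 kHz (8 kHz sampling rate).
frame = [8.0, 6.0, 4.0, 2.0, 0.0, 0.0, 0.0, 0.0]
print(transpose_low_band(frame, thr=4, end=8))  # Thr = 2 kHz: copies 0-2 kHz into 2-4 kHz
print(transpose_low_band(frame, thr=6, end=8))  # Thr = 3 kHz: copies 0-1 kHz into 3-4 kHz
```

With Thr at bin 4 the whole 0 to 2 kHz band is reused, and with Thr at bin 6 only the 0 to 1 kHz band is reused, matching the two cases described in the text.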

When the second speech signal has been generated, the speech signal processing unit 120 adds the first speech signal input through the input unit 110 and the generated second speech signal to generate an extended speech signal. The algorithm by which the speech signal processing unit 120 adds the first and second speech signals to generate the extended speech signal may be defined as in Equation 2 below.

[Equation 2]

Here, one term of Equation 2 is the low-band speech signal of the first speech signal, which may be the speech signal that has passed through a low-pass filter, and the other term is the second speech signal calculated through Equation 1. In this way, the speech signal processing unit 120 may generate the extended speech signal by adding the portion of the first speech signal input through the bone conduction microphone that has passed through the low-pass filter and the second speech signal calculated through Equation 1.

Meanwhile, when the frequency value (k) in Equation 1 is less than the predetermined frequency threshold (Thr), the extended speech signal may be the input first speech signal itself.
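The addition step can be sketched per frame as shown here. The low-pass filter is modeled as an ideal brick-wall filter that zeroes every bin at or above Thr; that filter model and the name `extend_frame` are assumptions made for illustration, not details given in the text.

```python
def extend_frame(first, second, thr):
    """Add the low-pass-filtered first signal and the second signal, bin by bin.

    first  : spectral values of the first (bone conduction) speech signal
    second : spectral values of the second (synthesized high-band) speech signal
    thr    : threshold bin Thr
    ASSUMPTION: the low-pass filter is an ideal cut at bin thr.
    """
    if len(first) != len(second):
        raise ValueError("both spectra must have the same number of bins")
    low = [x if k < thr else 0.0 for k, x in enumerate(first)]
    return [a + b for a, b in zip(low, second)]


first = [8.0, 6.0, 4.0, 2.0, 0.5, 0.5, 0.5, 0.5]
second = [0.0, 0.0, 0.0, 0.0, 8.0, 6.0, 4.0, 2.0]
print(extend_frame(first, second, thr=4))
```

Below Thr the second signal is empty, so the extended signal there equals the first signal, which agrees with the statement above.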

When the extended speech signal has been generated, the speech signal processing unit 120 normalizes it. The purpose of the normalization is to make the speech signal that has passed through the low-pass filter and the second speech signal join smoothly, and the speech signal processing unit 120 may perform it through the normalization algorithm defined in Equation 3 below.

[Equation 3]

As defined in Equation 3, when the frequency value (k) is less than the predetermined frequency threshold (Thr), the speech signal processing unit 120 applies the same filter to every time-axis frame of the extended speech signal. That is, when the frequency value (k) is less than the predetermined frequency threshold (Thr), the speech signal processing unit 120 multiplies the value at each frequency bin of each time-axis frame of the extended speech signal by 1.

Meanwhile, when the frequency value (k) is greater than or equal to the predetermined frequency threshold (Thr), the signal processing apparatus divides a first result value by a second result value to produce the variable value to be applied to each frame. Here, the first result value may be the sum of the first speech signal from the predetermined frequency threshold (Thr) to the end frequency threshold (End), and the second result value may be the sum of the second speech signal from the frequency threshold (Thr) to the end frequency threshold (End). Once the per-frame variable value has been produced by dividing the first result value by the second result value, the speech signal processing unit 120 may normalize the extended speech signal by multiplying the value at each frequency bin of each frame by that frame's variable value.
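A per-frame sketch of this normalization follows. Equation 3 is not reproduced in the text, so the use of absolute values in the sums, the fallback gain of 1 when the second signal is silent, and the name `normalization_gains` are all assumptions.

```python
def normalization_gains(first, second, thr, end, eps=1e-12):
    """Return one gain per frequency bin for a single time frame.

    Bins below thr (or at and above end) get a gain of 1.
    Bins in [thr, end) share the ratio
        sum(|first[thr:end]|) / sum(|second[thr:end]|).
    ASSUMPTION: the sums are taken over magnitudes.
    """
    num = sum(abs(x) for x in first[thr:end])   # first result value
    den = sum(abs(x) for x in second[thr:end])  # second result value
    gain = num / den if den > eps else 1.0
    return [gain if thr <= k < end else 1.0 for k in range(len(first))]


first = [4.0, 4.0, 4.0, 4.0, 0.5, 0.5, 0.5, 0.5]
second = [0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 3.0, 4.0]
gains = normalization_gains(first, second, thr=4, end=8)
print(gains)
```

Scaling the synthesized band by this ratio brings its total level in line with what the bone conduction signal actually contains in the same band, which is one way to make the two bands join smoothly.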

Meanwhile, as described above, the filter adjusting unit 140, which adjusts the global filter stored in the storage unit 130 in consideration of the frequency components of the first speech signal input through the bone conduction microphone, may adjust the stored global filter through a global filter adjustment algorithm. Here, the global filter stored in the storage unit 130 may be an equalization filter generated by dividing the bone conduction microphone measurements and the ordinary air microphone measurements of test speech uttered by a plurality of test subjects into a plurality of segments, calculating for each segment the average of the results of comparing the two measurements, and collecting the calculated averages. The global filter stored in the storage unit 130 may be generated through the global filter algorithm defined in Equation 4 below.

[Equation 4]

Here, F_BC(k) is the global filter value, k is the frequency value of a frequency bin, l is the time-axis frame value, and m is the index of the user. Through this global filter algorithm, the bone conduction microphone measurements and the ordinary air microphone measurements of test speech uttered by a plurality of test subjects are each divided into a plurality of segments, the average of the results of comparing the two measurements is calculated for each segment, and the calculated averages are collected to obtain a global filter for the plurality of test subjects.
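One way to picture the construction of F_BC(k) is the sketch below. Equation 4 is not reproduced in the text, so several choices here are assumptions: the "comparison" is taken to be the air-to-bone magnitude ratio, all frames of all subjects are pooled into a single average, and the name `global_filter` and the nested-list layout `[m][l][k]` are hypothetical.

```python
def global_filter(bc, ac, eps=1e-12):
    """Average the air/bone magnitude ratio per frequency bin.

    bc[m][l][k] : bone conduction magnitude, subject m, frame l, bin k
    ac[m][l][k] : air microphone magnitude with the same indexing
    ASSUMPTION: the filter is the mean of ac/bc over all frames and subjects.
    """
    n_bins = len(bc[0][0])
    total = [0.0] * n_bins
    count = 0
    for bc_subject, ac_subject in zip(bc, ac):
        for bc_frame, ac_frame in zip(bc_subject, ac_subject):
            for k in range(n_bins):
                total[k] += ac_frame[k] / max(bc_frame[k], eps)
            count += 1
    if count == 0:
        raise ValueError("no frames supplied")
    return [t / count for t in total]


# Two subjects; the first contributes two frames, the second one frame.
bc = [[[1.0, 2.0], [1.0, 1.0]], [[2.0, 4.0]]]
ac = [[[1.0, 4.0], [1.0, 3.0]], [[2.0, 4.0]]]
print(global_filter(bc, ac))
```

Because the average spans several subjects, the resulting filter does not have to be recomputed for each new user, which is the drawback of the per-user filter described in the background.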

When the global filter has been stored in the storage unit 130, the filter adjusting unit 140 may adjust the stored global filter in consideration of the frequency components of the first speech signal input through the bone conduction microphone. The filter adjusting unit 140 may adjust the global filter stored in the storage unit 130 through the global filter adjustment algorithm defined in Equation 5 below.

[Equation 5]

Here, F_BC(k) is the global filter value, k is a frequency value, and α_agg is a variable. That is, the filter adjusting unit 140 may adjust the global filter by setting α_agg to 1 when the magnitude of the first speech signal input from the bone conduction microphone through the input unit 110 exceeds a predetermined threshold, and to a value between 0 and 1 when the magnitude is below the threshold. In other words, when the magnitude of the first speech signal exceeds the predetermined threshold, setting α_agg to 1 allows the input first speech signal to be compensated with the stored global filter value. When the magnitude of the first speech signal is below the predetermined threshold, the degree of compensation is determined by setting α_agg to a value between 0 and 1; when α_agg is 0, the input first speech signal is not compensated with the stored global filter value.
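The behavior of α_agg can be sketched as follows. Equation 5 is not reproduced in the text, so the form used here, F_BC(k) raised to the power α_agg, is an assumption chosen only because it gives the stored filter at α_agg = 1 and a gain of 1 (no compensation) at α_agg = 0. The rule for picking a value between 0 and 1 below the threshold (the level divided by the threshold) and both function names are likewise hypothetical.

```python
def aggressiveness(level, threshold):
    """Choose alpha_agg from the magnitude of the first speech signal.

    Above the threshold the value is 1; otherwise it lies in [0, 1].
    ASSUMPTION: below the threshold, alpha_agg = level / threshold.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    if level > threshold:
        return 1.0
    return max(0.0, min(1.0, level / threshold))


def adjust_filter(global_filter_values, alpha_agg):
    """Return the adjusted filter, ASSUMED to be F_BC(k) ** alpha_agg."""
    return [f ** alpha_agg for f in global_filter_values]


f_bc = [1.0, 4.0, 9.0]
print(adjust_filter(f_bc, aggressiveness(2.0, 1.0)))   # loud frame: full compensation
print(adjust_filter(f_bc, aggressiveness(0.5, 1.0)))   # quiet frame: partial compensation
print(adjust_filter(f_bc, aggressiveness(0.0, 1.0)))   # silent frame: no compensation
```

Reducing the compensation for low-level frames avoids boosting frames that are likely to contain mostly sensor noise rather than speech.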

When the global filter has been adjusted for the input first speech signal, the speech signal processing unit 120 may generate an output speech signal through the output speech signal algorithm defined in Equation 6 below.

[Equation 6]

That is, the speech signal processing unit 120 may generate the output speech signal as the product of the respective result values calculated through Equation 2, Equation 3, and Equation 5 defined above.
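The product described above can be sketched per frame as a bin-by-bin multiplication of three sequences: the extended speech signal (Equation 2), the normalization gains (Equation 3), and the adjusted global filter (Equation 5). The function name `output_frame` and the sample values are hypothetical.

```python
def output_frame(extended, norm_gains, adjusted_filter):
    """Multiply the three per-bin results to obtain the output speech spectrum."""
    if not (len(extended) == len(norm_gains) == len(adjusted_filter)):
        raise ValueError("all inputs must have the same number of bins")
    return [x * n * f for x, n, f in zip(extended, norm_gains, adjusted_filter)]


extended = [8.0, 6.0, 8.0, 6.0]        # result of Equation 2
norm_gains = [1.0, 1.0, 0.5, 0.5]      # result of Equation 3
adjusted_filter = [1.0, 2.0, 2.0, 4.0]  # result of Equation 5
print(output_frame(extended, norm_gains, adjusted_filter))
```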

When the output speech signal has been generated, the noise removing unit 150 removes the noise remaining in it. Depending on the embodiment, the noise removing unit 150 calculates a gain value using at least one of the OM-LSA (Optimally Modified Log-Spectral Amplitude) and IMCRA (Improved Minima Controlled Recursive Averaging) techniques. The noise removing unit 150 then multiplies the calculated gain value by the output speech signal generated by the speech signal processing unit 120 and outputs the noise-removed output speech signal. Of these techniques, OM-LSA calculates a gain value using a noise component estimated by a noise estimation technique; because it is a known technique, a detailed description is omitted here.

The other technique, IMCRA (Improved Minima Controlled Recursive Averaging), estimates the noise component of an input speech signal; because it is also a known technique, a detailed description is omitted here. In addition, although the present invention has been described only with respect to the OM-LSA and IMCRA techniques, it is not limited thereto, and other techniques for estimating and removing noise components may be used.
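The gain-multiplication step can be illustrated with a much simpler stand-in. The sketch below is not OM-LSA or IMCRA: it uses a plain power-subtraction gain with a spectral floor and takes the noise power estimate as a given input. The function names, the floor value, and the sample numbers are hypothetical.

```python
def subtraction_gains(power, noise_power, floor=0.1):
    """Per-bin gain max((P - N) / P, floor); a simple stand-in, not OM-LSA."""
    gains = []
    for p, n in zip(power, noise_power):
        g = max(p - n, 0.0) / p if p > 0 else floor
        gains.append(max(g, floor))
    return gains


def apply_gains(spectrum, gains):
    """Multiply the output speech spectrum by the gains, bin by bin."""
    return [x * g for x, g in zip(spectrum, gains)]


power = [4.0, 1.0, 0.0]          # power of the output speech signal per bin
noise_power = [1.0, 1.0, 1.0]    # estimated noise power per bin (given)
gains = subtraction_gains(power, noise_power)
print(gains)
print(apply_gains([2.0, 1.0, 0.0], gains))
```

The floor keeps heavily attenuated bins from being zeroed outright, a common precaution against audible artifacts in gain-based noise suppression.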

In this way, the signal processing apparatus according to the present invention maximizes the advantages of the bone conduction microphone and, further, compensates for the high-frequency components of the speech signal output through the bone conduction microphone, improving the sound quality to the level of a speech signal output from an ordinary air microphone. The components of the signal processing apparatus according to the present invention have been described in detail above. Hereinafter, a method by which the signal processing apparatus according to the present invention improves the sound quality of a speech signal input through a bone conduction microphone is described in detail.

FIG. 2 is a flowchart of a method of improving the sound quality of a voice signal input through a bone conduction microphone in a signal processing apparatus according to an embodiment of the present invention.

As shown in FIG. 2, the signal processing apparatus receives a first voice signal corresponding to the user's uttered voice through the bone conduction microphone (S210). When the first voice signal is input through the bone conduction microphone, the signal processing apparatus uses the frequency components of the input first voice signal to generate a second voice signal having frequencies within a predetermined threshold frequency band, and combines the input first voice signal with the second voice signal to generate an extended voice signal (S220).

Here, the second voice signal is a voice signal in which the high-band region of the first voice signal is extended, and the signal processing apparatus can generate it using the frequency components of the low-band region of the first voice signal. According to an embodiment, the signal processing apparatus can generate the second voice signal, that is, the extended voice signal for the high-band region of the first voice signal, through a band extension technique that uses transposition of the low band of the first voice signal. The band extension technique extends the voice signal from the low-band (low-frequency) region into the high-band (high-frequency) region in order to restore the original voice components.

However, the present invention is not limited to this, and the signal processing apparatus may use another technique for extending the high-band voice signal of the first voice signal, such as spectral folding. The algorithm that generates the second voice signal, that is, the extended voice signal for the high-band region of the first voice signal, through the band extension technique can be defined as in [Equation 1] described above. Accordingly, the signal processing apparatus can generate the second voice signal for the high-band region of the first voice signal through the algorithm defined in [Equation 1].

That is, when the frequency value (k) of a frequency bin is smaller than the predetermined frequency threshold (Thr), the signal processing apparatus does not generate the second voice signal for the high-band region of the first voice signal. When the frequency value (k) is equal to or greater than the predetermined frequency threshold (Thr) and less than the end frequency threshold (End), the signal processing apparatus uses the frequency components of the first voice signal in a predetermined section of the low-band region, namely the section between low_b and high_b, to generate the second voice signal for the section from the frequency threshold (Thr) up to, but not including, the end frequency threshold (End).

For example, a first voice signal having a sampling rate of 8 kHz may be input through the bone conduction microphone. In this case the maximum frequency is 4 kHz, and if the predetermined frequency threshold (Thr) is 2 kHz, the voice signal from 2 kHz to 4 kHz can be generated as the second voice signal based on the voice signal from 0 to 2 kHz. As another example, when a first voice signal with a maximum frequency of 4 kHz is input and the predetermined frequency threshold (Thr) is 3 kHz, the voice signal from 3 kHz to 4 kHz can be generated as the second voice signal based on the voice signal from 0 to 1 kHz.
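The two numeric examples can be sketched in code. The following is a minimal illustration, assuming a per-frame complex spectrum indexed by frequency bin and a plain copy of low-band bins into the band from Thr up to End. [Equation 1] is not reproduced in this section, so the plain copy, the function name `transpose_low_band`, and the choice `high_b = low_b + (End - Thr)` are assumptions made for illustration.

```python
import numpy as np

def transpose_low_band(x1, thr, end, low_b=0):
    """Build a second (high-band) spectrum from low-band bins of x1.

    x1    : 1-D complex spectrum of one frame of the first voice signal
    thr   : first bin to be generated (Thr)
    end   : one past the last bin to be generated (End)
    low_b : first source bin in the low band (assumed default 0)
    """
    x1 = np.asarray(x1, dtype=complex)
    width = end - thr
    high_b = low_b + width                      # assumed end of the source section
    if not (0 <= low_b and high_b <= thr and end <= x1.size):
        raise ValueError("source section must lie below Thr, and End within the spectrum")
    x2 = np.zeros_like(x1)                      # nothing is generated below Thr
    x2[thr:end] = x1[low_b:high_b]              # copy low-band bins into [Thr, End)
    return x2

# 8 bins covering 0 to 4 kHz (0.5 kHz per bin).
frame = np.arange(1, 9, dtype=complex)
second_a = transpose_low_band(frame, thr=4, end=8)   # Thr = 2 kHz: 0-2 kHz -> 2-4 kHz
second_b = transpose_low_band(frame, thr=6, end=8)   # Thr = 3 kHz: 0-1 kHz -> 3-4 kHz
```

Bins below Thr stay at zero in the second signal, matching the statement that no second voice signal is generated there.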

When the second voice signal has been generated, the signal processing apparatus adds the first voice signal input through the bone conduction microphone and the second voice signal generated in step S220 to generate the extended voice signal. The algorithm that generates the extended voice signal by adding the first and second voice signals can be defined as in [Equation 2] described above.

As described above, the low-band portion of the first voice signal can be the voice signal that has passed through a low-pass filter, and the second voice signal can be calculated through [Equation 1] described above. Accordingly, following the algorithm defined in [Equation 2], the signal processing apparatus can generate the extended voice signal by adding the portion of the first voice signal, input through the bone conduction microphone, that passes through the low-pass filter and the second voice signal.

When the frequency value (k) in [Equation 1] described above is less than the predetermined frequency threshold (Thr), the extended voice signal can be the input first voice signal.
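The addition of the low-passed first voice signal and the second voice signal can be sketched as follows. This is a minimal illustration, assuming an ideal low-pass filter that simply keeps the bins below Thr. The function name `extend_spectrum` is hypothetical, and [Equation 2] itself is not reproduced in this section.

```python
import numpy as np

def extend_spectrum(x1, x2, thr):
    """Add the low-passed first signal and the generated second signal.

    x1, x2 : 1-D complex spectra of one frame (first and second voice signals)
    thr    : first bin of the generated high band (Thr)
    """
    x1 = np.asarray(x1, dtype=complex)
    x2 = np.asarray(x2, dtype=complex)
    low_passed = x1.copy()
    low_passed[thr:] = 0.0        # assumed ideal low-pass: keep bins below Thr only
    return low_passed + x2

# The bone-conducted frame has weak content above bin 4; x2 supplies that band.
x1 = np.array([1, 2, 3, 4, 0.1, 0.1, 0.1, 0.1], dtype=complex)
x2 = np.array([0, 0, 0, 0, 1, 2, 3, 4], dtype=complex)
extended = extend_spectrum(x1, x2, thr=4)
```

Below Thr the result equals the input first voice signal, and from Thr upward it equals the second voice signal.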

When the extended voice signal has been generated, the signal processing apparatus normalizes it. The normalization ensures that the voice signal that has passed through the low-pass filter and the second voice signal are joined naturally, and the signal processing apparatus can perform it through the normalization algorithm defined in [Equation 3] described above.

As described with [Equation 3], when the frequency value (k) is less than the predetermined frequency threshold (Thr), the signal processing apparatus applies the same filter to all time-axis frames of the extended voice signal. That is, when the frequency value (k) is less than the frequency threshold (Thr), the signal processing apparatus multiplies the frequency value of each frequency bin in every time-axis frame of the extended voice signal by 1.

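On the other hand, when the frequency value (k) is equal to or greater than the predetermined frequency threshold (Thr), the signal processing apparatus divides a first result value by a second result value to generate the variable value to be applied to each frame. Here, the first result value can be the sum of the first voice signal from the predetermined frequency threshold (Thr) to the end frequency threshold (End), and the second result value can be the sum of the second voice signal from the frequency threshold (Thr) to the end frequency threshold (End). Once the variable value for each frame has been generated by dividing the first result value by the second result value, the voice signal processing unit 120 can normalize the extended voice signal by multiplying the frequency value of each frequency bin in each frame by the variable value for that frame.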
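The per-frame normalization described here can be sketched as follows. This is a minimal illustration, assuming that the two "sums" are sums of spectral magnitudes over the bins from Thr up to End. [Equation 3] is not reproduced in this section, so the use of magnitudes, the function name `normalize_frames`, and the guard `eps` are assumptions.

```python
import numpy as np

def normalize_frames(x_ext, x1, x2, thr, end, eps=1e-12):
    """Normalize an extended spectrum frame by frame.

    x_ext, x1, x2 : (frames, bins) complex arrays holding the extended,
                    first, and second voice signals
    thr, end      : bin indices of the frequency threshold and end threshold
    """
    x_ext = np.asarray(x_ext, dtype=complex)
    first_sum = np.abs(np.asarray(x1)[:, thr:end]).sum(axis=1)    # first result value
    second_sum = np.abs(np.asarray(x2)[:, thr:end]).sum(axis=1)   # second result value
    frame_gain = first_sum / np.maximum(second_sum, eps)          # one value per frame
    weights = np.ones(x_ext.shape, dtype=float)                   # multiply by 1 below Thr
    weights[:, thr:] = frame_gain[:, None]                        # per-frame value from Thr up
    return weights * x_ext, weights

x1 = np.array([[1.0, 1.0, 0.5, 0.5]], dtype=complex)
x2 = np.array([[0.0, 0.0, 2.0, 2.0]], dtype=complex)
x_ext = np.array([[1.0, 1.0, 2.0, 2.0]], dtype=complex)
normalized, weights = normalize_frames(x_ext, x1, x2, thr=2, end=4)
```

In this toy frame the generated band is four times stronger than the measured one, so it is scaled by 0.25 while the low band is left unchanged.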

The signal processing apparatus then adjusts the pre-stored global filter in consideration of the frequency components of the first voice signal input through the bone conduction microphone (S230). The global filter pre-stored in the storage unit can be an equalization filter generated as follows: the sensed values of a bone conduction microphone and of a general air microphone for test voices uttered by a plurality of subjects are each divided into a plurality of sections, the two sensed values are compared section by section, the average of the comparison results is calculated, and the calculated averages are collected. The global filter pre-stored in the storage unit can be generated through the global filter algorithm defined in [Equation 4] described above.
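The construction of such an equalization filter can be sketched as follows. This is a minimal illustration, assuming that the "sections" are contiguous groups of frequency bins and that the "comparison" is the ratio of the air-microphone level to the bone-conduction level. [Equation 4] is not reproduced in this section, so both assumptions and the function name `build_global_filter` are hypothetical.

```python
import numpy as np

def build_global_filter(bone_mags, air_mags, n_sections, eps=1e-12):
    """Average per-section air/bone level ratios over paired test recordings.

    bone_mags, air_mags : (recordings, bins) magnitude spectra of the same
                          test utterances captured by the two microphones
    n_sections          : number of frequency sections
    Returns one filter value per section.
    """
    bone = np.asarray(bone_mags, dtype=float)
    air = np.asarray(air_mags, dtype=float)
    bone_sections = np.array_split(bone, n_sections, axis=1)
    air_sections = np.array_split(air, n_sections, axis=1)
    values = []
    for b, a in zip(bone_sections, air_sections):
        # Comparison result per recording: assumed ratio of mean levels.
        ratio = a.mean(axis=1) / np.maximum(b.mean(axis=1), eps)
        values.append(ratio.mean())              # average over all recordings
    return np.array(values)                      # collected averages form the filter

bone = [[1, 1, 1, 1], [2, 2, 2, 2]]
air = [[1, 1, 2, 2], [2, 2, 8, 8]]
global_filter = build_global_filter(bone, air, n_sections=2)
```

A value above 1 in a section indicates that the bone-conducted signal is weaker there than the air-conducted one and would be boosted by the filter.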

Once the global filter has been generated and stored through the global filter algorithm defined in [Equation 4], the signal processing apparatus can adjust the pre-stored global filter through the global filter adjustment algorithm defined in [Equation 5] described above.

As described with [Equation 5], when the magnitude of the first voice signal input from the bone conduction microphone exceeds a predetermined threshold, the signal processing apparatus sets the variable α_agg to 1. When the magnitude is below the threshold, the apparatus sets α_agg to a value between 0 and 1 to adjust the global filter. In other words, when the magnitude of the first voice signal exceeds the threshold, setting α_agg to 1 causes the input first voice signal to be compensated with the pre-stored global filter value. When the magnitude is below the threshold and α_agg is set to 0, the input first voice signal is not compensated with the pre-stored global filter value.
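The behavior of α_agg can be sketched as follows. [Equation 5] is not reproduced in this section, so the sketch assumes one form that matches the described endpoints: raising the global filter value to the power α_agg, which gives the stored filter for α_agg = 1 and a gain of 1 (no compensation) for α_agg = 0. The exponent form, the function name `adjust_global_filter`, and the parameter `alpha_low` are assumptions.

```python
import numpy as np

def adjust_global_filter(g, frame_level, level_threshold, alpha_low=0.5):
    """Adjust stored global filter values for one frame.

    g               : positive global filter values per frequency bin
    frame_level     : magnitude of the first voice signal for this frame
    level_threshold : predetermined threshold on that magnitude
    alpha_low       : value in [0, 1] used when the level does not exceed the threshold
    """
    if not 0.0 <= alpha_low <= 1.0:
        raise ValueError("alpha_low must lie between 0 and 1")
    g = np.asarray(g, dtype=float)
    alpha_agg = 1.0 if frame_level > level_threshold else alpha_low
    return np.power(g, alpha_agg), alpha_agg     # assumed form: G(k) ** alpha_agg

g = np.array([1.0, 2.0, 4.0])
loud, a_loud = adjust_global_filter(g, frame_level=0.9, level_threshold=0.5)
quiet, a_quiet = adjust_global_filter(g, frame_level=0.1, level_threshold=0.5, alpha_low=0.0)
```

A loud frame receives the full stored compensation, while a quiet frame with `alpha_low=0.0` passes through with unit gain, so low-level noise is not amplified.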

When the global filter has been adjusted for the first voice signal input through the bone conduction microphone, the signal processing apparatus can generate the output voice signal through the output voice signal algorithm defined in [Equation 6] described above.

As described with [Equation 6], the signal processing apparatus can generate the output voice signal as the product of the results calculated through the algorithms defined in [Equation 2], [Equation 3], and [Equation 5] described above. When the output voice signal for the first voice signal has been generated, the signal processing apparatus removes the noise remaining in it (S240). According to an embodiment, the signal processing apparatus calculates a gain value using at least one of the Optimally Modified Log-spectral Amplitude (OM-LSA) technique and the Improved Minima Controlled Recursive Average (IMCRA) technique. The signal processing apparatus then multiplies the calculated gain value by the output voice signal generated through the algorithm defined in [Equation 6] and outputs an output voice signal from which the noise has been removed. Although this description refers only to the OM-LSA and IMCRA techniques for removing noise from the input voice signal, the present invention is not limited to them, and other techniques for estimating and removing noise components may be used.
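The product structure of the output stage can be sketched as follows. This is a minimal illustration that takes the extended spectrum, the normalization weights, the adjusted global filter, and a noise-suppression gain as given arrays. How each factor is computed is outside this sketch, and the function name `output_spectrum` and its argument names are hypothetical.

```python
import numpy as np

def output_spectrum(x_ext, norm_weights, g_adj, noise_gain=None):
    """Multiply the per-bin factors to obtain the output spectrum.

    x_ext        : (frames, bins) extended spectrum
    norm_weights : (frames, bins) normalization weights
    g_adj        : (bins,) adjusted global filter values
    noise_gain   : optional (frames, bins) gain from a noise-suppression stage
    """
    y = np.asarray(x_ext) * np.asarray(norm_weights) * np.asarray(g_adj)
    if noise_gain is not None:
        y = np.asarray(noise_gain) * y           # gain value times the output signal
    return y

x_ext = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=complex)
norm_weights = np.array([[1.0, 0.5], [1.0, 0.25]])
g_adj = np.array([1.0, 2.0])
noise_gain = np.array([[1.0, 1.0], [0.5, 0.5]])
y = output_spectrum(x_ext, norm_weights, g_adj, noise_gain)
```

The global filter is broadcast across frames, so a single stored set of per-bin values serves every frame.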

As described above, by removing the noise remaining in the output voice signal, the signal processing apparatus according to the present invention maximizes the advantages of the bone conduction microphone and, in addition, compensates for the high-frequency components of the voice signal output through the bone conduction microphone, improving the sound quality to the level of a voice signal output from a general air microphone.

The present invention has been described above with reference to its preferred embodiments.

Although preferred embodiments of the present invention have been shown and described above, the present invention is not limited to the specific embodiments described. Those of ordinary skill in the art to which the invention pertains can make various modifications without departing from the gist of the invention as claimed, and such modifications should not be understood separately from the technical idea or scope of the present invention.

110: input unit 120: voice signal processing unit
130: storage unit 140: filter adjusting unit
150: noise removing unit

Claims

A signal processing apparatus comprising:
an input unit configured to receive a first voice signal through a bone conduction microphone;
a voice signal processing unit configured to generate a second voice signal having a frequency within a predetermined section by using a frequency component of a low-band region of the first voice signal, and to generate an extended voice signal by combining the first and second voice signals;
a storage unit in which a global filter is stored;
a filter adjusting unit configured to adjust the global filter in consideration of a frequency component of the first voice signal; and
a noise removing unit configured to remove noise from the extended voice signal to which the adjusted global filter value is applied,
wherein the global filter is an equalization filter generated by dividing a sensed value of a bone conduction microphone and a sensed value of a general air microphone, for test voices uttered by a plurality of subjects, into a plurality of sections, calculating an average of comparison results obtained by comparing the two sensed values for each section, and collecting the calculated averages.

The signal processing apparatus according to claim 1,
wherein the voice signal processing unit adds the first voice signal and the second voice signal and performs normalization to generate the extended voice signal.

The signal processing apparatus according to claim 1,
wherein the voice signal processing unit generates the second voice signal and the extended voice signal using the following equation:

[equation not reproduced]

where the defined terms are the signal value obtained by passing the first voice signal through a low-pass filter, the second voice signal value, and the extended voice signal value, k is a frequency value, and i is a frequency value between the low-band frequency value (low_b) and the high-band frequency value (high_b) of the input first voice signal.

(Deleted)

The signal processing apparatus according to claim 1,
wherein the adjusted global filter is expressed by the following equation:

[equation not reproduced]

where the defined term is the global filter value, k is a frequency value, and α_agg is a variable.

The signal processing apparatus according to claim 5,
wherein the filter adjusting unit sets the variable to 1 when the magnitude of the first voice signal input to the input unit exceeds a predetermined threshold, and sets the variable to a value between 0 and 1 when the magnitude of the first voice signal is less than the threshold.

A signal processing method for improving the sound quality of a voice signal input through a bone conduction microphone in a signal processing apparatus, the method comprising:
receiving a first voice signal through the bone conduction microphone;
generating a second voice signal having a frequency within a predetermined section by using a frequency component of a low-band region of the first voice signal, and generating an extended voice signal by combining the first and second voice signals;
adjusting a pre-stored global filter in consideration of a frequency component of the first voice signal; and
removing noise from the extended voice signal to which the adjusted global filter value is applied,
wherein the global filter is an equalization filter generated by dividing a sensed value of a bone conduction microphone and a sensed value of a general air microphone, for test voices uttered by a plurality of subjects, into a plurality of sections, calculating an average of comparison results obtained by comparing the two sensed values for each section, and collecting the calculated averages.

The signal processing method according to claim 7,
wherein the generating of the extended voice signal comprises adding the first voice signal and the second voice signal and performing normalization to generate the extended voice signal.

The signal processing method according to claim 7,
wherein the generating of the extended voice signal comprises generating the second voice signal and the extended voice signal using the following equation:

[equation not reproduced]

where one of the defined terms is the second voice signal value,

(Deleted)

The signal processing method according to claim 7,
wherein the adjusted global filter is expressed by the following equation:

[equation not reproduced]

where the defined term is the global filter value, k is a frequency value, and α_agg is a variable.

The signal processing method according to claim 11,
wherein the adjusting of the pre-stored global filter comprises setting the variable to 1 when the magnitude of the input first voice signal exceeds a predetermined threshold, and setting the variable to a value between 0 and 1 when the magnitude of the input first voice signal is less than the threshold.