KR101098763B1

KR101098763B1 - Method and system of suppressing noise

Info

Publication number: KR101098763B1
Application number: KR1020100005250A
Authority: KR
Inventors: 반재미; 김헌중
Original assignee: 주식회사 코아로직
Priority date: 2010-01-20
Filing date: 2010-01-20
Publication date: 2011-12-26
Also published as: KR20110085453A

Abstract

The present invention relates to a noise removal method. The method classifies each of a plurality of frames in an input signal containing a noise signal and a voice signal as belonging to a noise section or a voice section, and estimates noise in at least one noise frame classified as the noise section. For a voice transition frame, at which the signal transitions from the noise section to the voice section, the method removes noise by using, as the estimated noise for that frame, the noise estimated in the noise frame a predetermined number of frames earlier. Finally, the method removes noise from at least one voice frame classified as the voice section by using the estimated noise.

Description

Method and system of suppressing noise

The present invention relates to voice/audio signal processing among multimedia signals, and more particularly to a method and system for removing noise contained in a voice/audio signal.

In the mobile communication field, multimedia signal processing technology refers to recognizing various media of expression, such as sound and video, and processing the related information. With the advent of digital mobile communication systems and the development of wired communication systems, high-quality voice/audio codec technologies are being developed. Important factors in such codec technology include transmission rate, sound quality under various environments, speech encoding delay, and complexity; of these, sound quality is particularly important in practical applications.

Making voice calls with mobile communication terminals is now commonplace. However, when a user makes a call in a noisy environment such as a subway, a street, a coffee shop playing loud music, or an event venue, the severe ambient noise can degrade call quality or playback sound quality. A technique for removing background noise from the signal input to the terminal is therefore required. When the energy level of the voice signal is low, the voice signal may be buried in the background noise.

The problem to be solved by the present invention is to provide a noise removal method and system that can improve the quality of the voice signal output from an input signal by effectively estimating noise in the noise section of an input signal having a noise signal and a voice signal, and to provide a computer-readable recording medium on which a program for executing the noise removal method is recorded.

To solve the above problem, a noise removal method according to the present invention includes: classifying each of a plurality of frames included in an input signal having a noise signal and a voice signal as a noise section or a voice section; estimating noise in at least one noise frame classified as the noise section among the plurality of frames; for a voice transition frame, at which the signal transitions from the noise section to the voice section, removing noise from the voice transition frame by using, as the estimated noise for the voice transition frame, the noise estimated in the noise frame a predetermined number of frames earlier; and removing noise from at least one voice frame classified as the voice section among the plurality of frames by using the estimated noise.

The above problem is also solved by a computer-readable recording medium on which a program for executing a noise removal method is recorded, the method including: classifying each of a plurality of frames included in an input signal having a noise signal and a voice signal as a noise section or a voice section; estimating noise in at least one noise frame classified as the noise section among the plurality of frames; for a voice transition frame, at which the signal transitions from the noise section to the voice section, removing noise from the voice transition frame by using, as the estimated noise for the voice transition frame, the noise estimated in the noise frame a predetermined number of frames earlier; and removing noise from at least one voice frame classified as the voice section among the plurality of frames by using the estimated noise.

Further, to solve the above problem, a noise removal system according to the present invention includes: a voice/noise section determination unit that classifies each of a plurality of frames included in an input signal having a noise signal and a voice signal as a noise section or a voice section; a transition determination unit that determines whether a transition has occurred in at least one noise frame classified as the noise section or in at least one voice frame classified as the voice section; a noise estimation and update unit that estimates noise in the at least one noise frame and, for a voice transition frame at which a transition from the noise section to the voice section has occurred, updates the estimated noise for the voice transition frame to the noise estimated in the noise frame a predetermined number of frames earlier; and a noise removal unit that removes noise from the voice transition frame or the voice frame by using the estimated noise.

According to the present invention, noise is removed from a voice transition frame, at which the signal transitions from the noise section to the voice section, by using as its estimated noise the noise estimated in the noise frame a predetermined number of frames earlier. As a result, even if noise frames classified as the noise section actually contain a voice signal, the noise estimated in those frames is not used in subsequent noise removal. The noise component removed from the input signal therefore contains almost no voice signal component, which prevents distortion of the output voice signal.

In addition, noise estimation is not performed on a predetermined number of noise frames following a noise transition frame, at which the signal transitions from the voice section to the noise section. Thus, even if noise frames classified as the noise section contain a voice signal, those frames do not affect subsequent noise removal.

FIG. 1 is a block diagram schematically illustrating a voice/audio codec according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating a noise removal system according to an embodiment of the present invention.
FIG. 3 is a graph illustrating a noise section and a voice section of an input signal input to the noise removal system of FIG. 2.
FIG. 4 is a graph illustrating an example of the energy levels of a noise signal, a voice signal, and an input signal input to the noise removal system of FIG. 2.
FIG. 5 illustrates, as an output of the voice/noise section determination unit of FIG. 2, an example in which frames included in the input signal transition from the noise section to the voice section.
FIG. 6 illustrates, as an output of the voice/noise section determination unit of FIG. 2, an example in which frames included in the input signal transition from the voice section to the noise section.
FIGS. 7A and 7B schematically illustrate the configuration of the estimated noise storage unit of FIG. 2 when a is 4.
FIG. 8 is a flowchart illustrating a noise removal method according to an embodiment of the present invention.

The specific structural and functional descriptions of the embodiments disclosed herein are given only for the purpose of describing those embodiments. The embodiments of the present invention may be implemented in various forms and should not be construed as limited to the embodiments described herein.

Hereinafter, preferred embodiments of the present invention are described in more detail with reference to the accompanying drawings. The same reference numerals are used for the same components in the drawings, and duplicate descriptions of the same components are omitted.

FIG. 1 is a block diagram schematically illustrating a voice/audio codec according to an embodiment of the present invention.

Referring to FIG. 1, the voice/audio codec (10) includes an input signal processing unit (11), a voice/audio processor (12), and an output signal processing unit (13). The voice/audio codec (10) may be an encoder or a decoder.

The input signal processing unit (11) may include an analog-to-digital converter and convert an analog input signal into a digital signal. The voice/audio processor (12) may encode or decode, frame by frame, the digital signal output from the input signal processing unit (11). The output signal processing unit (13) may include a digital-to-analog converter and convert the digital signal output from the voice/audio processor (12) into an analog signal.

In this embodiment, the input signal processing unit (11) or the output signal processing unit (13) may further include a noise removal system. Specifically, when the voice/audio codec (10) is an encoder, the input signal processing unit (11) may include the noise removal system, and when the voice/audio codec (10) is a decoder, the output signal processing unit (13) may include the noise removal system. The embodiment is not limited to this arrangement, however, and both the input signal processing unit (11) and the output signal processing unit (13) may include a noise removal system.

Such a voice/audio codec (10) may be included in a computer terminal, a PSTN (Public Switched Telephone Network) terminal, VoIP, SIP (Session Initiation Protocol), Megaco, a PDA (Personal Digital Assistant), a cellular phone, a PCS (Personal Communication Service) phone, a hand-held PC, a CDMA-2000 (1X, 3X) phone, a WCDMA (Wideband CDMA) phone, a dual band/dual mode phone, a GSM (Global System for Mobile Communications) phone, an MBS (Mobile Broadband System) phone, a satellite/terrestrial DMB (Digital Multimedia Broadcasting) phone, a voice recorder, a black box, and the like.

FIG. 2 is a block diagram illustrating a noise removal system according to an embodiment of the present invention.

Referring to FIG. 2, the noise removal system (20) may include a voice/noise section determination unit (21), a transition determination unit (22), a noise estimation and update unit (23), an estimated noise storage unit (24), a conversion unit (25), and a noise removal unit (26).

The voice/noise section determination unit (21) receives an input signal (IN) having a noise signal and a voice signal, and may classify each of the plurality of frames included in the received input signal (IN) as a noise section or a voice section. In this embodiment, the voice/noise section determination unit (21) may be implemented as a VAD (Voice Activity Detection) module: it detects whether the amplitude of the voice signal reaches a certain threshold, decides accordingly whether to generate a packet, and thereby classifies the frames as noise section or voice section.

In general, a call environment contains a certain amount of background noise, that is, a noise signal. When a voice signal is produced, an input signal containing both the noise signal and the voice signal is therefore input to the voice/noise section determination unit (21). Here, a noise section is a section that contains only the noise signal and no voice signal, and a voice section is a section that contains both the noise signal and the voice signal.

FIG. 3 is a graph illustrating a noise section and a voice section of an input signal input to the noise removal system of FIG. 2.

Referring to FIG. 3, the X axis represents time and the Y axis represents energy level. The voice/noise section determination unit (21) analyzes the energy level of the input signal (IN). When the change in energy level stays within a certain range, it classifies the section as a noise section; when the change in energy level exceeds that range, it classifies the section as a voice section.
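The energy-based classification described above can be sketched as follows. This is a minimal illustration, not the patented detector: the source does not say how the "change in energy level" is measured, so the sketch assumes it is the deviation from a reference noise level taken from the first frame. The names `frame_energies`, `classify_frames`, and `max_delta` are hypothetical.

```python
import numpy as np

def frame_energies(signal, frame_len):
    """Split a 1-D signal into non-overlapping frames and return each frame's mean energy."""
    samples = np.asarray(signal, dtype=float)
    n_frames = len(samples) // frame_len
    frames = samples[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.mean(frames ** 2, axis=1)

def classify_frames(energies, max_delta):
    """Label each frame 'N' (noise section) or 'S' (voice section).

    Assumption: the first frame contains noise only and serves as the
    reference level. A frame whose energy deviates from that reference
    by at most max_delta is labeled noise; otherwise it is labeled voice.
    """
    reference = energies[0]
    return ["N" if abs(e - reference) <= max_delta else "S" for e in energies]

# Example: three quiet frames, two loud frames, one quiet frame.
labels = classify_frames([1.0, 1.1, 0.9, 5.0, 6.0, 1.05], max_delta=0.5)
```

In practice the reference level would have to track slowly varying noise; a fixed first-frame reference is used here only to keep the sketch short.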

When the input signal (IN) transitions from the noise section to the voice section, the voice/noise section determination unit (21) may classify the few frames preceding the transition as the noise section even though a voice signal is present in them. Likewise, when the input signal (IN) transitions from the voice section to the noise section, the voice/noise section determination unit (21) may classify the few frames following the transition as the noise section even though a voice signal is present in them.

FIG. 4 is a graph illustrating an example of the energy levels of a noise signal, a voice signal, and an input signal input to the noise removal system of FIG. 2.

Referring to FIG. 4, the X axis represents time and the Y axis represents energy level. The noise signal has a roughly constant energy level that changes little over time, whereas the energy level of the voice signal changes substantially over time. Accordingly, the energy level of the input signal, which is the sum of the noise signal and the voice signal, also fluctuates substantially over time.

In the first section (41), the energy level of the voice signal is lower than that of the noise signal, but the energy level of the input signal is higher than that of the noise signal, so the voice/noise section determination unit (21) may classify the first section (41) as a voice section. In the second to fourth sections (42, 43, 44), the energy level of the voice signal is much lower than that of the noise signal, so the energy level of the input signal is almost the same as that of the noise signal. Consequently, the voice/noise section determination unit (21) may classify the second to fourth sections (42, 43, 44) as noise sections even though a voice signal is present in them.

In particular, when the input signal (IN) transitions from the noise section to the voice section, or from the voice section to the noise section, the energy level of the voice signal may be lower than that of the input signal. In this case, the voice/noise section determination unit (21) may classify the frame as a noise section even though a voice signal is present in the input signal. The noise estimation and update unit (23) then estimates noise from a noise frame in which a voice signal is present, so the estimated noise contains part of the voice signal. When the noise removal unit (26) subsequently removes the estimated noise from the input signal to output the voice signal, components resembling the voice signal may be removed from the input signal, so the voice signal is damaged and a degraded voice signal may be output.

Referring again to FIG. 2, the transition determination unit (22) determines whether a transition has occurred in at least one noise frame classified as the noise section, or in at least one voice frame classified as the voice section, among the plurality of frames included in the input signal (IN). Specifically, when the current frame is a noise frame, the transition determination unit (22) checks whether the previous frame is a noise frame or a voice frame. If the previous frame is a noise frame, it determines that no transition has occurred in the current frame; if the previous frame is a voice frame, it determines that a transition has occurred in the current frame. Similarly, when the current frame is a voice frame, the transition determination unit (22) checks whether the previous frame is a voice frame or a noise frame. If the previous frame is a voice frame, it determines that no transition has occurred in the current frame; if the previous frame is a noise frame, it determines that a transition has occurred in the current frame.
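The comparison of the current frame with the previous frame can be sketched as below. This is an illustrative sketch only; the function names and the string labels ("N", "S", "voice_transition", "noise_transition") are hypothetical and do not come from the patent.

```python
def detect_transition(prev_label, cur_label):
    """Compare the previous and current frame labels ('N' noise, 'S' voice).

    Returns None when no transition occurred, 'voice_transition' for a
    noise-to-voice transition, and 'noise_transition' for a voice-to-noise
    transition. The first frame (prev_label is None) is treated as having
    no transition, which is an assumption of this sketch.
    """
    if prev_label is None or prev_label == cur_label:
        return None
    return "voice_transition" if cur_label == "S" else "noise_transition"

def find_transitions(labels):
    """Return (frame index, transition type) for every transition in a label sequence."""
    found = []
    prev = None
    for index, label in enumerate(labels):
        kind = detect_transition(prev, label)
        if kind is not None:
            found.append((index, kind))
        prev = label
    return found

# Five noise frames, one voice frame, then two noise frames.
transitions = find_transitions(["N", "N", "N", "N", "N", "S", "N", "N"])
```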

When the current frame is a noise frame, or a voice transition frame at which a transition from a noise frame to a voice frame has occurred, the transition determination unit (22) provides its output to the noise estimation and update unit (23). When the current frame is a voice frame, the transition determination unit (22) provides its output to the noise removal unit (26).

When the current frame is a noise transition frame, at which a transition from a voice frame to a noise frame has occurred, the transition determination unit (22) does not provide that frame to the noise estimation and update unit (23) and instead determines whether a transition occurs in the next frame. To ensure that no noise estimation is performed for the several frames following the noise transition frame, the transition determination unit (22) may withhold its output from the noise estimation and update unit (23), or may keep the noise estimation and update unit (23) from being activated. Because noise estimation is skipped for the several frames following the noise transition frame, the noise in those frames is excluded from subsequent noise removal even if the frames contain a voice signal.
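One way to express this hold-off is a per-frame mask that says whether a frame may be passed to noise estimation. The sketch below is an assumption-laden illustration: `estimation_mask` and `skip_after` (the number of noise frames skipped after the noise transition frame) are hypothetical names, and the source does not give a concrete value for the number of skipped frames.

```python
def estimation_mask(labels, skip_after):
    """Return one boolean per frame: True if the frame may update the noise estimate.

    Voice frames ('S') never update the estimate. A noise transition frame
    (an 'N' frame directly after an 'S' frame) and the skip_after noise
    frames that follow it are also excluded.
    """
    mask = []
    remaining = 0      # noise frames still to be skipped
    prev = None
    for label in labels:
        if label == "S":
            remaining = 0
            mask.append(False)
        else:
            if prev == "S":
                # Noise transition frame: skip it plus skip_after followers.
                remaining = skip_after + 1
            if remaining > 0:
                remaining -= 1
                mask.append(False)
            else:
                mask.append(True)
        prev = label
    return mask

mask = estimation_mask(["N", "S", "N", "N", "N", "N", "N"], skip_after=2)
```

Here the frame at index 2 is the noise transition frame, so it and the next two noise frames are excluded, and estimation resumes at index 5.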

FIG. 5 illustrates, as an output of the voice/noise section determination unit of FIG. 2, an example in which frames included in the input signal transition from the noise section to the voice section.

Referring to FIG. 5, the voice/noise section determination unit (21) analyzes the plurality of frames included in the input signal (IN) and may classify the (n-5)th through (n-1)th frames as the noise section (N) and the nth frame as the voice section (S). The transition determination unit (22) then determines that a transition from the noise section (N) to the voice section (S) has occurred at the nth frame.

FIG. 6 illustrates, as an output of the voice/noise section determination unit of FIG. 2, an example in which frames included in the input signal transition from the voice section to the noise section.

Referring to FIG. 6, the voice/noise section determination unit (21) analyzes the plurality of frames included in the input signal (IN) and may classify the nth frame as the voice section (S) and the (n+1)th through (n+5)th frames as the noise section (N). The transition determination unit (22) then determines that a transition from the voice section (S) to the noise section (N) has occurred at the (n+1)th frame.

Referring again to FIG. 2, when a noise frame is input, the noise estimation and update unit (23) estimates noise in the noise frame and provides the estimated noise to the estimated noise storage unit (24). When a voice transition frame is input, the noise estimation and update unit (23) loads from the estimated noise storage unit (24) the noise estimated in the noise frame a predetermined number of frames before the voice transition frame, and updates the estimated noise for the voice transition frame to the loaded noise. Here, the estimated noise for the voice transition frame is the noise component used to remove noise from the voice transition frame. Specifically, when the current frame is the nth frame, the noise estimation and update unit (23) loads the noise estimated in the (n-a)th frame from the estimated noise storage unit (24) and updates the estimated noise for the nth frame to the loaded noise.

The estimated noise storage unit (24) stores the noise estimated by the noise estimation and update unit (23). The estimated noise storage unit (24) may be implemented as a FIFO (first in, first out) buffer. Specifically, it stores the noise components estimated in the (n-1)th through (n-a)th frames. Here, a denotes the delayed frame index, and its value is determined in consideration of the size of each frame and the delay period of the noise removal system (20).

FIGS. 7A and 7B schematically illustrate the configuration of the estimated noise storage unit of FIG. 2 when a is 4.

Referring to FIGS. 7A and 7B, because a is 4, the estimated noise storage unit (24) can store the noise estimated in four frames. When the noise estimated in the (n-2)th frame is input, the estimated noise storage unit (24) holds the noise estimated in the (n-5)th through (n-2)th frames. When the noise estimated in the (n-1)th frame is then input, the estimated noise storage unit (24) holds the noise estimated in the (n-4)th through (n-1)th frames.

In this way, the estimated noise storage unit (24) stores the estimated noise provided by the noise estimation and update unit (23) in FIFO fashion. Specifically, when the noise estimated in the (n-1)th frame is input to the estimated noise storage unit (24), the noise estimated in the (n-4)th frame moves into the slot that held the (n-5)th frame's estimate, the (n-3)th frame's estimate moves into the slot that held the (n-4)th frame's estimate, the (n-2)th frame's estimate moves into the slot that held the (n-3)th frame's estimate, and the (n-1)th frame's estimate is stored in the slot that held the (n-2)th frame's estimate.
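The FIFO behavior for a = 4 can be sketched with a bounded double-ended queue. This is only an illustration: the patent does not prescribe a data structure beyond "FIFO buffer", `make_noise_store` is a hypothetical name, and the strings stand in for the per-frame noise estimates.

```python
from collections import deque

def make_noise_store(a):
    """Create a FIFO store that keeps only the a most recent noise estimates."""
    return deque(maxlen=a)

store = make_noise_store(4)

# Estimates from frames (n-5) through (n-2) fill the four slots.
for frame in ("n-5", "n-4", "n-3", "n-2"):
    store.append(frame)
before = list(store)

# Adding the (n-1)th estimate evicts the oldest entry, (n-5).
store.append("n-1")
after = list(store)
```

With `maxlen` set, appending to a full deque discards the item at the opposite end, which matches the shift described for FIGS. 7A and 7B.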

Referring again to FIG. 2, if the nth frame is a voice transition frame and a is 4, the noise estimation and update unit (23) loads the noise estimated in the (n-4)th frame from the estimated noise storage unit (24) and uses it as the estimated noise for the noise removal operation on the nth frame. The noise estimated in the (n-1)th, (n-2)th, and (n-3)th frames, which immediately precede the voice transition frame, is thereby excluded from subsequent noise removal. Thus, even if the few frames preceding a noise-to-voice transition contain a voice signal but were classified as noise frames, the noise estimated in those frames is not used in subsequent noise removal, which prevents distortion of the final output voice signal.
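The selection of the (n-a)th estimate at a voice transition can be sketched as a small class wrapping the FIFO store. The class and method names are hypothetical, the numeric estimates are placeholders, and the fallback to the oldest available entry when fewer than a estimates exist is an assumption of this sketch rather than something stated in the source.

```python
from collections import deque

class DelayedNoiseEstimate:
    """Keeps the last a noise estimates and selects the oldest at a voice transition."""

    def __init__(self, a):
        self.history = deque(maxlen=a)   # estimates for frames (n-a) .. (n-1)
        self.current = None              # estimate used for noise removal

    def on_noise_frame(self, estimate):
        """Store the estimate obtained from a noise frame."""
        self.history.append(estimate)

    def on_voice_transition(self):
        """Adopt the estimate from a frames earlier, skipping the newer ones."""
        if self.history:
            # history[0] is the oldest stored entry, i.e. frame (n-a) when full.
            self.current = self.history[0]
        return self.current

tracker = DelayedNoiseEstimate(a=4)
# Placeholder estimates for frames (n-5), (n-4), (n-3), (n-2), (n-1).
for estimate in (10, 11, 12, 13, 14):
    tracker.on_noise_frame(estimate)
chosen = tracker.on_voice_transition()   # estimate of frame (n-4)
```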

The converter 25 converts the input signal IN from the time domain to the frequency domain. The converter 25 may be implemented to include a plurality of filterbanks. Specifically, the converter 25 applies a window to the input signal IN and performs a modified discrete cosine transform (MDCT) on the windowed signal, thereby converting the time-domain input signal IN into frequency-domain spectral data.
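As a rough illustration of the window-then-MDCT step, the following pure-Python sketch uses the textbook MDCT definition, which maps a block of 2N samples to N coefficients. The sine window is an assumption, since the description does not name a window type, and the function names are hypothetical. The direct O(N^2) summation is for clarity only.

```python
import math

def sine_window(length):
    """Sine window; an assumed choice, the description names no window."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(block):
    """Windowed MDCT of a block of 2N samples, returning N coefficients.

    X[k] = sum_n w[n] * x[n] * cos(pi/N * (n + 0.5 + N/2) * (k + 0.5))
    """
    two_n = len(block)
    if two_n == 0 or two_n % 2:
        raise ValueError("block length must be a positive even number")
    half = two_n // 2
    window = sine_window(two_n)
    coeffs = []
    for k in range(half):
        acc = 0.0
        for n in range(two_n):
            phase = math.pi / half * (n + 0.5 + half / 2) * (k + 0.5)
            acc += window[n] * block[n] * math.cos(phase)
        coeffs.append(acc)
    return coeffs

# A 16-sample block yields 8 spectral coefficients.
block = [math.sin(2 * math.pi * 3 * n / 16) for n in range(16)]
spectrum = mdct(block)
print(len(spectrum))  # 8
```

In practice consecutive blocks overlap by half their length so that the transform can be inverted; that overlap handling is omitted here.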

The noise removal unit 26 outputs a voice signal by removing the estimated noise from the converted input signal IN corresponding to a voice transition frame or a voice frame. Specifically, the noise removal unit 26 may output the voice signal by subtracting the spectrum corresponding to the estimated noise from the input signal IN converted into the frequency domain.
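The subtraction can be sketched as follows. The description only states that the noise spectrum is subtracted; subtracting magnitudes, clamping at a floor, and keeping the sign of each coefficient are assumptions of this example, as is the function name `subtract_noise`.

```python
import math

def subtract_noise(spectrum, noise_magnitude, floor=0.0):
    """Subtract a noise magnitude from each real spectral coefficient.

    The result is clamped at `floor` so the magnitude never goes negative,
    and each output keeps the sign of its input coefficient.
    """
    cleaned = []
    for coeff, noise in zip(spectrum, noise_magnitude):
        magnitude = max(abs(coeff) - noise, floor)
        cleaned.append(math.copysign(magnitude, coeff))
    return cleaned

print(subtract_noise([3.0, -2.0, 0.5], [1.0, 0.5, 1.0]))  # [2.0, -1.5, 0.0]
```

The third coefficient is smaller than the noise estimate, so it is clamped to zero rather than flipped in sign.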

FIG. 8 is a flowchart illustrating a noise removal method according to an embodiment of the present invention.

Referring to FIG. 8, the noise removal method according to this embodiment consists of steps processed in time series by the noise removal system shown in FIG. 2. Therefore, even where omitted below, the description given above of the noise removal system shown in FIG. 2 also applies to the noise removal method according to this embodiment.

In step 810, the voice/noise section determination unit 21 receives an input signal having a noise signal and a voice signal.

In step 815, the voice/noise section determination unit 21 determines each of the plurality of frames included in the input signal to be a noise section or a voice section. Here, a noise section is a section that contains only the noise signal and no voice signal, and a voice section is a section that contains both the noise signal and the voice signal.

In step 820, the transition determination unit 22 determines whether the current frame, that is, the n-th frame F_n, is a voice section. If the n-th frame F_n is a voice section, step 845 is performed; otherwise, step 825 is performed.

In step 825, the transition determination unit 22 determines whether the previous frame, that is, the (n-1)-th frame F_(n-1), is a voice section. If the (n-1)-th frame F_(n-1) is a voice section, a transition from the voice section to the noise section is deemed to have occurred and step 840 is performed; otherwise, no transition is deemed to have occurred and step 830 is performed.

In step 830, the noise estimation and update unit 23 estimates noise in the current frame, that is, the n-th frame.

In step 835, the estimated noise storage unit 24 stores the noise estimated in the n-th frame. The capacity of the estimated noise storage unit 24 is determined based on the size of each frame or the overall delay. For example, the estimated noise storage unit 24 may store the noise estimated in a frames.

In step 840, the process is repeated from step 820 for the next frame, that is, the (n+1)-th frame. In other words, it is determined whether the (n+1)-th frame is a voice section or a noise section, and whether a transition has occurred at the (n+1)-th frame.

In step 845, the transition determination unit 22 determines whether the previous frame, that is, the (n-1)-th frame F_(n-1), is a noise section. If the (n-1)-th frame F_(n-1) is a noise section, a transition from the noise section to the voice section is deemed to have occurred and step 850 is performed; otherwise, no transition is deemed to have occurred and step 855 is performed.

In step 850, the noise estimation and update unit 23 loads the noise estimated in the (n-a)-th frame F_(n-a) from the estimated noise storage unit 24 and updates the estimated noise used for the noise removal operation on the n-th frame F_n to the loaded noise. As a result, the noise estimated in at least one frame between the n-th frame F_n and the (n-a)-th frame F_(n-a) is not used in subsequent noise removal operations.

In step 855, the noise removal unit 26 removes noise from the n-th frame F_n based on the updated estimated noise, that is, the noise estimated in the (n-a)-th frame F_(n-a).

In step 860, the noise removal unit 26 outputs the voice signal obtained by removing the noise from the input signal.
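Steps 820 to 860 can be summarized as a per-frame loop. The sketch below is one possible reading under stated assumptions, not the claimed implementation: the function `suppress`, its `estimate` and `remove` callbacks, the assumption that the stream starts in a noise section, and the fallback to the oldest stored estimate when fewer than a estimates exist are all hypothetical. Frames are plain numbers here so that the control flow is easy to follow.

```python
from collections import deque

def suppress(frames, is_voice, a, estimate, remove):
    """Run the FIG. 8 decision flow over pre-classified frames.

    frames   : per-frame data (any type accepted by the callbacks)
    is_voice : per-frame booleans from the voice/noise determination
    a        : look-back distance, also used as the store capacity
    Returns the processed voice frames only.
    """
    store = deque(maxlen=a)   # estimated noise storage (FIFO)
    current = None            # estimated noise used for removal
    prev_voice = False        # assumption: the stream starts in noise
    outputs = []
    for frame, voice in zip(frames, is_voice):
        if voice:
            # Steps 845/850: on a noise-to-voice transition, switch to the
            # oldest stored estimate (the one from a frames back when full).
            if not prev_voice and store:
                current = store[0]
            # Steps 855/860: remove noise and output the voice frame.
            outputs.append(frame if current is None else remove(frame, current))
        elif not prev_voice:
            # Steps 830/835: estimate and store noise for a noise frame
            # whose predecessor was also a noise frame.
            store.append(estimate(frame))
        # Otherwise (voice-to-noise transition) skip estimation: step 840.
        prev_voice = voice
    return outputs

frames = [1, 2, 3, 4, 5, 10, 11, 6, 7]
is_voice = [False] * 5 + [True, True] + [False, False]
result = suppress(frames, is_voice, a=4,
                  estimate=lambda f: f, remove=lambda f, n: f - n)
print(result)  # [8, 9]
```

In this run the store holds the estimates 2, 3, 4, 5 when the transition arrives, so the estimate from four frames back (2) is used for both voice frames, and the noise frame directly after the voice section (6) is not estimated.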

Although the present invention has been described above with reference to limited embodiments and drawings, the present invention is not limited to those embodiments, and those of ordinary skill in the art to which the present invention pertains can make various modifications and variations from this description. Accordingly, the spirit of the present invention should be understood only from the claims set forth below, and all equal or equivalent modifications thereof fall within the scope of the spirit of the present invention.

In addition, the apparatus according to the present invention can be implemented as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes all kinds of recording devices that store data readable by a computer system. Examples of the recording medium include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices, as well as implementations in the form of a carrier wave (for example, transmission over the Internet). The computer-readable recording medium can also be distributed over network-connected computer systems so that the computer-readable code is stored and executed in a distributed manner.

10: voice/audio codec
20: noise removal system
21: voice/noise section determination unit
22: transition determination unit
23: noise estimation and update unit
24: estimated noise storage unit
25: converter
26: noise removal unit

Claims

A noise removal method comprising:
determining each of a plurality of frames included in an input signal having a noise signal and a voice signal to be a noise section or a voice section;
estimating noise in at least one noise frame determined to be the noise section among the plurality of frames;
for a voice transition frame that has transitioned from the noise section to the voice section among the plurality of frames, removing noise from the voice transition frame by using, as the estimated noise for the voice transition frame, the noise estimated in a noise frame a predetermined number of frames before the voice transition frame; and
removing noise from at least one voice frame determined to be the voice section among the plurality of frames by using the estimated noise.

The method of claim 1,
further comprising storing the noise estimated in the at least one noise frame in a buffer.

The method of claim 2,
wherein removing noise from the voice transition frame comprises:
loading, from the buffer, the noise estimated in the noise frame the predetermined number of frames before the voice transition frame;
updating the estimated noise for the voice transition frame to the loaded noise; and
outputting the voice signal by removing the estimated noise from the input signal corresponding to the voice transition frame.

The method of claim 3,
wherein loading, from the buffer, the noise estimated in the noise frame the predetermined number of frames before the voice transition frame comprises, when the voice transition frame is an n-th frame, loading from the buffer the noise estimated in an (n-a)-th frame,
wherein updating the estimated noise for the voice transition frame to the loaded noise comprises updating the estimated noise for the voice transition frame to the noise estimated in the (n-a)-th frame, so that noise estimated in at least one frame between the n-th frame and the (n-a)-th frame is excluded from subsequent noise removal, and
wherein n and a are natural numbers.

The method of claim 4,
wherein a is determined based on at least one of a size of each frame and an overall delay period.

The method of claim 4,
wherein a capacity of the buffer is determined according to a.

The method of claim 1,
wherein estimating noise in the at least one noise frame comprises:
estimating noise in the at least one noise frame when the frame preceding the at least one noise frame is the noise section; and
not estimating noise in the at least one noise frame when the frame preceding the at least one noise frame is the voice section.

The method of claim 1,
further comprising converting the input signal from a time domain to a frequency domain.

The method of claim 8,
wherein removing noise from the at least one voice frame comprises
outputting the voice signal by removing the estimated noise from the converted input signal corresponding to the at least one voice frame.

(Deleted)

A noise removal system comprising:
a voice/noise section determination unit that determines each of a plurality of frames included in an input signal having a noise signal and a voice signal to be a noise section or a voice section;
a transition determination unit that determines whether a transition has occurred in at least one noise frame determined to be the noise section or in at least one voice frame determined to be the voice section among the plurality of frames;
a noise estimation and update unit that estimates noise in the at least one noise frame and, for a voice transition frame that has transitioned from the noise section to the voice section, updates the estimated noise for the voice transition frame to the noise estimated in a noise frame a predetermined number of frames before the voice transition frame; and
a noise removal unit that removes noise from the voice transition frame or the voice frame by using the estimated noise.

The system of claim 11,
further comprising an estimated noise storage unit that stores the noise estimated in the at least one noise frame.

The system of claim 12,
wherein the noise estimation and update unit
loads, from the estimated noise storage unit, the noise estimated in the noise frame the predetermined number of frames before the voice transition frame, and updates the estimated noise for the voice transition frame to the loaded noise.

The system of claim 13,
wherein the noise estimation and update unit,
when the voice transition frame is an n-th frame, loads from the estimated noise storage unit the noise estimated in an (n-a)-th frame and updates the estimated noise for the voice transition frame to the loaded noise estimated in the (n-a)-th frame, so that noise estimated in at least one frame between the n-th frame and the (n-a)-th frame is not provided to the noise removal unit, and
wherein n and a are natural numbers.

The system of claim 14,
wherein a is determined based on at least one of a size of each frame and a delay period of the noise removal system.

The system of claim 14,
wherein a capacity of the estimated noise storage unit is determined according to a.

The system of claim 11,
wherein the noise estimation and update unit
estimates noise in the at least one noise frame when the frame preceding the at least one noise frame is the noise section, and does not estimate noise in the at least one noise frame when the frame preceding the at least one noise frame is the voice section.

The system of claim 11,
further comprising a converter that converts the input signal from a time domain to a frequency domain.

The system of claim 18,
wherein the noise removal unit
outputs the voice signal by removing the estimated noise from the converted input signal corresponding to the voice transition frame or the voice frame.