KR101121505B1

KR101121505B1 - Method for extracting non-vocal signal from stereo sound contents

Info

Publication number: KR101121505B1
Application number: KR1020100050828A
Authority: KR
Inventors: 김현태; 김태훈; 박종인
Original assignee: 동의대학교 산학협력단
Priority date: 2010-05-31
Filing date: 2010-05-31
Publication date: 2012-03-06
Also published as: KR20110131403A

Abstract

The present invention relates to a method of extracting a non-vocal signal from a stereo sound source, which removes the singer's voice from the stereo sound source so that only the stereo accompaniment music is output. The sound source signals carried on the left and right channels, and the difference signal between those two channel signals, are each transformed into the frequency domain. The frequency components of the left and right channels are then compared with the frequency components of the difference signal, and the frequency components of the left and right channels that contain the vocal component are removed, thereby restoring the non-vocal signal.

Description

Method for extracting non-vocal signal from stereo sound contents

The present invention relates to a method of extracting accompaniment music from a sound source consisting of accompaniment music and a singer's voice by removing the singer's voice and, more preferably, to a method of extracting a non-vocal signal from a stereo sound source by removing the singer's voice so that only the stereo accompaniment music is output.

In general, a stereo sound source (AR: All Recorded) output through the speakers of a karaoke machine, song accompaniment machine, MP3 player, or other audio device contains a vocal component (Vocal recorded), which is the singer's singing, and a non-vocal component (MR: Music Recorded), which is accompaniment music played on one or more instruments.

Unlike a mono sound source, such a stereo sound source is produced by recording different instrument sounds on the left and right channels and then carrying the voice signal in common on both channels, so that the listener can enjoy the music with a sense of spatial depth.

A record producer works in a studio to produce such a sound source and sells the final result either on an offline medium such as a CD or as a file, through offline or online distribution channels.

Consumers purchase these sound sources in various ways through the record producer's online and offline distribution channels and listen to them on their own audio equipment.

Recently, however, consumers have increasingly wanted either to remove the accompaniment music from the original sound source and listen only to the singer's natural voice, or to remove the singer's voice and listen only to the accompaniment music, and the music industry is conducting various research in response.

The present invention is proposed as one outcome of this research and presents a new method of extracting only the non-vocal signal, that is, the accompaniment music, by removing the vocal signal (the singer's voice) from a stereo sound source.

The present invention relates to a method of extracting a non-vocal signal from a stereo sound source. It proposes transforming the sound source signals carried on the left and right channels, and the difference signal between them, into the frequency domain, then comparing the frequency components of the left and right channels with those of the difference signal and removing the frequency components of the left and right channels that contain the vocal component, thereby restoring the non-vocal signal.

In the present invention, the frequency-domain transformation of the left and right channel sound source signals and of the difference signal is performed by a general frequency transform such as the FFT (Fast Fourier Transform) or the DCT (Discrete Cosine Transform).

In the present invention, the comparison between the frequency components of the left and right channels and the frequency components of the difference signal determines whether each frequency component on the left and right channels is at least a predetermined multiple of the corresponding frequency component of the difference signal. If a frequency component on the left or right channel is larger than the predetermined multiple of the difference-signal frequency component, it is judged to carry the vocal signal and is removed; otherwise it is judged to be a non-vocal signal, that is, accompaniment music, and is retained. The non-vocal signal is extracted in this way.

A preferred embodiment of the method of extracting a non-vocal signal from a stereo sound source according to the present invention comprises:
(a) receiving, through a first channel, a first sound source in which a vocal signal and a first non-vocal signal are mixed, and receiving, through a second channel, a second sound source in which the vocal signal and a second non-vocal signal are mixed;
(b) removing the vocal signal by generating a difference signal between the first and second sound sources;
(c) obtaining the respective frequency components of the first sound source, the second sound source, and the difference signal;
(d) comparing the energy of each frequency component of the first and second sound sources with α times the energy of the corresponding frequency component of the difference signal (α being a rational number greater than 1); and
(e) determining that the vocal signal is present in a frequency component of the first and second sound sources when the energy of that component is greater than α times the frequency component energy of the difference signal, and determining that the vocal signal is absent when it is smaller than α times the frequency component energy of the difference signal.

An embodiment of the present invention further comprises a step of removing the corresponding frequency components of the first and second sound sources when step (e) determines that the vocal signal is present in them, and preserving those frequency components when step (e) determines that the vocal signal is absent.

An embodiment of the present invention further comprises a step of converting the frequency components of the first and second sound sources back into the time domain by an inverse frequency transform.

When the method of extracting a non-vocal signal from a stereo sound source according to the present invention is applied to a stereo sound source, the listener can enjoy only the accompaniment music originally recorded by the record producer, with the singer's voice removed.

In addition, when the method proposed in the present invention is applied to karaoke equipment, manufacturers of song accompaniment machines can avoid the high cost of producing accompaniment music, and ordinary karaoke users can sing realistically against accompaniment music that is close to the original sound first recorded by the record producer.

The technical idea of the present invention lies in removing the vocal component by comparing the frequency components of the original left and right channel signals with those of the difference signal between the channels. Accordingly, applying this idea not only to stereo sound sources but also to mono sound sources or to original signals with three or more channels falls within the technical scope of the present invention, and methods obtained by a person skilled in the art simply adding to, modifying, or deleting parts of the proposed non-vocal signal processing method should likewise be regarded as within that scope.

FIG. 1 is a flowchart illustrating the method of extracting a non-vocal signal from a stereo sound source according to the present invention.
FIGS. 2A to 2F are diagrams illustrating the frequency-domain processing in the method of extracting a non-vocal signal from a stereo sound source according to the present invention.

Hereinafter, the technical idea of the present invention will be described in detail with reference to the drawings.

FIG. 1 is a process diagram illustrating the method of extracting a non-vocal signal from a stereo sound source according to the present invention.

As shown in step S110 of FIG. 1, the left channel 110 and the right channel 120 are the channels that carry the stereo sound source produced by the record producer. The left channel 110 carries the accompaniment music (MR_L) and the singer's voice (Vocal), and the right channel 120 carries the accompaniment music (MR_R) and the singer's voice (Vocal).

The accompaniment music (MR_L, MR_R) is played on given instruments when the recording is produced. To obtain a stereo effect, each channel generally contains the accompaniment of different instruments, although in some cases the accompaniment of a particular instrument may be recorded identically on both channels. The present invention assumes that the singer's voice (Vocal) is recorded identically on both the left and right channels.

For ease of understanding and consistent terminology, the accompaniment music (MR_L, MR_R) is hereinafter referred to as the non-vocal signal, the singer's voice (Vocal) as the vocal signal, and the sound sources carried on the left and right channels, each consisting of accompaniment music and the singer's voice, as the first and second sound sources, respectively.

Accordingly, in this specification the first sound source should be understood as the accompaniment music (MR_L) and the singer's voice (Vocal) on the left channel taken together, and the second sound source as the accompaniment music (MR_R) and the singer's voice (Vocal) on the right channel taken together.

For reference, the first and second sound sources used in the present invention are digital signals.

Accordingly, the technical idea of the present invention described in this specification applies equally to an analog sound source once it has been converted into a digital signal.

Hereinafter, the technical idea of the present invention will be described more specifically with reference to the drawings.

As shown in step S120 of FIG. 1, the present invention performs difference signal processing 130 on the first and second sound sources transmitted through the left and right channels.

As a result of the difference signal processing, the vocal component, which is common to both channels, is removed, and only the difference signal (MR_L - MR_R) of the non-vocal signals (MR_L, MR_R) remains.
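The cancellation described above can be sketched in a few lines of Python. This is only an illustrative sketch: the sample values and the helper name `difference_signal` are assumptions made for the example and do not appear in the specification.

```python
# Toy 4-sample signals; every value here is made up for illustration.
vocal = [0.5, -0.2, 0.1, 0.4]   # vocal signal, recorded identically on both channels
mr_l = [0.1, 0.3, -0.1, 0.0]    # left-channel accompaniment (MR_L)
mr_r = [-0.2, 0.0, 0.2, 0.1]    # right-channel accompaniment (MR_R)

# First and second sound sources: accompaniment plus the common vocal.
first = [m + v for m, v in zip(mr_l, vocal)]
second = [m + v for m, v in zip(mr_r, vocal)]


def difference_signal(first, second):
    """Sample-wise difference of the two channels (hypothetical helper)."""
    return [a - b for a, b in zip(first, second)]


diff = difference_signal(first, second)

# The common vocal cancels, leaving MR_L - MR_R (up to rounding error).
expected = [a - b for a, b in zip(mr_l, mr_r)]
```

The cancellation is exact only under the stated assumption that the vocal is recorded identically on both channels.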

In the next step (S130), a general frequency transform such as the FFT (Fast Fourier Transform) or the DCT (Discrete Cosine Transform) is applied to each of the first sound source (MR_L + Vocal), the second sound source (MR_R + Vocal), and the difference signal (MR_L - MR_R) to extract their frequency components.

By the FFT processing, the first sound source (MR_L + Vocal) is converted into a frequency signal (M_L_FFT) containing the frequency components of both the non-vocal component (MR_L) and the vocal component (Vocal); the second sound source (MR_R + Vocal) is converted into a frequency signal (M_R_FFT) containing the frequency components of both the non-vocal component (MR_R) and the vocal component (Vocal); and the difference signal (MR_L - MR_R) is converted into a frequency signal (M_D_FFT) of the signal obtained by subtracting the non-vocal signal (MR_R) from the non-vocal signal (MR_L).
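As a rough illustration of this transform step, the following Python sketch computes a naive discrete Fourier transform, which yields the same bin values an FFT would compute more efficiently. The function names `dft` and `bin_energies` are hypothetical, and taking the squared magnitude of each bin as its "energy" is an assumption, since the specification does not define the term.

```python
import cmath


def dft(samples):
    """Naive O(n^2) discrete Fourier transform of a real or complex sequence."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]


def bin_energies(spectrum):
    """Per-bin energy, assumed here to be the squared magnitude of each bin."""
    return [abs(c) ** 2 for c in spectrum]


# A 4-sample block holding one cycle of a cosine: the energy lands in bins 1 and 3.
energies = bin_energies(dft([1.0, 0.0, -1.0, 0.0]))
# energies is approximately [0, 4, 0, 4]
```

The same routine would be applied to the first sound source, the second sound source, and the difference signal to obtain M_L_FFT, M_R_FFT, and M_D_FFT.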

Next, the frequency energies of these signals are compared, as expressed by the following equations.

[Equation 1]

(frequency bin energy of the frequency signal M_L_FFT) > α × (frequency bin energy of the difference signal (MR_L - MR_R))

[Equation 2]

(frequency bin energy of the frequency signal M_R_FFT) > α × (frequency bin energy of the difference signal (MR_L - MR_R))

Here, α is preferably a rational number greater than 1; in experiments, the range of α that gave effective non-vocal signal extraction was approximately 1.5 < α < 2.5. However, the suitable range of α may fall outside this range depending on the characteristics of the sound source originally recorded by the record producer, and a simple change in the numerical range of α naturally falls within the technical idea of the present invention.
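For illustration only, the two comparisons can be written compactly and checked on one made-up frequency bin. The symbols E_L(k), E_R(k), and E_D(k), denoting the energies of bin k in M_L_FFT, M_R_FFT, and the difference-signal spectrum, are notation assumed for this example, as are the numerical values.

```latex
\[
E_L(k) > \alpha\,E_D(k) \quad \text{(Equation 1)}, \qquad
E_R(k) > \alpha\,E_D(k) \quad \text{(Equation 2)}
\]
\[
\alpha = 2,\; E_D(k) = 3:\qquad
E_L(k) = 10 > 6 \;\Rightarrow\; \text{bin } k \text{ of } M_{L\_FFT} \text{ is judged vocal}, \qquad
E_R(k) = 5 \not> 6 \;\Rightarrow\; \text{bin } k \text{ of } M_{R\_FFT} \text{ is judged non-vocal}
\]
```

With a larger α the threshold rises, so fewer bins are judged vocal; with α closer to 1 more bins are judged vocal.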

In Equation 1, when the energy of a frequency bin of the frequency signal (M_L_FFT) is greater than α times the energy of the corresponding frequency bin of the difference signal (MR_L - MR_R), that bin of M_L_FFT is judged to contain vocal energy, and that vocal energy is judged to be large, so the corresponding frequency component of M_L_FFT is removed; that is, the bin is reset to zero. Performing this process individually on every frequency bin ultimately removes the vocal signal from the frequency components of M_L_FFT.

Likewise, the same process as for Equation 1 is carried out under Equation 2: when the energy of a frequency bin of the frequency signal (M_R_FFT) is greater than α times the energy of the corresponding frequency bin of the difference signal (MR_L - MR_R), that bin of M_R_FFT is judged to contain vocal energy, and that vocal energy is judged to be large, so the corresponding frequency component of M_R_FFT is removed; that is, the bin is reset to zero. Performing this process individually on every frequency bin ultimately removes the vocal signal from the frequency components of M_R_FFT.

As the next step (S160) shows, frequency bins that do not satisfy the conditions of Equations 1 and 2 are regarded as dominated by the non-vocal signal component and are retained.

Thus, once the processing of Equations 1 and 2 is complete, frequency bins with a large vocal component have been removed, and bins in which the vocal component is weak or absent remain.
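The per-bin decision of Equations 1 and 2 could be sketched as follows. The function name `suppress_vocal_bins`, the default `alpha=2.0` (chosen from inside the experimentally reported 1.5 to 2.5 range), and the use of squared magnitude as bin energy are all assumptions of this sketch rather than details given in the specification.

```python
def suppress_vocal_bins(channel_spec, diff_spec, alpha=2.0):
    """Zero every bin whose energy exceeds alpha times the difference-signal bin energy.

    channel_spec: complex spectrum of one channel (M_L_FFT or M_R_FFT).
    diff_spec:    complex spectrum of the difference signal, same length.
    """
    kept = []
    for c, d in zip(channel_spec, diff_spec):
        if abs(c) ** 2 > alpha * abs(d) ** 2:
            kept.append(0j)   # judged to carry the vocal: reset the bin to zero
        else:
            kept.append(c)    # judged non-vocal: retain the bin unchanged
    return kept


# Made-up three-bin spectra: bin energies 9, 1, 4 against thresholds 2, 2, 8.
result = suppress_vocal_bins([3 + 0j, 1 + 0j, 2j], [1 + 0j, 1 + 0j, 2 + 0j])
# Only the first bin exceeds its threshold, so result is [0j, (1+0j), 2j].
```

The same function would be called once with the left-channel spectrum (Equation 1) and once with the right-channel spectrum (Equation 2), using the same difference-signal spectrum in both calls.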

In the next step (S170), applying a time-domain transform (inverse FFT, inverse DCT, etc.) to the frequency components processed by Equations 1 and 2 yields the first sound source (MR_L + Vocal) and the second sound source (MR_R + Vocal) with the vocal signal (Vocal) removed.
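Putting the steps together, a minimal end-to-end sketch in Python might look like the following. It is not the patented implementation: all function names are hypothetical, a naive DFT stands in for the FFT, bin energy is assumed to be squared magnitude, the whole signal is processed as one block (the specification does not discuss framing), and the test tones are chosen so that vocal and accompaniment never share a frequency bin.

```python
import cmath
import math


def dft(samples):
    """Naive O(n^2) DFT; an FFT would produce the same bin values."""
    n = len(samples)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(samples)) for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT, returning the real part of each time sample."""
    n = len(spectrum)
    return [(sum(c * cmath.exp(2j * cmath.pi * k * t / n)
                 for k, c in enumerate(spectrum)) / n).real for t in range(n)]


def suppress_vocal_bins(channel_spec, diff_spec, alpha):
    """Zero bins whose energy exceeds alpha times the difference-signal bin energy."""
    return [0j if abs(c) ** 2 > alpha * abs(d) ** 2 else c
            for c, d in zip(channel_spec, diff_spec)]


def extract_non_vocal(first, second, alpha=2.0):
    """Return (left, right) time signals with bins judged vocal removed."""
    diff = [a - b for a, b in zip(first, second)]                 # difference signal (S120)
    spec_l, spec_r, spec_d = dft(first), dft(second), dft(diff)  # frequency transform (S130)
    kept_l = suppress_vocal_bins(spec_l, spec_d, alpha)          # Equation 1
    kept_r = suppress_vocal_bins(spec_r, spec_d, alpha)          # Equation 2
    return idft(kept_l), idft(kept_r)                            # back to time domain (S170)


def tone(bin_index, n=16):
    """One n-sample block of a cosine that falls exactly on the given DFT bin."""
    return [math.cos(2 * math.pi * bin_index * t / n) for t in range(n)]


# Idealized input: the vocal and each accompaniment occupy different bins.
vocal, mr_l, mr_r = tone(2), tone(5), tone(3)
first = [m + v for m, v in zip(mr_l, vocal)]
second = [m + v for m, v in zip(mr_r, vocal)]
out_l, out_r = extract_non_vocal(first, second)
# In this idealized case out_l and out_r match mr_l and mr_r up to rounding error.
```

In this toy case the recovery is essentially exact because no bin holds both vocal and accompaniment energy. In real recordings the spectra overlap, so zeroing a bin also discards whatever accompaniment energy shares it.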

Therefore, carrying out the method according to the present invention has the advantage that only the accompaniment music (MR_L), the non-vocal signal of the first sound source, and the accompaniment music (MR_R), the non-vocal signal of the second sound source, can be extracted and listened to.

However, even with the method of extracting a non-vocal signal from a stereo sound source proposed in the present invention, the accompaniment music recorded at production cannot be recovered 100%. Nevertheless, in experiments in which the original accompaniment music and the accompaniment music finally extracted by the present invention were listened to separately, no large audible difference was perceived, which meets the ultimate aim of the technical idea that the present invention set out to realize.

FIGS. 2A to 2F are diagrams that illustrate more visually the method of extracting a non-vocal signal from a stereo sound source, which is the technical idea of the present invention.

FIG. 2A is a conceptual diagram of the frequency spectrum of the first sound source, and FIG. 2B is a conceptual diagram of the frequency spectrum of the second sound source.

FIG. 2C is a conceptual diagram of the frequency spectrum of the difference signal between the first and second sound sources shown in FIGS. 2A and 2B, and FIG. 2D is a conceptual diagram of the frequency spectrum obtained by multiplying the spectrum of FIG. 2C by α.

FIG. 2E is a conceptual diagram of the frequency spectrum obtained by comparing the energies of the spectra of FIG. 2A and FIG. 2D and removing the frequency bins judged to contain the vocal component, and FIG. 2F is a conceptual diagram of the frequency spectrum obtained in the same way from the spectra of FIG. 2B and FIG. 2D.

As FIGS. 2E and 2F show, the non-vocal signal extraction method according to the present invention extracts the principal components of the non-vocal signals contained in the first and second sound sources. When the frequency components of FIGS. 2E and 2F were converted to the time domain and listened to in an actual laboratory, no great difference in sound was perceived between them and the non-vocal signals contained in the first and second sound sources.

When the method of extracting a non-vocal signal from a stereo sound source according to the present invention is applied to various audio devices, including computers, accompaniment music with the singer's voice removed can be enjoyed efficiently.

Because the present invention is generally implemented as a program running within an audio device, it can be applied not only to song accompaniment machines but also to computers, mobile phones, MP3 players, and other electronic devices capable of downloading a program that implements it. This makes it very convenient, and it is particularly useful as a smartphone application.

Claims

A method of extracting a non-vocal signal from a stereo sound source, the method comprising:
(a) receiving, through a first channel, a first sound source in which a vocal signal and a first non-vocal signal are mixed, and receiving, through a second channel, a second sound source in which the vocal signal and a second non-vocal signal are mixed;
(b) removing the vocal signal by generating a difference signal between the first and second sound sources;
(c) obtaining the respective frequency components of the first sound source, the second sound source, and the difference signal;
(d) comparing the energy of each frequency component of the first and second sound sources with α times the energy of the corresponding frequency component of the difference signal (α being a rational number greater than 1); and
(e) determining that the vocal signal is present in a frequency component of the first and second sound sources when the energy of that component is greater than α times the frequency component energy of the difference signal, and determining that the vocal signal is absent when it is smaller than α times the frequency component energy of the difference signal.

The method of claim 1, further comprising:
(f) removing the corresponding frequency components of the first and second sound sources when step (e) determines that the vocal signal is present in them, and preserving those frequency components when step (e) determines that the vocal signal is absent.

The method of claim 2, further comprising:
(g) converting the frequency components of the first and second sound sources that have undergone step (e) into the time domain by an inverse frequency transform.