KR20090064857A

KR20090064857A - Method for processing an audio, and apparatus for implementing the same

Info

Publication number: KR20090064857A
Application number: KR1020070132215A
Authority: KR
Inventors: 김연정; 김기수; 권준호
Original assignee: 엘지전자 주식회사
Priority date: 2007-12-17
Filing date: 2007-12-17
Publication date: 2009-06-22

Abstract

An audio processing method and apparatus are provided that divide audio into voice sections and non-voice sections in the frequency domain before storing it, thereby saving computing resources. A receiver (110) receives an audio bitstream. An extractor (120) extracts audio attribute information from the audio bitstream. A voice section identifier (130) identifies whether each section of the audio is a voice section or a non-voice section, based on the audio attribute information. Sections in which the allocation difference between channels is small are identified as voice sections, and sections in which the difference is large are identified as non-voice sections.

Description

METHOD FOR PROCESSING AN AUDIO, AND APPARATUS FOR IMPLEMENTING THE SAME

The present invention relates to an audio processing method and apparatus, and more particularly, to an audio processing method and apparatus capable of playing or storing audio received through a broadcast signal or the like.

In general, players that receive digital broadcasts such as DMB and DAB not only play video or audio but also provide a function for recording and storing it. When playing or storing audio in this way, a user may want to select and extract only the voice sections or only the music sections (non-voice sections) of the audio signal. To meet this need, methods such as ZCR (Zero Crossing Rate), energy, pitch, and mel-cepstrum have been proposed for distinguishing voice sections from music sections. However, these methods must first decode the audio bitstream to generate a PCM signal and then extract the voice or music sections from that PCM signal. Because both the decoding process and the section extraction process must be performed, computing resources are wasted.

The present invention has been devised to solve the above problem, and its object is to provide an audio processing method and apparatus capable of separating audio into voice sections and non-voice sections in the frequency domain, without generating or analyzing a PCM signal after receiving the audio bitstream.

Another object of the present invention is to provide an audio processing method and apparatus capable of separating received audio into voice sections and non-voice sections using audio attribute information included in the bitstream.

To achieve the above objects, an audio processing method according to the present invention includes: receiving an audio bitstream; extracting audio attribute information from the audio bitstream; and identifying, based on the audio attribute information, whether each section of the audio is a voice section or a non-voice section.

According to the present invention, the audio attribute information may include one or more of an allocation, a scale factor, and a sample value.

According to the present invention, the identifying may be performed for each frequency band.

According to the present invention, the audio attribute information may include a per-channel allocation or a per-channel sample value, and the identifying may identify a section in which the channel-to-channel difference of the allocation or of the sample value is relatively small as a voice section, and a section in which the channel-to-channel difference of the allocation or of the sample value is relatively large as a non-voice section.

According to the present invention, the audio attribute information may include per-channel scale factor information, and the identifying may identify a section in which the channel-to-channel deviation of the scale factor is small as a voice section, and a section in which the channel-to-channel deviation of the scale factor is large as a non-voice section.

According to the present invention, the audio attribute information may include scale factor information over time, and the identifying may identify a section in which the temporal deviation of the scale factor is large as a voice section, and a section in which the temporal deviation of the scale factor is small as a non-voice section.

According to the present invention, the audio attribute information may include a per-channel sample value, and the identifying may identify a section in which the channel-to-channel difference of the sample value is small as a voice section, and a section in which the scale factor is small as a non-voice section.

According to the present invention, the method may further include performing, based on the identification result, one or more of playback and storage on either the voice sections or the non-voice sections of the audio.

According to the present invention, the playback may be performed using a synthesis subband filter.

According to another aspect of the present invention, there is provided an audio processing apparatus including: a receiver that receives an audio bitstream; an extractor that extracts audio attribute information from the audio bitstream; and a voice section identifier that identifies, based on the audio attribute information, whether each section of the audio is a voice section or a non-voice section.

According to one aspect of the present invention, audio can be separated into voice sections and non-voice sections in the frequency domain and stored, without generating and analyzing a PCM signal from the received audio bitstream, so computing resources are saved.

According to another aspect of the present invention, audio can be separated into voice sections and non-voice sections using attribute information included in the audio bitstream, which likewise saves computing resources.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. Before that, the terms and words used in this specification and the claims should not be construed as limited to their ordinary or dictionary meanings; rather, on the principle that an inventor may appropriately define the concepts of terms in order to describe the invention in the best way, they should be interpreted with meanings and concepts consistent with the technical idea of the present invention. Accordingly, the embodiments described in this specification and the configurations shown in the drawings are merely the most preferred embodiment of the present invention and do not represent all of its technical idea, so it should be understood that various equivalents and modifications capable of replacing them may exist at the time of filing this application.

FIG. 1 is a diagram showing the configuration of an audio processing apparatus according to an embodiment of the present invention, and FIG. 2 is a diagram showing the sequence of an audio processing method according to an embodiment of the present invention. Referring first to FIG. 1, the audio processing apparatus 100 according to an embodiment of the present invention includes a receiver 110, an extractor 120, a voice section identifier 130, a playback unit 140, and a storage unit 150.

Referring to FIGS. 1 and 2, the receiver 110 first receives an audio bitstream through a broadcast signal or the like (step S120). The extractor 120 then parses the received audio bitstream and extracts audio attribute information (step S120). Here, the audio attribute information may include an allocation (α), a scale factor (β), and a sample value (γ), each given per channel and per frequency. FIG. 3 is a diagram showing an example of the bitstream structure received in the present invention. Referring to FIG. 3, the bitstream contains the per-channel, per-frequency allocation (α) (allocation[ch][sb]), scale factor (β) (scalefactor[ch][sb][]), and sample value (γ) (sample[ch][sb][]). As shown in the figure (if (allocation[ch][sb] != 0) {}), the allocation (α) may also be used to extract the scale factor (β) and the sample value (γ).
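The attribute information extracted here can be pictured as three arrays indexed by channel and subband. The following Python sketch is illustrative only: the `FrameAttributes` class, its field layout, and the helper `active_subbands` are assumptions made for this example and are not part of the disclosure; only the names allocation, scalefactor, and sample come from FIG. 3.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FrameAttributes:
    """Hypothetical container for one parsed frame (not defined in the patent)."""
    allocation: List[List[int]]           # allocation[ch][sb]
    scalefactor: List[List[List[int]]]    # scalefactor[ch][sb][part], stored as indices
    sample: List[List[List[float]]]       # sample[ch][sb][n]


def active_subbands(frame: FrameAttributes, ch: int) -> List[int]:
    """Return the subbands of channel `ch` whose allocation is non-zero.

    This mirrors the `if (allocation[ch][sb] != 0)` guard in FIG. 3:
    only these subbands carry scale factors and samples.
    """
    return [sb for sb, a in enumerate(frame.allocation[ch]) if a != 0]


# Made-up values for a two-channel, three-subband frame.
frame = FrameAttributes(
    allocation=[[4, 0, 2], [4, 1, 0]],
    scalefactor=[[[10], [], [20]], [[11], [30], []]],
    sample=[[[0.5, -0.2], [], [0.1, 0.0]], [[0.4, -0.1], [0.05, 0.0], []]],
)
print(active_subbands(frame, 0))  # [0, 2]
print(active_subbands(frame, 1))  # [0, 1]
```

Keeping the three fields in the same [ch][sb] layout as the bitstream lets the later identification steps compare the left and right channels subband by subband without decoding to PCM.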

Referring again to FIGS. 1 and 2, the voice section identifier 130 uses the audio attribute information extracted in step S120 to identify whether the corresponding section of the audio is a voice section or a non-voice section (steps S130 to S160). Specifically, when the channel-to-channel difference of the allocation (α) is small (YES at step S130), the section is identified as a voice section (step S170); otherwise, it is identified as a non-voice section (step S180).

FIG. 4 is a diagram showing an example of the channel-to-channel difference of the allocation. Here, the channel-to-channel difference of the allocation is, for example, the difference in allocation between the left channel and the right channel. In FIG. 4, the x-axis is play time [sec] and the y-axis is the left-right allocation difference (LR difference (allocation value)). Comparing the voice section (SP) with the music sections (MU) (MU(pop), MU(Latin music), MU(bossa nova), MU(string performance), MU(pop with guitar accompaniment), MU(a cappella)) shows that the voice section (SP) has a smaller channel-to-channel difference than any type of music section (MU). This is because the voice of a singer, host, or presenter is generally similar in the left and right channels, whereas music tends to be less similar between the two channels. Using this characteristic, step S130 can identify whether the audio section is a voice section or a non-voice section from the channel-to-channel difference of the allocation extracted from the bitstream.
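The allocation test of step S130 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the text does not say how the left-right difference is aggregated over subbands or what threshold is used, so the mean absolute difference and the `threshold` value below are hypothetical choices, as are the function names.

```python
from typing import Sequence


def allocation_lr_difference(left: Sequence[int], right: Sequence[int]) -> float:
    """Mean absolute left/right difference of allocation values over subbands.

    Assumption: the patent only speaks of an 'LR difference'; averaging the
    absolute per-subband differences is one plausible way to compute it.
    """
    if len(left) != len(right) or not left:
        raise ValueError("channels must have the same, non-zero number of subbands")
    return sum(abs(l - r) for l, r in zip(left, right)) / len(left)


def is_voice_by_allocation(left: Sequence[int], right: Sequence[int],
                           threshold: float = 0.5) -> bool:
    """True when the L/R allocation difference is below `threshold`.

    The threshold is a hypothetical value chosen for this example.
    """
    return allocation_lr_difference(left, right) < threshold


# Made-up allocation values: identical channels vs. clearly different channels.
speech_like = is_voice_by_allocation([4, 3, 2, 0], [4, 3, 2, 0])
music_like = is_voice_by_allocation([6, 4, 2, 1], [3, 5, 0, 2])
print(speech_like, music_like)  # True False
```

In practice a threshold would have to be tuned on labeled material such as the SP and MU excerpts plotted in FIG. 4.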

Meanwhile, the voice section identifier 130 determines whether the channel-to-channel difference of the scale factor (β) extracted in step S120 is small. If it is small (YES at step S140), the audio section is identified as a voice section (step S170); otherwise, it is identified as a non-voice section (step S180).

FIG. 5 is a diagram showing an example of the channel-to-channel difference of the scale factor. The x-axis is again play time [sec], and the y-axis is the left-right difference of the scale factor (Mean of LR SCF difference). The scale factor is a value that reflects the magnitude of the largest component in each band, and it is also included in the MUSICAM (MPEG-1 Audio Layer II, MP2) syntax of the Digital Audio Broadcasting (DAB) standard. When the left and right channel signals are similar, their scale factors are also similar; conversely, when the left and right channel signals differ greatly, their scale factors also differ greatly. Referring to FIG. 5, the left-right difference of the scale factor in the voice section (SP) is lower than in any type of music section (MU). The lower panel of FIG. 5 shows the RMS (Root Mean Square) in the PCM domain.
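The "Mean of LR SCF difference" plotted in FIG. 5 can be approximated frame by frame as follows. This is a sketch, not the method of the disclosure: the function names, the use of a mean absolute difference, and the `threshold` value are assumptions; the labels SP and MU follow the figure legends.

```python
from typing import List, Sequence, Tuple


def mean_lr_scf_difference(left_scf: Sequence[float],
                           right_scf: Sequence[float]) -> float:
    """Mean absolute difference between left and right scale factors."""
    if len(left_scf) != len(right_scf) or not left_scf:
        raise ValueError("channels must have the same, non-zero number of bands")
    return sum(abs(l - r) for l, r in zip(left_scf, right_scf)) / len(left_scf)


def label_frames(frames: Sequence[Tuple[Sequence[float], Sequence[float]]],
                 threshold: float = 0.05) -> List[str]:
    """Label each (left, right) frame 'SP' (voice) or 'MU' (non-voice).

    `threshold` is a hypothetical value; the patent gives no number.
    """
    return ["SP" if mean_lr_scf_difference(l, r) < threshold else "MU"
            for l, r in frames]


# Made-up scale factor values for two frames with two bands each.
frames = [
    ([0.5, 0.25], [0.5, 0.25]),   # channels agree
    ([0.5, 0.25], [0.1, 0.40]),   # channels differ
]
print(label_frames(frames))  # ['SP', 'MU']
```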

Meanwhile, the voice section identifier 130 determines whether the temporal deviation of the scale factor (β) extracted in step S120 is large (step S150). If the temporal deviation is large (YES at step S150), the audio section is identified as a voice section (step S170); otherwise, it is identified as a non-voice section (step S180).

FIG. 6 is a diagram showing an example of the temporal deviation of the scale factor. The x-axis is play time [sec] and the y-axis is the scale factor: FIG. 6(A) shows '63 - scale factor index', FIG. 6(B) shows the temporal deviation (variance) of '63 - scale factor index', and FIG. 6(C) shows the interval of the scale factor's temporal deviation. Here, the scale factor index is an indicator that designates the value of the scale factor (SCF), and in the case of MUSICAM it is defined as follows.

$$\mathrm{SCF} = 2^{\,1 - \mathrm{index}/3}$$

SCF: scale factor

index: scale factor index

When the scale factor index takes values from 0 to 62, the scale factor (SCF) takes values from 2 down to 0.00000120155435. In other words, the smaller the scale factor index, the larger the scale factor (SCF), a so-called inverse relationship. Accordingly, the '63 - scale factor index' value in FIG. 6(A) and (B) increases monotonically with the scale factor (SCF), so a large '63 - scale factor index' indicates a large scale factor (SCF) value.
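The relationship between the index and the scale factor can be checked numerically. The sketch below assumes the MPEG-1 Audio Layer II scale factor table, in which SCF = 2^(1 - index/3); this closed form is an assumption of the example, chosen because it reproduces the range quoted above (2 at index 0, about 0.00000120155435 at index 62). The function names are hypothetical.

```python
import math


def scalefactor_from_index(index: int) -> float:
    """Scale factor for a MUSICAM scale factor index in 0..62.

    Assumes SCF = 2 ** (1 - index / 3), which matches the end points
    quoted in the text (2 and 0.00000120155435).
    """
    if not 0 <= index <= 62:
        raise ValueError("scale factor index must be in 0..62")
    return 2.0 ** (1.0 - index / 3.0)


def inverted_index(index: int) -> int:
    """The '63 - scale factor index' quantity plotted in FIG. 6(A)."""
    return 63 - index


print(scalefactor_from_index(0))                                   # 2.0
print(math.isclose(scalefactor_from_index(62), 1.20155435e-6,
                   rel_tol=1e-7))                                  # True

# A larger '63 - index' always corresponds to a larger scale factor.
pairs = [(inverted_index(i), scalefactor_from_index(i)) for i in range(63)]
print(all(a[1] > b[1] for a, b in zip(pairs, pairs[1:])))          # True
```

Because the mapping is monotonic, the later variance analysis can work directly on '63 - index' without ever converting to the scale factor value.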

In a voice section, short near-silent intervals exist between phonemes. Because these intervals have relatively small amplitude, the scale factor also becomes small in order to represent them. The scale factor therefore swings widely in a voice section to represent the low amplitudes that occur in these short intervals; that is, its variance becomes large. Referring to FIG. 6(B), the variance of the scale factor (63 - scale factor index) in the voice section (SP) is much larger than in the music sections (MU). Referring to FIG. 6(C), the interval in the voice section (SP) is likewise much larger than the interval in the music sections.
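The temporal test of step S150 can be illustrated with the '63 - scale factor index' series of one band. This is a minimal sketch: the use of population variance over the whole section, the `threshold` value, and the sample sequences are all assumptions made for the example.

```python
from statistics import pvariance
from typing import Sequence


def temporal_variance(inverted_indices: Sequence[int]) -> float:
    """Population variance of '63 - scale factor index' over consecutive frames."""
    if len(inverted_indices) < 2:
        raise ValueError("need at least two frames")
    return float(pvariance(inverted_indices))


def is_voice_by_variance(inverted_indices: Sequence[int],
                         threshold: float = 50.0) -> bool:
    """True when the temporal variance exceeds `threshold` (hypothetical value)."""
    return temporal_variance(inverted_indices) > threshold


# Made-up series: speech alternates between loud phonemes and near-silent gaps,
# while music stays at a roughly constant level.
speech_like = [40, 5, 38, 4, 41, 6]
music_like = [40, 41, 40, 39, 40, 41]

print(is_voice_by_variance(speech_like))  # True
print(is_voice_by_variance(music_like))   # False
```

A sliding window over the series, rather than one value per section, would give the time-resolved curve of FIG. 6(B); the window length would be another tuning parameter.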

Finally, the voice section identifier 130 determines whether the channel-to-channel difference of the sample value (γ) extracted in step S120 is small. If the channel-to-channel difference is small (YES at step S160), the audio section is identified as a voice section (step S170); otherwise, it is identified as a non-voice section (step S180).

FIG. 7 is a diagram showing an example of the channel-to-channel difference of the sample value. The x-axis is play time [sec], and the y-axis is the mean of the left-right difference of the dequantized sample values. Referring to FIG. 7, like the channel-to-channel differences of the allocation (α) and of the scale factor (β), the channel-to-channel difference of the sample value (γ) is very low in the voice section (SP) and relatively high in the non-voice sections (MU). The channel-to-channel difference of the allocation (α) and that of the sample value (γ) may be proportional to each other; the relationship between the two is shown in FIG. 8.

In this way, the voice section identifier 130 identifies, through steps S130 to S160, whether the audio section is a voice section or a non-voice section. When the audio attribute information exists for each frequency band, the identification process of steps S130 to S160 may also be performed for each frequency band. That is, it is possible to determine which band of a given audio section is voice and which is non-voice. In addition, steps S130 to S160 need not all be performed; any one or more of the four steps may be performed.
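The per-band decision with an arbitrary subset of the four checks might be organized as in the Python sketch below. Everything named here is hypothetical: the `BandFeatures` container, the numeric thresholds, and in particular the combination rule (a band is labeled voice only when every enabled check agrees). The text states only that any one or more of steps S130 to S160 may be performed, not how their results are combined.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass
class BandFeatures:
    """Per-band feature values for one audio section (hypothetical container)."""
    alloc_lr_diff: float    # S130: L/R allocation difference
    scf_lr_diff: float      # S140: L/R scale factor difference
    scf_time_var: float     # S150: temporal variance of the scale factor
    sample_lr_diff: float   # S160: L/R sample value difference


# Hypothetical thresholds; the patent does not give numeric values.
THRESHOLDS: Dict[str, float] = {"S130": 0.5, "S140": 0.05, "S150": 50.0, "S160": 0.02}

# Each check returns True when its criterion points to a voice section.
CHECKS: Dict[str, Callable[[BandFeatures], bool]] = {
    "S130": lambda f: f.alloc_lr_diff < THRESHOLDS["S130"],
    "S140": lambda f: f.scf_lr_diff < THRESHOLDS["S140"],
    "S150": lambda f: f.scf_time_var > THRESHOLDS["S150"],
    "S160": lambda f: f.sample_lr_diff < THRESHOLDS["S160"],
}


def identify_band(features: BandFeatures,
                  enabled: Iterable[str] = ("S130", "S140", "S150", "S160")) -> str:
    """Label one band 'SP' (voice) or 'MU' (non-voice).

    Assumption: the band is voice only if every enabled check agrees.
    """
    steps = list(enabled)
    if not steps:
        raise ValueError("at least one step must be enabled")
    return "SP" if all(CHECKS[s](features) for s in steps) else "MU"


# Made-up feature values for two frequency bands of the same section.
bands = {
    0: BandFeatures(0.0, 0.01, 300.0, 0.005),
    1: BandFeatures(1.75, 0.30, 0.5, 0.20),
}
labels = {band: identify_band(f) for band, f in bands.items()}
print(labels)                                      # {0: 'SP', 1: 'MU'}
print(identify_band(bands[1], enabled=["S150"]))   # MU
```

Requiring agreement is the conservative choice; a majority vote or an "any check" rule would be equally compatible with the description and would trade missed voice sections against false detections.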

When the voice section identifier 130 has finished identifying the voice sections and non-voice sections as described above, either the voice sections or the non-voice sections are, according to the user's selection, played back as a PCM signal through the playback unit 140 or stored in the storage unit 150 (steps S175 and S185). In this case, the playback unit 140 may reproduce the PCM signal using a synthesis subband filter. For example, it is possible to exclude only speech such as a host's remarks and store only the background music, or to delete the music sections and store only the speech.
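Once each section carries a label, keeping only the user's choice is a simple filter over still-encoded frames. The sketch below is illustrative: the `Section` tuple, the `select_sections` helper, and the byte payloads are assumptions for this example; the labels SP and MU follow the figure legends.

```python
from typing import Iterable, List, Tuple

# (label, still-encoded frame data); hypothetical representation of a section.
Section = Tuple[str, bytes]


def select_sections(sections: Iterable[Section], keep: str) -> List[bytes]:
    """Return the payloads of all sections labeled `keep` ('SP' or 'MU').

    The payloads are passed through untouched, so no PCM decoding is needed
    when the result is only stored.
    """
    if keep not in ("SP", "MU"):
        raise ValueError("keep must be 'SP' or 'MU'")
    return [payload for label, payload in sections if label == keep]


# Made-up recording: host speech alternating with songs.
recording = [
    ("SP", b"host-intro"),
    ("MU", b"song-1"),
    ("SP", b"host-talk"),
    ("MU", b"song-2"),
]
music_only = b"".join(select_sections(recording, "MU"))
print(music_only)  # b'song-1song-2'
```

For playback rather than storage, the selected frames would still have to pass through the decoder's synthesis subband filter to produce PCM.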

As described above, although the present invention has been described with reference to limited embodiments and drawings, it is not limited thereto, and those of ordinary skill in the art to which the present invention pertains may of course make various modifications and variations within the technical idea of the present invention and the scope of equivalents of the claims set forth below.

The present invention can be applied to broadcast receivers, audio players, and the like.

FIG. 1 is a block diagram of an audio processing apparatus according to an embodiment of the present invention.

FIG. 2 is a flowchart of an audio processing method according to an embodiment of the present invention.

FIG. 3 is a diagram showing an example of the bitstream structure received in the present invention.

FIG. 4 is a diagram showing an example of the channel-to-channel difference of the allocation.

FIG. 5 is a diagram showing an example of the channel-to-channel difference of the scale factor.

FIG. 6 is a diagram showing an example of the temporal deviation of the scale factor.

FIG. 7 is a diagram showing an example of the channel-to-channel difference of the sample value.

FIG. 8 is a diagram showing the relationship between the channel-to-channel difference of the allocation and the channel-to-channel difference of the sample value.

Claims

An audio processing method comprising:

receiving an audio bitstream;

extracting audio attribute information from the audio bitstream; and

identifying whether each section of the audio is a voice section or a non-voice section based on the audio attribute information.

The method of claim 1,

The audio attribute information includes at least one of an allocation, a scale factor, and a sample value.

The method of claim 1,

The identifying may be performed for each frequency band.

The method of claim 1,

The audio attribute information includes a per-channel allocation or a per-channel sample value,

The identifying includes identifying a section in which the channel-to-channel difference of the allocation or of the sample value is relatively small as a voice section, and identifying a section in which the channel-to-channel difference of the allocation or of the sample value is relatively large as a non-voice section.

The method of claim 1,

The audio attribute information includes scale factor information for each channel.

The identifying may include identifying a section in which the channel-to-channel deviation of the scale factor is small as a voice section and identifying a section in which the channel-to-channel deviation of the scale factor is large as a non-voice section.

The method of claim 1,

The audio attribute information includes scale factor information over time.

The identifying may include identifying a section having a large temporal deviation of the scale factor as a voice section and identifying a section having a small temporal deviation of the scale factor as a non-voice section.

The method of claim 1,

The audio attribute information includes a sample value for each channel,

The identifying may include identifying a section in which the channel-to-channel difference of the sample value is small as a voice section and identifying a section in which the scale factor is small as a non-voice section.

The method of claim 1,

Further comprising performing, based on the identification result, one or more of playback and storage on one of a voice section and a non-voice section of the audio.

The method of claim 8,

The playback is performed using a synthesis subband filter.

An audio processing apparatus comprising:

a receiver for receiving an audio bitstream;

an extractor for extracting audio attribute information from the audio bitstream; and

a voice section identifier for identifying whether each section of the audio is a voice section or a non-voice section based on the audio attribute information.