KR0123274B1

KR0123274B1 - User's taste compensation type in psychological sound model

Info

Publication number: KR0123274B1
Application number: KR1019920027195A
Authority: KR
Inventors: 강철석; 윤정식
Original assignee: 김주용; 현대전자산업 주식회사
Priority date: 1992-12-31
Filing date: 1992-12-31
Publication date: 1997-11-17
Also published as: KR940017879A

Abstract

Disclosed is a psychoacoustic model that compensates for user preference, for use in high-definition television (HDTV) or a digital audio compressor. The model comprises a maximum value detector (21), a frequency analyzer (22), a user preference applicator (25), a frequency distribution modifier (26), an auditory characteristic applicator (23) and a bit allocator (24). The user preference applicator (25) outputs a frequency distribution modification coefficient according to a user preference input. The frequency distribution modifier (26) modifies the frequency components of the audio signal according to the frequency distribution modification coefficient. The audio signal can thereby be coded according to the individual user's preference.

Description

User preference compensating psychoacoustic model

FIG. 1 is a block diagram of a general MUSICAM encoder.

FIG. 2 is a detailed block diagram of the psychoacoustic model of FIG. 1.

FIG. 3 is a block diagram of the user preference compensating psychoacoustic model of the present invention.

* Explanation of reference numerals for the main parts of the drawings

10: subband analysis filter    20: psychoacoustic model

30: quantizer    40: frame formatter

21: maximum value detector    22: frequency analyzer

23: auditory characteristic applicator    24: bit allocator

25: user preference applicator    26: frequency distribution modifier

The present invention relates to a psychoacoustic model used in high-definition television (hereinafter HDTV) and other digital audio compression equipment. In particular, it relates to a user preference compensating psychoacoustic model in which a psychoacoustic model based on human auditory characteristics is compensated with frequency band characteristics that depend on the user's individual taste, the audio genre and the like.

A representative digital audio compressor that uses a psychoacoustic model is the MUSICAM encoder. As shown in FIG. 1, the data compressor used in the MUSICAM encoder comprises: a subband analysis filter (10) that decomposes the input audio signal into frequency band components; a psychoacoustic model (20) that determines the quantization rate for the analysis result output from the subband analysis filter (10); a quantizer (30) that adaptively quantizes each band signal output from the subband analysis filter (10) according to the quantization control signal determined by the psychoacoustic model (20); and a frame formatter (40) that assembles the data compressed by the quantizer (30) into a form suitable for transmission.

In the encoder configured in this way, the audio signal (ㄱ) is first input to the subband analysis filter (10) and the psychoacoustic model (20). The subband analysis filter (10) decomposes the input audio signal (ㄱ) into, for example, 32 frequency band components (ㄴ) and, in parallel, analyzes the frequency components of the input audio signal (ㄱ), outputting the results to the psychoacoustic model (20) and the quantizer (30).

The psychoacoustic model (20) receives the audio signal (ㄱ) and the output signal of the subband analysis filter (10), determines the quantization rate for the analysis result of the subband analysis filter (10) on the basis of human auditory characteristics, and generates a quantization control signal (ㄷ) for each frequency band component (ㄴ), which it outputs to the quantizer. The quantizer (30) compresses the data by adaptively quantizing each band signal (ㄴ) input from the subband analysis filter (10) according to the quantization control signal (ㄷ) input from the psychoacoustic model (20).
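The adaptive quantization step can be illustrated with a minimal Python sketch. It assumes a uniform mid-tread quantizer normalized by the per-frame peak of each band, with a per-band bit count standing in for the quantization control signal (ㄷ). The function names and the rule that fewer than 2 bits means "band not transmitted" are illustrative assumptions, not details given in this document.

```python
def quantize_band(samples, bits):
    """Quantize one subband's frame of samples with `bits` bits per sample.

    Assumption: uniform mid-tread quantizer normalized by the frame peak.
    Returns (scale, codes); bands given fewer than 2 bits are dropped.
    """
    if bits < 2 or not samples:
        return 0.0, []
    scale = max(abs(s) for s in samples) or 1.0  # avoid dividing by zero on silence
    half = ((1 << bits) - 1) // 2                # codes span -half..+half
    codes = [round(s / scale * half) for s in samples]
    return scale, codes


def dequantize_band(scale, codes, bits):
    """Inverse of quantize_band; returns the reconstructed samples."""
    if bits < 2:
        return []
    half = ((1 << bits) - 1) // 2
    return [c / half * scale for c in codes]


if __name__ == "__main__":
    band = [0.3, -1.0, 0.25]
    for bits in (2, 4, 8):
        scale, codes = quantize_band(band, bits)
        restored = dequantize_band(scale, codes, bits)
        worst = max(abs(a - b) for a, b in zip(band, restored))
        print(bits, codes, worst)
```

Under these assumptions the reconstruction error shrinks as more bits are assigned to a band, which is the quantity the psychoacoustic model trades against the allowable noise.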

The frame formatter (40) assembles the frequency band components (ㄹ) quantized by and input from the quantizer (30), together with other control information, into a form suitable for transmission, producing the transmission bitstream (ㅇ). The transmitted data consist of the quantized frequency band signals and control signals such as quantization information.
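As a rough illustration of what "assembling into a form suitable for transmission" can mean, the sketch below packs per-band allocation information followed by the quantized codes into a byte string. The layout (a 4-bit allocation field per band, then two's-complement codes, zero-padded to a byte boundary) and the name `pack_frame` are hypothetical; this is not the actual MUSICAM frame format, which this document does not specify.

```python
def pack_frame(bits_per_band, codes_per_band):
    """Pack per-band allocation info and quantized codes into bytes.

    Assumed layout (hypothetical, not the MUSICAM format): one 4-bit
    allocation field per band, then each band's codes in two's complement
    using that band's bit count, zero-padded to a whole byte.
    """
    stream = []  # individual bits, most significant first

    def put(value, width):
        for shift in range(width - 1, -1, -1):
            stream.append((value >> shift) & 1)

    for bits in bits_per_band:
        if not 0 <= bits <= 15:
            raise ValueError("allocation must fit in 4 bits")
        put(bits, 4)
    for bits, codes in zip(bits_per_band, codes_per_band):
        for code in codes:
            put(code & ((1 << bits) - 1), bits)  # mask to two's complement
    stream.extend([0] * (-len(stream) % 8))      # pad to a byte boundary

    out = bytearray()
    for i in range(0, len(stream), 8):
        byte = 0
        for bit in stream[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    # Two bands: the first carries two 2-bit codes, the second is not transmitted.
    print(pack_frame([2, 0], [[1, -1], []]).hex())
```

Sending the allocation alongside the codes is what lets a decoder know how many bits to read for each band.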

As shown in FIG. 2, the psychoacoustic model (20) used in the conventional encoder comprises: a maximum value detector (21) that detects, for the input signal (a) from the subband analysis filter (10), the maximum value (c) within a unit frame of each of the 32 frequency band components; a frequency analyzer (22) that obtains the frequency components (d) from the input signal (b), which is the original audio signal input (ㄱ), by a method such as the fast Fourier transform (hereinafter FFT); an auditory characteristic applicator (23) that analyzes the signals (c) and (d) on the basis of human auditory characteristics and calculates, for each of the 32 frequency bands, the energy rendered inaudible by the input audio signal (b), that is, the allowable noise amount (e); and a bit allocator (24) that, for each of the 32 frequency bands, compares the allowable noise amount (e) with the quantization noise resulting from the bit allocation and allocates bits to the 32 frequency bands so that the quantization noise is inaudible, within the maximum number of bits permitted by constraints such as the data rate of the transmission channel.

The input signal (a) of the psychoacoustic model configured in this way is the output (ㄴ) of the subband analysis filter (10) of FIG. 1, namely the 32 frequency band components for an audio frame, which is the unit of signal processing; the input signal (b) is the original audio signal input (ㄱ) of FIG. 1. In the psychoacoustic model of FIG. 2, the maximum value detector (21) detects the maximum value (c) of the input signal (a) within the unit frame and, in parallel, the frequency analyzer (22) obtains the frequency components (d) by a method such as the fast Fourier transform (FFT). The auditory characteristic applicator (23) analyzes the signals (c) and (d) on the basis of human auditory characteristics and calculates, for each of the 32 frequency bands, the energy rendered inaudible by the input audio signal (b), that is, the allowable noise amount (e). Finally, the bit allocator (24) compares, for each of the 32 frequency bands, the allowable noise amount (e) with the quantization noise resulting from the bit allocation, and allocates bits to the 32 frequency bands so that the quantization noise is inaudible, within the maximum number of bits permitted by constraints such as the data rate of the transmission channel, thereby producing the bit allocation information (f) for the 32 frequency bands.
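The maximum value detection and bit allocation steps described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: levels are expressed in dB, each added bit is assumed to lower the quantization noise by about 6.02 dB (the usual rule of thumb for uniform quantizers), and bits are handed out greedily to the band whose noise most exceeds its allowable level. The function names, the greedy strategy and the `max_bits` limit are illustrative and are not taken from this document.

```python
def detect_band_maxima(subband_frame):
    """Return the peak absolute value of each band in one frame (signal c)."""
    return [max(abs(s) for s in band) for band in subband_frame]


def allocate_bits(signal_db, allowed_noise_db, bit_budget, max_bits=15):
    """Greedy bit allocation across bands (signal f).

    Assumption: quantization noise level = signal level - 6.02 dB per bit.
    A band keeps receiving bits while its noise exceeds the allowable
    noise (signal e) and the overall budget is not exhausted.
    """
    bits = [0] * len(signal_db)

    def excess(k):
        # How far the quantization noise sits above the allowable noise.
        return signal_db[k] - 6.02 * bits[k] - allowed_noise_db[k]

    remaining = bit_budget
    while remaining > 0:
        audible = [k for k in range(len(bits))
                   if bits[k] < max_bits and excess(k) > 0]
        if not audible:
            break  # all quantization noise is already below the allowable level
        worst = max(audible, key=excess)
        bits[worst] += 1
        remaining -= 1
    return bits


if __name__ == "__main__":
    frame = [[0.1, -0.8, 0.3], [0.02, 0.05, -0.01]]
    print(detect_band_maxima(frame))
    print(allocate_bits([60.0, 30.0, 10.0], [40.0, 28.0, 20.0], bit_budget=10))
```

In this toy example the third band receives no bits because its signal level already lies below the allowable noise, which mirrors the idea that inaudible components need not be coded.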

However, the human auditory characteristics applied by the conventional auditory characteristic applicator in such a psychoacoustic model are not individual characteristics but average characteristics modeled from many experimental measurements. The model therefore cannot respond adequately to individual auditory characteristics that reflect the user's preference, or to individual characteristics of the audio signal such as its music genre.

In view of this conventional problem, the present invention is characterized in that it makes it possible to obtain improved sound quality suited to the user's taste. That is, when the quantization rate of the subband analysis result is determined with the psychoacoustic model used in conventional MUSICAM, the user's individual taste and individual characteristics such as the music genre of the audio signal are additionally taken into account, so that the audio signal can be coded in a way that better matches the user's preference.

The invention is described in detail below with reference to the drawings.

FIG. 3 shows an embodiment of the user preference compensating psychoacoustic model proposed in the present invention. It comprises: a maximum value detector (21) that detects, for the input signal (a) from the subband analysis filter (10), the maximum value (c) within a unit frame of each of the 32 frequency band components; a frequency analyzer (22) that obtains the frequency components (d) from the input signal (b), which is the original audio signal input (ㄱ), by a method such as the fast Fourier transform (FFT); a user preference applicator (25), which is a memory that stores frequency distribution characteristics of various forms over the entire audio frequency band and which outputs a frequency distribution modification coefficient (h) matching the user's taste according to the user preference selection input (g); a frequency distribution modifier (26), connected to the outputs of the frequency analyzer (22) and the user preference applicator (25), which multiplies the frequency components (d) by the frequency distribution modification coefficient (h) and outputs frequency components (i) in which the input audio signal has been modified to suit the user's taste; an auditory characteristic applicator (23), connected to the outputs of the maximum value detector (21) and the frequency distribution modifier (26), which analyzes the signals (c) and (i) on the basis of human auditory characteristics and calculates, for each of the 32 frequency bands, the energy rendered inaudible by the input audio signal (b), that is, the allowable noise amount (e); and a bit allocator (24), connected to the output of the auditory characteristic applicator (23), which compares, for each of the 32 frequency bands, the allowable noise amount (e) with the quantization noise resulting from the bit allocation and allocates bits to the 32 frequency bands so that the quantization noise is inaudible, within the maximum number of bits permitted by constraints such as the data rate of the transmission channel.

The operation and effects of the present invention configured as above are described in detail below.

The input signal (a) of the psychoacoustic model is the output (ㄴ) of the subband analysis filter (10) of FIG. 1, namely the 32 frequency band components for an audio frame, which is the unit of signal processing; the input signal (b) is the original audio signal input (ㄱ) of FIG. 1; and the input signal (g) is the user preference selection input. In the psychoacoustic model of FIG. 3, the maximum value detector (21) detects the maximum value (c) of the input signal (a) within the unit frame and, in parallel, the frequency analyzer (22) obtains the frequency components (d) of the input signal (b) by a method such as the fast Fourier transform (FFT). The user preference applicator (25) is a memory that stores frequency distribution characteristics of various forms over the entire audio frequency band, and it outputs a frequency distribution modification coefficient (h) matching the user's taste according to the input signal (g). The frequency distribution modifier (26) multiplies the frequency components (d) by the frequency distribution modification coefficient (h) and outputs frequency components (i) that have been modified in advance to suit the user's taste. The auditory characteristic applicator (23) analyzes the signals (c) and (i) on the basis of human auditory characteristics and calculates, for each of the 32 frequency bands, the energy rendered inaudible by the input audio signal (b), that is, the allowable noise amount (e). Finally, the bit allocator (24) compares, for each of the 32 frequency bands, the allowable noise amount (e) with the quantization noise resulting from the bit allocation, and allocates bits (f) to the 32 frequency bands so that the quantization noise is inaudible, within the maximum number of bits permitted by constraints such as the data rate of the transmission channel.
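The two added blocks, the user preference applicator (25) and the frequency distribution modifier (26), can be sketched in Python as a lookup table followed by an element-wise multiplication. The preset names, the gain values, the three-region shape of the coefficients and all function names are hypothetical; the document only states that stored frequency distribution characteristics are selected by the input (g) and that (d) is multiplied by (h) to give (i). A plain DFT stands in for the FFT to keep the sketch dependency-free.

```python
import cmath
import math

# Hypothetical stored characteristics: preset -> gains for (low, mid, high) regions.
PREFERENCE_PRESETS = {
    "flat": (1.0, 1.0, 1.0),
    "bass_boost": (1.5, 1.0, 0.8),
    "treble_boost": (0.8, 1.0, 1.5),
}


def magnitude_spectrum(frame):
    """Frequency components (d): DFT magnitudes for bins 0..N/2."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def preference_coefficients(selection, num_bins):
    """User preference applicator (25): selection (g) -> coefficients (h)."""
    low, mid, high = PREFERENCE_PRESETS[selection]
    coeffs = []
    for k in range(num_bins):
        pos = k / max(num_bins - 1, 1)  # relative position in the spectrum
        coeffs.append(low if pos < 1 / 3 else mid if pos < 2 / 3 else high)
    return coeffs


def modify_distribution(d, h):
    """Frequency distribution modifier (26): i[k] = d[k] * h[k]."""
    if len(d) != len(h):
        raise ValueError("spectrum and coefficients must have equal length")
    return [dk * hk for dk, hk in zip(d, h)]


if __name__ == "__main__":
    frame = [math.cos(2 * math.pi * 2 * t / 16) for t in range(16)]
    d = magnitude_spectrum(frame)
    h = preference_coefficients("bass_boost", len(d))
    i = modify_distribution(d, h)
    print([round(v, 3) for v in i])
```

Because the modified components (i) rather than (d) feed the auditory characteristic applicator, emphasized regions appear stronger to the masking analysis, which in turn shifts how bits are distributed across bands.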

As described in detail above, the present invention adds and compensates for the individual tastes of individual users and individual characteristics such as the music genre of the audio signal in high-definition television and digital audio compression equipment. Sound quality better suited to the user's preference, that is, improved sound quality, can thereby be obtained, which has the effect of enabling the production of high-quality products.

Claims

A user preference compensating psychoacoustic model, comprising: a maximum value detector (21) that detects, for the input signal (a) from the subband analysis filter (10), the maximum value (c) within a unit frame of each of the 32 frequency band components; a frequency analyzer (22) that obtains the frequency components (d) from the input signal (b), which is the original audio signal input (ㄱ), by a method such as the fast Fourier transform (FFT); a user preference applicator (25), which is a memory that stores frequency distribution characteristics of various forms over the entire audio frequency band, whose stored contents can be changed, and which outputs a frequency distribution modification coefficient (h) matching the user's taste according to the user preference selection input (g); a frequency distribution modifier (26), connected to the outputs of the frequency analyzer (22) and the user preference applicator (25), which multiplies the frequency components (d) by the frequency distribution modification coefficient (h) and outputs frequency components (i) modified to suit the user's taste; an auditory characteristic applicator (23), connected to the outputs of the maximum value detector (21) and the frequency distribution modifier (26), which analyzes the signals (c) and (i) on the basis of human auditory characteristics and calculates, for each of the 32 frequency bands, the energy rendered inaudible by the input audio signal (b), that is, the allowable noise amount (e); and a bit allocator (24), connected to the output of the auditory characteristic applicator (23), which compares, for each of the 32 frequency bands, the allowable noise amount (e) with the quantization noise resulting from the bit allocation and allocates bits to the 32 frequency bands so that the quantization noise is inaudible, within the maximum number of bits permitted by constraints such as the data rate of the transmission channel.